Fast Modulo help needed

Hi,

Can anyone help me make this function faster? Right now it takes about 300 microseconds
to execute, and it is called once every second.
I found the page "Code Review: Division of Integers by Constants",
but I don't know how to port it to 32-bit unsigned integers.
In the code below the modulo is eating a lot of time.

unsigned long M2(unsigned long z)
{
  // multiplies UT hours expressed in seconds (0<=z<=86399)
  // by 1.00273790935 without using float data type
  // const unsigned long b=1002UL;
  // const unsigned long c=7379UL;
  // const unsigned long d=935UL;
  // const unsigned long base=10000UL;
  unsigned long dz=935UL*z;
  unsigned long cz=7379UL*z;
  unsigned long bz=1002UL*z;
  unsigned long dz2=dz/10000UL;
  cz=cz+dz2;
  unsigned long cz2=cz/10000UL;
  bz=bz+cz2;
  unsigned long bz1=bz%10000UL;
  unsigned long bz2=bz/10000UL;
  unsigned long n=((bz2*100+bz1/100)+5)/10;
  n=n-86400UL*(n/86400UL);
  return n; 
}

thank you

If you're only calling it once a second and it's only taking 300 microseconds, is it really performance critical? It's only accounting for about 0.03% of the CPU load by those figures.

I also have other functions running, so I want to save time and memory.

I simulated the function in LibreOffice Calc, and what I found is that the function computes: Output = Input x coefficient (1.00273790935) - Input.

With Input = 86398, Output = 235. This result was obtained using the ROUNDDOWN and MOD spreadsheet functions.
That is close but not exact; the correct result is 236.5498920213.

My suggestion: instead of the multistage calculation, do a simple multiply and shift, which basically always works as a substitute for floating-point math. The first step is to rearrange the formula
Output = Input x coefficient (1.00273790935) - Input
to Output = Input x coefficient (0.00273790935).
Next, express 0.00273790935 as a long integer. That is easy, just multiply by 2^32:
0.00273790935 x 2^32 = 11759231
Here you are, one line of code: Output = ( 11759231UL * Input ) >> 32.
Simulation: (86398 * 11759231) >> 32 = 236, which is better than the original function.

*Not tested; there could be a casting issue.
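
A minimal C sketch of the one-line formula, assuming a 64-bit intermediate is acceptable. Where unsigned long is 32 bits, the product 11759231UL * Input does not fit, and shifting a 32-bit value by 32 is undefined in C, which is probably the casting issue mentioned. The names FRAC_Q32 and excess_q32 are made up for illustration and are not from the original code.

```c
#include <assert.h>
#include <stdint.h>

/* 0.00273790935 * 2^32, rounded to the nearest integer (value from the post). */
#define FRAC_Q32 11759231ULL

/* Hypothetical helper: approximates input * 0.00273790935, truncated. */
uint32_t excess_q32(uint32_t input)
{
    assert(input <= 86399UL);  /* seconds in a day, as in the original comment */
    /* Widen to 64 bits before multiplying so the product cannot overflow. */
    return (uint32_t)(((uint64_t)input * FRAC_Q32) >> 32);
}
```

With this sketch, excess_q32(86398) gives 236, the same value as the simulation. The 64-bit multiply may be costly on a small microcontroller.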
Edited:

There is a mistake in the original code comments: the valid range for z is not 0<=z<=86399, but
0<=z<=86163, so my simulation with 86398 isn't correct.
Also, after thinking about how to prevent 32-bit overflow, I came up with another, scaled-down formula with coefficient 0.00273790935 x 2^22 = 11484:
Output = Input + ((Input x 11484) >> 22).
Check in LibreOffice:

Input  Output           dz        cz         bz        dz2   cz2    bz1   bz2   n      result
86163  86398.906483324  80562405  635796777  86335326  8056  63580  8906  8639  86399  86399

Approximation:
Output = 86163UL + ((86163UL x 11484) >> 22) = 86398

Well, it's off by 1 (strictly, the error is 0.90...). At the same time, it's the fastest. To get more accurate results, at the cost of being a little slower, we need to round:

Output = 86163UL + ((86163UL x 11484 + 2097152UL) >> 22) = 86399,

where 2097152 is 2^22 / 2, i.e. "0.5" in floating-point terms.

Finally: Output = Input + ((Input x 11484 + 2097152) >> 22).
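
The two formulas above can be sketched in C as follows. This is only an illustration under the assumption that Input never exceeds 86399, so that Input x 11484 + 2097152 stays below 2^32 and everything fits in 32-bit arithmetic. The names FRAC_Q22, HALF_Q22, scale_trunc and scale_round are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_Q22 11484UL    /* 0.00273790935 * 2^22, rounded (from the post) */
#define HALF_Q22 2097152UL  /* 2^22 / 2, i.e. "0.5" before the shift */

/* Truncating variant: Input + ((Input * 11484) >> 22). */
uint32_t scale_trunc(uint32_t input)
{
    assert(input <= 86399UL);  /* 86399 * 11484 still fits in 32 bits */
    return (uint32_t)(input + ((input * FRAC_Q22) >> 22));
}

/* Rounding variant: add half of 2^22 before shifting. */
uint32_t scale_round(uint32_t input)
{
    assert(input <= 86399UL);  /* the added 2097152 does not overflow either */
    return (uint32_t)(input + ((input * FRAC_Q22 + HALF_Q22) >> 22));
}
```

For Input = 86163 the truncating variant gives 86398 and the rounding variant gives 86399, matching the numbers above.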

gvi70000:
I also have other functions running, so I want to save time and memory.

So how about addressing the 99.97% of the CPU load that is not caused by this code fragment? Fixing a flawed algorithm will yield far greater benefits than optimising small parts of your code - especially if you're focusing on parts that only contribute to 0.03% of the load. Suppose you manage to reduce that to 0.02% - has that solved your performance problem?

Thank you Magician, now I understand how it works.
The input parameter z is the number of seconds in a day, so it will be between
0 and 86399, and the output of the function will be at most 86635 (actually
866355, because I need it in tenths of a second).

Now I will try to make something similar for n%86400.
Is it possible to also get the integer part in a few steps using bit shifts?
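
One possible sketch for the n%86400 step, assuming n is in whole seconds and comes from an input in 0..86399: n then stays well below 2 x 86400, so a single compare-and-subtract gives the same result as the modulo without any division. The names SECONDS_PER_DAY and wrap_to_day are hypothetical, and the same idea would apply with 864000 if n is kept in tenths of a second.

```c
#include <assert.h>
#include <stdint.h>

#define SECONDS_PER_DAY 86400UL

/* Hypothetical helper: equals n % 86400 for any n below 2 * 86400. */
uint32_t wrap_to_day(uint32_t n)
{
    assert(n < 2UL * SECONDS_PER_DAY);  /* precondition for a single subtraction */
    return (uint32_t)((n >= SECONDS_PER_DAY) ? n - SECONDS_PER_DAY : n);
}
```

For example, wrap_to_day(86635) gives 235, the value the original function returns for z = 86398.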