 Multiply uint32_t with float and maintain 32-bit accuracy

Ok let me try and explain the problem here. I am using a millis() and % modulo method of creating LED strip patterns, but somehow one of my patterns started to 'malfunction' after a few weeks of running (the patterns are triggered via a random generator, so although i noticed it, it took me a few days to actually try and find out what is happening, but i think i got it ! )
Since it is only one pattern, it's not such a big deal, and every time i reset the unit the problem was gone. Anyway i've come to the conclusion that casting a millis() value into a 'float' makes it lose its final few bits of accuracy, in fact i drop it down from 32 bit to 23 bit (right ? hmm that would mean it would already happen after a few hours, though i may not be able to visibly notice it until much later) and as long as millis() is smaller than the maximum 23-bit (float) value there is no issue.
Ok, the sketch example. I'll simplify as much as possible; in the end i need a uint32_t value again.

uint32_t manixmoment;  // the final value
float manixvalue = ManixValue(val,j);  // the function returns a value between 1.0 & 3.0 approx
// val is a uint8_t , and j is a counter between 0 - 9
manixmoment=(uint32_t) ((float) moment * manixvalue);  // here is my cast
//.....

and ManixValue()

float ManixValue (uint8_t val, uint8_t depth) {  // depth here can be between 0 - 9
  return 1.0 + ((float) depth * ((float) val / 127.0)) / 10.0;
}

I hope i have demonstrated the issue. How can i multiply my 32-bit variable with a value between 1.0 & 3.0 ? Just to confirm that the cast is in fact the problem: the issue is also there if ManixValue() returns 1.0
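To make the effect of the cast concrete, here is a minimal host-side sketch in plain C++. It is not part of the original sketch, and the helper names throughFloat and roundTripError are made up for illustration. It assumes float is an IEEE 754 single with a 24-bit significand (23 stored bits plus one implicit bit), so every integer up to 2^24 = 16777216 survives the round trip, which is roughly 4.7 hours of millis(). Beyond that the spacing between representable values doubles with every extra bit, reaching 256 ms once millis() passes 2^31 (about 24.9 days).

```cpp
#include <stdint.h>

// Round-trips a 32-bit tick count through float, which is what the cast in the
// sketch does when the multiplier is exactly 1.0.
// Caution: values very close to 2^32 can round up to 2^32 as a float, and
// converting that back to uint32_t is undefined, so keep inputs below that.
uint32_t throughFloat(uint32_t ticks) {
    float asFloat = static_cast<float>(ticks);   // precision is lost here above 2^24
    return static_cast<uint32_t>(asFloat * 1.0f);
}

// Absolute error (in ticks) introduced by the round trip.
uint32_t roundTripError(uint32_t ticks) {
    uint32_t back = throughFloat(ticks);
    return back > ticks ? back - ticks : ticks - back;
}
```

Under these assumptions roundTripError(16777215UL) is 0, while roundTripError(3000000100UL) is 100, because in that range a float can only hold multiples of 256.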

Of course as i am writing this possible solutions are coming to mind, the simplest probably being i could start my calculation with a maximum elapsed time value of 45 minutes... but i am still wondering if there is a better solution.

drop it down from 32 bit to 23 bit (right ?

wow. i'm impressed with your deduction

while a float may be 32 bits, it is composed of a sign bit, an exponent and a fraction, allowing it to express very very large or small numbers with limited resolution. see single-precision binary floating-point
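A rough sketch of how those 32 bits split up, assuming the usual IEEE 754 single-precision layout (1 sign bit, 8 exponent bits stored with a bias of 127, 23 fraction bits). FloatFields and splitFloat are illustrative names only.

```cpp
#include <stdint.h>
#include <string.h>

// The three fields of an IEEE 754 single-precision value (assumed layout).
struct FloatFields {
    uint32_t sign;      // 1 bit
    uint32_t exponent;  // 8 bits, stored with a bias of 127
    uint32_t fraction;  // 23 bits, the leading 1 is implicit and not stored
};

FloatFields splitFloat(float value) {
    static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);  // copy the raw bit pattern
    FloatFields fields;
    fields.sign = bits >> 31;
    fields.exponent = (bits >> 23) & 0xFFu;
    fields.fraction = bits & 0x7FFFFFu;
    return fields;
}
```

For example, 16777216.0f (2^24) comes out with exponent 151 (127 + 24) and a fraction of 0, so the next integer up has no bit left to live in.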

i think you're better off using an unsigned long to represent your value

wow. i'm impressed with your deduction

Do i sense some condescending sarcasm here ?

i think you're better off using an unsigned long to represent your value

Yes well obviously, but how do i multiply that with a value between 1.0 & 3.0 ?
Say i do the addition last, i still need to make all multiplications first before doing any division if i want to maintain accuracy, so for that i'd need an unsigned Double, with all of its complications then, wouldn't i ? I am looking for a way around that.

how do i multiply that with a value between 1.0 & 3.0 ?

Use integer multiplication and division with suitable fractional representations of the multiplier.

x*1.5 can be written as (x*3)/2, as long as the x*3 does not overflow.

But it eventually will overflow, with millis(), so think of another way to accomplish your goal.

For flashing LEDs, use modulo (millis()>>2), which won't overflow in that operation, but gives a shorter cycle.
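A small sketch of the fractional-multiplier idea. To sidestep the overflow of x*3 it widens the product to 64 bits before dividing. That widening is an added assumption rather than something the posts above ask for, and on an 8-bit AVR the 64-bit arithmetic will probably cost some flash. scaleByFraction is a hypothetical helper name.

```cpp
#include <stdint.h>

// Computes x * numerator / denominator with a 64-bit intermediate product,
// so the multiplication itself cannot overflow. The denominator must not be 0.
// The final result is truncated to 32 bits (it wraps modulo 2^32).
uint32_t scaleByFraction(uint32_t x, uint32_t numerator, uint32_t denominator) {
    uint64_t wide = static_cast<uint64_t>(x) * numerator;
    return static_cast<uint32_t>(wide / denominator);
}
```

With this, x*1.5 becomes scaleByFraction(x, 3, 2), so scaleByFraction(1000, 3, 2) gives 1500.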

Do i sense some condescending sarcasm here ?

not at all. according to the web page, a float has 23 bits of resolution and you said 23.

but how do i multiply that with a value between 1.0 & 3.0 ?

you can multiply ints and floats. looks like a cast may be needed.

long  x = 1000;
printf (" %ld\n", (long) (x * .3));

gives me 299.

since you said >=1, i'm not sure if you need a fractional component. But if you're not sure, why not use double which appears to have 52 bits of precision.

x*1.5 can be written as (x*3)/2, as long as the x*3 does not overflow.

Yes i was thinking in that direction, and somehow the fractional part of the equation is depth * (val / 127.0) / 10.0;
which i think comes down to depth * val / 1270; and as long as i multiply before dividing i won't lose accuracy, but since i have to multiply this with millis() i start out with a 32-bit value and i am going to overflow, and with val being a full 8-bit uint (it ranges from 0-255) i will need at least 13 bits in reserve.
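One way to do the depth * val / 1270 version without any reserve bits, sketched under the assumption that depth stays within 0 - 9 and val within 0 - 255 as described: split the 32-bit value into quotient and remainder by 1270 first, so that every intermediate product fits in 32 bits. The name manixMomentInt is made up for illustration and this is not the original code.

```cpp
#include <stdint.h>

// Integer-only version of moment * (1.0 + depth * (val / 127.0) / 10.0),
// rewritten as moment + moment * depth * val / 1270.
// Assumes depth is 0..9 and val is 0..255, as in the original sketch.
uint32_t manixMomentInt(uint32_t moment, uint8_t val, uint8_t depth) {
    const uint32_t divisor = 1270;  // 127 * 10
    const uint32_t factor = static_cast<uint32_t>(depth) * val;  // at most 9 * 255 = 2295

    // moment = q * divisor + r, therefore
    // moment * factor / divisor == q * factor + (r * factor) / divisor
    const uint32_t q = moment / divisor;
    const uint32_t r = moment % divisor;  // r < 1270, so r * factor stays far below 2^32

    const uint32_t extra = q * factor + (r * factor) / divisor;
    return moment + extra;  // wraps modulo 2^32, like the uint32_t result in the sketch
}
```

With depth == 0 the function returns moment unchanged at any size, which is the case where the float cast already misbehaves. The result still wraps once the scaled value passes 2^32, so this removes the rounding, not the overflow.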

For flashing LEDs, use modulo (millis()>>2),

That is actually not a bad thought, since i wouldn't really see the difference if i just did all the math with 2 bits less. For the end result that should be fine if required, but for the equation it is not going to cut it.
Right now i am thinking about calculating moment as an elapsed time since a zero-point (which i do anyway for all other patterns) and preventing it from using more than 17 bits, and simply re-calculating the zero-point when it threatens to do so. Normally i do this when the speed of a pattern changes, but since in this pattern there is another variable that influences pattern length i had decided not to bother for this pattern (sloppy, but it was good enough)
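A minimal sketch of that zero-point idea, with hypothetical names throughout (PatternClock, elapsedSinceZero, maxElapsed). It assumes the pattern repeats every cycleLength milliseconds and that cycleLength is no larger than maxElapsed, so moving the zero-point forward by whole cycles keeps the phase of the pattern unchanged while keeping the elapsed value small.

```cpp
#include <stdint.h>

// Hypothetical per-pattern timing state; the names are illustrative only.
struct PatternClock {
    uint32_t zeroPoint;    // millis() value the pattern counts from
    uint32_t cycleLength;  // full pattern length in ms, must be greater than 0
};

// Returns the ms elapsed since the zero-point. Once that reaches maxElapsed,
// the zero-point is moved forward by a whole number of cycles first.
// 'now' is passed in so the logic can be tried off-target; on the Arduino it
// would be the current millis() value.
uint32_t elapsedSinceZero(PatternClock &clock, uint32_t now, uint32_t maxElapsed) {
    uint32_t elapsed = now - clock.zeroPoint;  // unsigned subtraction survives millis() rollover
    if (elapsed >= maxElapsed) {
        uint32_t wholeCycles = elapsed / clock.cycleLength;
        clock.zeroPoint += wholeCycles * clock.cycleLength;
        elapsed = now - clock.zeroPoint;       // now smaller than cycleLength
    }
    return elapsed;
}
```

With maxElapsed set to 1UL << 17 the value fed into any later float math stays within 17 bits, well inside what a float holds exactly.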

But if you're not sure, why not use double which appears to have 52 bits of precision.

Well in this program memory, and in this case flash memory, is a scarce resource. Per pattern i try and stay around 500 - 600 bytes, and though i am not exactly sure how much double takes, it seems wasteful to pull those functions into the program (maybe i'm wrong here, but i do suspect it will cost me.)

not at all. according to the web page, a float has 23 bits of resolution and you said 23.

well yeah i realized that the sign and the exponent take space and googled for the amount that was left over. I had deduced that that was the problem. I first thought it was overflow, but since the program had only been running for 36 days it might be it, until i realized that if depth == 0 then manixvalue = 1, so the jumping of the pattern must somehow be caused by the cast to float. Looking at it like that it isn't all that impressive, hence my misinterpretation. But to get back to the bit-reduction of the millis() value, well 9 bits is too much already (clearly) so trying 13 bits is out of the question. Anyway thank you guys for thinking with me here, looking at the issue at hand and checking out avenues of approach always helps.

I'm inclined to look at the problem from the other end
Why is this producing a float?

float manixvalue = ManixValue(val,j);  // the function returns a value between 1.0 & 3.0 approx

Why not get it to produce an unsigned long?

Or maybe another way of expressing my thought is "what level of precision do you need in the variable manixvalue?"

Separately, any multiplication of any data type (including an unsigned long) risks losing precision if the result overflows.

Another thing you might consider is how many millisecs do you need before your system repeats (or concludes). I suspect it does not need the 49 days that millis() is capable of.

...R

Why not get it to produce an unsigned long?

Well yes, but it is a multiplier for millis(), which is a uint32_t (unsigned long) as well, where the base multiplying value is 1 and ranges up to 3, so it overflows after 16 to 50 days or so. That is fine, but i lose accuracy if i use anything less than a 13-bit value, which means i may as well use a float since it is 4 bits more accurate (in the end).
So to answer your question: the depth value is actually the time in the future where the so-many-th pixel would be. That is a poor explanation. It is a pretty cool pattern that is simply a few cylons moving at different speeds but at constant separation (they start out together, then they separate, if let's say there are 4 then half way through 2 of them will pair up, and in the end they all come back together again). When depth is 1, this would be the second pixel, and if that completes one full cycle more than the primary pixel, then the whole pattern is completed (the other pixels would have completed 2, 3, 4 etc. cycles more than the primary pixel).
I suppose that 25ms/step is acceptable, but since the framerate itself is undefined (though when receiving DMX 25ms/frame is to be expected), this may already cause flutter, a pixel not moving fluently. The truth is i'm not really sure, but anything that doesn't move when it should for about 50ms (20hz) becomes visible in my experience.

how many millisecs do you need before your system repeats

well this is why i dropped it in the first place: outside of the number of leds in the strip (-1 * 2) and the speed, there is also the separation of the leds, but yes i figure actually 16 bits should be enough. So now i got to do the proper calculation of that, and then recalculate the 'zero-point' every time either of those factors changes, and also if they don't change for more than 20 bits of ms (i have 23 bits or 19 bits lossless, maybe even 16 bits is enough, which means it recalculates nearly every minute, but that is fine, time is not scarce). If the calculation takes a bit of time, that temporarily influences the framerate, but the math is not affected by that due to the millis() & modulo method. So all in all i think i got it covered, it is just a matter of writing the algorithm with a very clear head. And i wonder what i do when separation is 0 and the patternlength becomes infinite.

Ah no silly me, the pattern can never be infinite, my separation value should have a minimum of 1 (not zero) just like speed actually does (lowest speed is 1BPM i ran into the same problem there) , now if i would make value an 8-bit signed value instead, i probably need about 23-bits to make sure i always can store the complete length of the pattern, and that i already have. 16-bit is not going to cut it though, but i think i will leave the current part of the pattern function as it is, and just add a recalculation function which is also triggered if a complete cycle has elapsed.

i'm sorry, but i don't follow what you're trying to do; what you need? if you have the time, can you explain? now i'm curious

do you need a value with a limited number of bytes/bits that represents something (what?) with some minimal number of integer and fractional bits?

Perhaps you can take advantage of the 64-bit 'long long' type. Make sure all 23 binary places in your float are to the left of the decimal point by multiplying by 2^24: 0x01000000. Then truncate to a 64-bit integer, multiply by your 32-bit value, and divide the result by 2^24 to get a 32-bit result.

uint32_t multiply(uint32_t multiplicand, float multiplier)
{
  const uint32_t TwoToThe24th = 0x01000000;
  uint64_t intMultiplier = multiplier * TwoToThe24th;

  intMultiplier *= multiplicand;

  uint64_t result = intMultiplier / TwoToThe24th;
  return result;  // Truncate to 32 bits
}

void setup() {
  uint32_t myUnsignedLong = 721364981UL;
  Serial.begin(115200);
  Serial.println(myUnsignedLong);
  Serial.println(multiply(myUnsignedLong, 1.41421356));
}

void loop() {}