[SOLVED] Type of intermediate but non-stored variables - how does it work?

This is not really a coding problem; I would rather like to better understand how it works, and where the limitations on a given microcontroller are (here: ATmega328).

It is about fixed-point arithmetic. Consider the following sketch:

void setup() {
  Serial.begin(9600);  // baud rate is an assumption; the original sketch did not show it
}

void loop() {
  // put your main code here, to run repeatedly:
  int  a_max   = 255;
  int  b_max   = 10000;
  byte a_input = 67;
  int  b_output;
  static int cnt = 0;

  while (cnt < 1) {
    // Use Case 1
    b_output = a_input * b_max / a_max;
    Serial.print("b without cast via long:");
    Serial.println(b_output);
    // Use Case 2
    b_output = a_input / a_max * b_max;
    Serial.print("b without cast no long:");
    Serial.println(b_output);
    // Use Case 3
    b_output = (int)((long)((int)a_input * b_max) / (long)a_max);
    Serial.print("b with cast:");
    Serial.println(b_output);
    cnt++;  // run the block only once
  }
}


For Use Case 1, the intermediate result "a_input*b_max" is of type long, and after division by a_max it is known that the result will fit into an int variable.
Use Case 3 shows, with casts, what the compiler does from my point of view. And both results are the same (b_output = 57).
So where is the limit for the intermediate but non-stored result (like a_input * b_max)? The ATmega328 is an 8-bit processor, and I would expect the hardware registers of the arithmetic logic unit (ALU) to have twice the width of the processor's "bittage", which would be 16 bits. Obviously there is some magic going on, because the 8-bit processor can manage 32-bit intermediate results... hm, so also long long intermediates? Or even long long long long?

And, for sure, Use Case 2 is a stupid implementation. Calculating from left to right (i.e. 67/255) gives an intermediate result of zero, and so b_output is zero as well. It is just there to show that the order of evaluation is relevant.

Hoping for some insight - thanks.

P.S.: Why not use floats? Well, fixed point is perfect for my application, so I stick to an efficient use of resources.

I see no casts.

Arithmetic is done using ints, unless otherwise specified

With this, Use Case won't work - but it does. And, (long)a_max is a cast, isn't it?

Yes, it is.

What's your point?

not sure i understand your concern

experienced (tortured) coders have often learned how to force the compiler to do math with certain types. adding a "1.0 *" forces a floating-pt calculation that may result in an int value. "1L" forces a 32-bit (standard C) calculation. And possibly force the multiplications before doing division
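The effect of such a literal on the type of an intermediate result can be checked at compile time. The following is only a minimal sketch; the variable names are made up for illustration and are not from the sketch in the first post. It relies on standard C++ promotion rules, so the result types are the same on a PC and on the ATmega328 (only the sizes of int and double differ).

```cpp
#include <cstdint>
#include <type_traits>

// Illustrative operands only (hypothetical names, not from the thread).
std::uint8_t small_value = 67;     // same role as Arduino's byte
int          int_value   = 10000;

// byte * int: the byte is promoted, and the product is a plain int.
constexpr bool product_is_int =
    std::is_same<decltype(small_value * int_value), int>::value;

// A leading long literal widens the whole left-to-right product to long.
constexpr bool product_is_long =
    std::is_same<decltype(1L * small_value * int_value), long>::value;

// A leading floating-point literal makes the product a double.
constexpr bool product_is_double =
    std::is_same<decltype(1.0 * small_value * int_value), double>::value;

static_assert(product_is_int, "byte * int is computed as int");
static_assert(product_is_long, "1L * ... is computed as long");
static_assert(product_is_double, "1.0 * ... is computed as double");
```

Note that the literal has to come first: multiplication is evaluated left to right, so in small_value * int_value * 1L the first product would still be computed as an int before it is widened.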

the multibyte multiplication routine implemented by the compiler must sufficiently handle the intermediate results necessary to provide an accurate result of the correct size (# of bytes)

when i used fixed-pt DSPs, while it may have been optional, the result may have been saturated on overflow and the code needed to deal with it if needed.

C does not do fixed point math (at least not like i was familiar with using fixed-pt DSPs)

see C Operator Precedence. we've had coding guidelines that required parentheses to explicitly make clear the desired precedence.

On the ATmega processors such as the 328 the compiler uses 16 bits for the default integer size - unless you explicitly force a wider type this is the size of an intermediate integer result.

The hardware register size is immaterial to the C language; it's the compiler settings that determine the programming model you see. An 8-bit processor usually implies an 8-bit ALU and an 8-bit databus. If ALU size and databus differ, the marketing materials tend to use the larger of the two, sweeping the awkward facts under the carpet (!)

multiplying two 8-bit values requires a 16-bit accumulator. so is it described as an 8 or 16-bit ALU?

8-bit datapaths in the ALU: it's an 8-bit ALU.

what's a "datapath"? between registers, between data-bus?

Well, what I tried to ask is something like

int * int = long, i.e. multiplying two 16-bit variables results in a 32-bit variable; then, after dividing this by a 16-bit value, the final result will be 16-bit again.

67 * 10000 = 670000 (which fits into 32 bits); dividing by 255 then gives 2627, which fits into 16 bits. So as long as a_input stays at or below 255 (which I can ensure; as a matter of fact, assume something in 0...255), the result will always be in the range 0..10000.
So, question: do I have to take care of the intermediate result requiring 32 bits? And if so, how is that best done?

Remark: I just figured out that my code does stupid things. Use Cases 1 and 3 calculate wrong values ... which is a little funny, because in another implementation I did it better ... sorry for the misleading first post, in which I stated that Use Cases 1 and 3 are working fine.
So, my little sample program doesn't work. Question then: is there a way to get the proper result?

Huh, no? Isn't multiplying two integers fixed-point math? Ok, one has to give the value of an integer a (physical) meaning, e.g., an int ranging from 0..10000 equals a percentage of 0..100% with a resolution of 0.01%. And if, say, the input signal is a signal strength from 0 to 10dB with a resolution of 0.039dB, it fits into an unsigned byte.
And, please, don't ask if that kind of conversion (dB to %) makes sense. I just wanted to illustrate my understanding of fixed-point math.

That isn't true. The result is an int and you may have overflow.

i was taught to use a slide rule in high school chemistry (this is not a math class) and was told to alternate between multiplication and division so that the results is always within some reasonable range.

the same is true in code: make sure the intermediate result fits within the size of an integer, and multiply before dividing so precision is not lost: so (a*b)/ c rather than (a/c) * b
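A small sketch of the difference, assuming values whose product still fits into a 16-bit int (the function names are made up for illustration):

```cpp
// Scale a by the ratio b/c using integer arithmetic only.

// Multiply first: the fraction is truncated only once, at the end.
// The product a * b must fit into an int, or it overflows.
int scale_multiply_first(int a, int b, int c) {
    return (a * b) / c;
}

// Divide first: a / c is truncated immediately, so for a < c the
// intermediate result is already 0 and the final result is 0 as well.
int scale_divide_first(int a, int b, int c) {
    return (a / c) * b;
}
```

With a = 67, b = 100 and c = 255, multiplying first gives 6700 / 255 = 26, while dividing first gives 0 * 100 = 0.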

of course a processor does integer arithmetic. but fixed-pt math operates on non-integer values.

DSPs commonly generate sinusoidal waveforms and use coefficients < 1. they perform integer math and use the notion of a binary point indicating the bit representing 1. the binary point for a sin value is commonly 14 (Q14), leaving one bit for sign, one integer bit and 14 fractional bits (yea, you could use Q15)

so the code is multiplying two Q14 numbers, resulting in a Q28 value, that now needs to be shifted right 14 bits to return it to Q14. the DSPs i worked on had intrinsic operations supporting Q14 values
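In plain C++ the Q14 multiplication described above might look roughly like this. It is only a sketch under simplifying assumptions: the function name is made up, there is no rounding and no saturation, and a DSP intrinsic may behave differently.

```cpp
#include <cstdint>

const int Q14_FRACTIONAL_BITS = 14;  // 1.0 is represented as 1 << 14 = 16384

// Multiply two Q14 values. The 32-bit product has 28 fractional bits (Q28)
// and is shifted right by 14 bits to bring it back to Q14.
// Note: right-shifting a negative value is implementation-defined before
// C++20; common compilers perform an arithmetic shift.
std::int16_t q14_mul(std::int16_t a, std::int16_t b) {
    std::int32_t product_q28 = static_cast<std::int32_t>(a) * b;
    return static_cast<std::int16_t>(product_q28 >> Q14_FRACTIONAL_BITS);
}
```

For example, 0.5 is 8192 and 0.25 is 4096 in Q14; q14_mul(8192, 4096) returns 2048, which is 0.125 in Q14.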

the Q values of every DSP math operation were considered to ensure no unintended overflows of intermediate values. although the databus was 16 bits, the ALU registers were 18 bits and the accumulator may have been 40 bits

so when i simulated fixed-pt DSP code in C, there was additional code to do what some DSP HW took care of.

@dsebastian: Yes, you don't seem to understand what fixed point arithmetic is. Have a quick look at this: https://en.wikipedia.org/wiki/Fixed-point_arithmetic.

Yes, you do have to take care. By default on a 328P (8-bit processor) all intermediate arithmetic is done with 16-bit integers as long as none of the operands is long or float.
Your use case 3 should look like
b_output = (long)a_input * b_max / a_max;
to tell the compiler to use 32bit arithmetic. Then the result will be correct.
It doesn't matter which of the two operands is cast to long, so this:
b_output = a_input * (long)b_max / a_max;
will work also.
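Wrapped into a helper, the same idea could be sketched as follows. The function and constant names are made up for illustration, and fixed-width types are used so that the code behaves the same on a PC as on the 328P.

```cpp
#include <cstdint>

const std::int16_t A_MAX = 255;    // full scale of the input
const std::int16_t B_MAX = 10000;  // full scale of the output

// Scale a_input (0..255) to 0..10000. The input is widened to 32 bits
// BEFORE the multiplication, so the product (up to 2550000) cannot overflow.
std::int16_t scale_input(std::uint8_t a_input) {
    std::int32_t wide_product = static_cast<std::int32_t>(a_input) * B_MAX;
    return static_cast<std::int16_t>(wide_product / A_MAX);
}
```

With the numbers from this thread, scale_input(67) returns 2627, and the end points 0 and 255 map to 0 and 10000.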

Ah, ok, this Q-thing. I have to admit I never used it. When I did controller programming we never used that, but normalized physical values as described above. And, having said that, it looks like I lumped integer math and fixed point together, while they are not the same.

Ah, cool, that works. And I guess I have understood, why (I hope). I thought that

b_output = (long)(a_input * b_max) / a_max;

is the same as

b_output = (long)a_input * b_max / a_max;

... but it obviously isn't. (a_input * b_max) will overflow with the given numbers, as both variables are int and so is the result. So the cast to (long) comes "too late".
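The two variants can be compared on a PC by emulating the 16-bit int of the ATmega328. This is a sketch under an assumption: signed overflow is formally undefined behaviour in C and C++, and the wrap-around to the low 16 bits modelled here is simply what the observed result of 57 suggests. All function names are made up for illustration.

```cpp
#include <cstdint>

// Emulates a 16-bit int multiplication: only the low 16 bits of the
// product survive (assumed wrap-around behaviour, see above).
std::int16_t mul16(std::int16_t a, std::int16_t b) {
    std::uint32_t wide = static_cast<std::uint32_t>(static_cast<std::int32_t>(a) * b);
    return static_cast<std::int16_t>(static_cast<std::uint16_t>(wide & 0xFFFFu));
}

// (long)(a_input * b_max) / a_max: the product is already truncated
// to 16 bits before the cast widens it.
std::int16_t cast_too_late(std::int16_t a_input, std::int16_t b_max, std::int16_t a_max) {
    std::int32_t widened = static_cast<std::int32_t>(mul16(a_input, b_max));
    return static_cast<std::int16_t>(widened / a_max);
}

// (long)a_input * b_max / a_max: the operand is widened first,
// so the multiplication itself is done in 32 bits.
std::int16_t cast_first(std::int16_t a_input, std::int16_t b_max, std::int16_t a_max) {
    std::int32_t product = static_cast<std::int32_t>(a_input) * b_max;
    return static_cast<std::int16_t>(product / a_max);
}
```

With 67, 10000 and 255, the truncated product is 670000 - 10 * 65536 = 14640, so the late cast yields 14640 / 255 = 57, while casting first yields 670000 / 255 = 2627.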


That's exactly how it is. :sunglasses:

Not sure if there is a different view here. On paper one needs 32 bits to allow any multiplication of two 16-bit values.... but, hm, I might deviate from the original intention of this thread when I try to explain what I thought and meanwhile learnt in the last few hours... a journey I don't want you to follow (and you don't have to) :wink:

670000 will not fit in an 'int' on most arduinos. When you multiply 'a_input' (67) by 'b_max' (10000) you will cause an integer overflow and get an incorrect answer.

Try type 'long' if you want values much greater than 32000.

Sure, but if you want to do that in C, you need to be explicit about it otherwise you'll get overflow as John warns.

Solution was given by MicroBahner above. So I guess I got it. Thanks anyway.
Marked thread as solved.