For the fastest possible IIR filter with 8-bit quantities, I would like to take advantage of the fact that the AVR instruction set has an 8x8 signed multiply that produces a 16-bit result.
I considered using inline assembly, but I don't know the rules for the C/C++ interface and can't find anything online that answers my questions for this particular case.
The code posted below works as hoped, but I don't know enough about the rules of C/C++ to understand why, or if the compiler is optimizing the calculation and fooling me.
My experience with 16 bit multiplies overflowing leads me to ask why the compiler produces the correct 16 bit result in both cases. Comments?
mul8x8(signed char, signed char):
muls r24,r22
movw r24,r0
clr __zero_reg__
ret
Do note that in general, you would have to cast the arguments to int16_t to get a 16-bit result. In this case it's fine because of integer promotion, a stupid C remnant that says data types smaller than int are converted to int before they are used in arithmetic.
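To make the promotion rule concrete, here is a minimal sketch of the two ways to write the multiply. The function names mul8x8 and mul8x8_explicit are only illustrative (the first matches the label in the listing above, the second is made up), and the original poster's code may look different.

```cpp
#include <cstdint>

// Relies on integer promotion: a and b are both converted to int
// before the multiply, so the product is computed at int width.
int16_t mul8x8(int8_t a, int8_t b) {
    return a * b;
}

// Hypothetical variant with the widening spelled out. For 8-bit
// operands it computes the same value as mul8x8.
int16_t mul8x8_explicit(int8_t a, int8_t b) {
    return static_cast<int16_t>(a) * static_cast<int16_t>(b);
}
```

The largest magnitude an 8x8 signed product can reach is (-128) * (-128) = 16384, which fits in 16 bits, so neither version can overflow. That is not true of a 16x16 multiply, where the operands would have to be cast to a wider type first.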
How do you know that the compiler does not generate machine code to do this anyway? I'd hope that as it has to generate machine code specific to the target device then it would take advantage of the hardware capabilities, including using an 8x8 hardware multiplier if there is one available.
I don't know the answer, but if I were to try to do this I'd just look for the relevant register names and write to them / read from them. Read the data sheet to see what they are called and just include the names in your code as you would any other variable, for example:
Well, no. Even though the type is int, the optimizer knows that the value can only be in [-128, 127] because it came out of an int8_t variable, so it can use an 8×8→16 multiplication and doesn't have to use a 16×16→16 multiplication. The promotion does affect the type of the result, though, because decltype(x1 * x2) == int.
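The claim about decltype can be checked at compile time. This is just a sketch: widened_product is a made-up name, and x1 and x2 are assumed to be int8_t as in the discussion above.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper: returns the product at its promoted type.
int widened_product(int8_t x1, int8_t x2) {
    // Fails to compile if the product of two int8_t values were
    // anything other than int.
    static_assert(std::is_same<decltype(x1 * x2), int>::value,
                  "int8_t * int8_t should have type int");
    return x1 * x2;
}
```

If this compiles, the expression type really is int, regardless of which multiply instruction the optimizer ends up choosing.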
Close. The compiler is required to produce a result AS IF the 'char' or 'unsigned char' values were promoted to 'int'. Since the compiler knows that an 8x8 multiply will produce the correct 16-bit result, it does that.
Note: 'unsigned char' is promoted to 'int', not 'unsigned int'. It just isn't sign-extended as part of the promotion.
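A small sketch of that last point, assuming a platform where int can hold every unsigned char value (true on AVR with its 16-bit int). The function names are made up for illustration.

```cpp
#include <type_traits>

// Unary plus applies integer promotion, so decltype shows the promoted type.
static_assert(std::is_same<decltype(+static_cast<unsigned char>(0)), int>::value,
              "unsigned char should promote to int, not unsigned int");

// Hypothetical helpers showing what happens to the bit pattern 0xFF.
int promote_unsigned(unsigned char c) { return c; }  // zero-extended
int promote_signed(signed char c)     { return c; }  // sign-extended
```

Passing 0xFF to promote_unsigned gives 255, while passing -1 (the same bit pattern) to promote_signed gives -1. Both results have type int.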