C/C++ rules on AVR: multiply 8x8

For the fastest possible IIR filter with 8 bit quantities, I would like to take advantage of the fact that the AVR instruction set has an 8x8 signed multiply that produces a 16 bit result.

I considered using inline assembly, but I don't know the rules for the C/C++ interface, and I can't find anything online that answers my questions for this particular case.

The code posted below works as hoped, but I don't know enough about the rules of C/C++ to understand why, or if the compiler is optimizing the calculation and fooling me.

My experience with 16 bit multiplies overflowing leads me to ask why the compiler produces the correct 16 bit result in both cases. Comments?

// 8x8 -> 16 bit signed multiply; both operands are promoted to int first
int16_t mul8x8(int8_t x1, int8_t x2)
{
  return x1 * x2;
}

void setup() {
  Serial.begin(115200);
  int8_t x1 = 127, x2 = -128;
  int16_t r = x1 * x2;  // -16256
  Serial.println(r);
  Serial.println(mul8x8(x1, x2));  // same value via the function
}

void loop() {}

It seems like GCC optimizes this correctly (checked on Compiler Explorer):

int16_t mul8x8(int8_t x1, int8_t x2) {
    return x1 * x2;
}

mul8x8(signed char, signed char):
        muls r24,r22
        movw r24,r0
        clr __zero_reg__
        ret

Do note that in general, you would have to cast the arguments to int16_t to get a 16-bit result. In this case, it's fine, because of a stupid C remnant that says data types smaller than int are converted to int when used in arithmetic. (The rule is called integer promotion, and int is 16 bits on AVR.)
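
The promotion rule can be checked at compile time. This is only a minimal sketch, assuming a C++11 compiler; mul8x8_promoted and mul8x8_cast are names made up for illustration and are not from the original post:

```cpp
#include <cstdint>
#include <type_traits>

// Both operands are int8_t, yet the product expression has type int:
// integer promotion converts operands narrower than int before the multiply.
static_assert(std::is_same<decltype(int8_t{1} * int8_t{1}), int>::value,
              "int8_t * int8_t is evaluated as int");

// Hypothetical name. Relies on promotion: safe for 8-bit operands
// because int is guaranteed to have at least 16 bits.
int16_t mul8x8_promoted(int8_t x1, int8_t x2) {
    return x1 * x2;
}

// Hypothetical name. The explicit casts state the intended width
// instead of leaving it to the promotion rule.
int16_t mul8x8_cast(int8_t x1, int8_t x2) {
    return static_cast<int16_t>(x1) * static_cast<int16_t>(x2);
}
```

Both functions should return the same value for every pair of int8_t inputs; the cast version just makes the intent visible to the reader.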

Watching the discussion with interest.

How do you know that the compiler does not generate machine code to do this anyway? Since it has to generate machine code specific to the target device, I'd hope it takes advantage of the hardware capabilities, including an 8x8 hardware multiplier if one is available.

I don't know the answer, but if I were to try to do this I'd just look for the relevant register names and write to them / read from them. Read the data sheet to see what they are called and just include the names in your code as you would any other variable, for example:

Suppose the registers are:

int8_t Multiplicand;
int8_t Multiplier;
int16_t Product;

Then:

Multiplicand = somedata1;
Multiplier = somedata2;
// Maybe wait for the hardware to do its thing
int16_t result = Product;

However, I've not done this; I'm just telling you how I would approach the problem.

I have directly manipulated the registers on a PIC in C many times, however, and the above is based on that experience.

Edit: @PieterP beat me to a better answer.

Thanks for the quick answers! The Compiler Explorer is a FANTASTIC resource and should be recommended to everyone!

"In this case, it's fine, because of a stupid C remnant that says data types smaller than int are converted to int when used in arithmetic."

So this calculation is in fact doing a 16x16 multiply in software?

  int16_t r = x1 * x2;  //-16256

Compiler Explorer says this, which is exactly what I want!

__zero_reg__ = 1
_GLOBAL__sub_I_x1:
        lds r24,x1
        lds r25,x2
        muls r24,r25
        movw r24,r0
        clr __zero_reg__
        sts r+1,r25
        sts r,r24
        ret
r:
        .zero   2
x2:
        .byte   -128
x1:
        .byte   127

Who knows what lies in the heart of gcc?

Well, no. Even though the type is int, the optimizer knows that the value can only be in [-128, 127] because it came out of an int8_t variable, so it can use an 8×8→16 multiplication instead of a 16×16→16 multiplication. The promotion does affect the result type, though, because decltype(x1 * x2) is int.

Close. The compiler is required to produce a result AS IF the 'char' or 'unsigned char' values were promoted to 'int'. Since the compiler knows that an 8x8 multiply will produce the correct 16-bit result, it does that.
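
The claim that an 8x8 signed multiply always fits in 16 bits is easy to check by brute force. A rough sketch (the helper name all_products_fit_in_16_bits is made up for illustration) that compares every int8_t product against a 32-bit reference:

```cpp
#include <cstdint>

// Hypothetical helper: returns true if every int8_t x int8_t product
// survives conversion to int16_t unchanged.
bool all_products_fit_in_16_bits() {
    for (int a = -128; a <= 127; ++a) {
        for (int b = -128; b <= 127; ++b) {
            // Reference result computed in 32 bits.
            const int32_t wide = static_cast<int32_t>(a) * b;
            // The same product formed from int8_t operands, stored in 16 bits.
            const int16_t narrow = static_cast<int16_t>(
                static_cast<int8_t>(a) * static_cast<int8_t>(b));
            if (wide != narrow) return false;
        }
    }
    return true;
}
```

The extremes are (-128) * (-128) = 16384 and 127 * (-128) = -16256, both well inside the int16_t range, which is why the single muls instruction is enough.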

Note: 'unsigned char' is promoted to 'int', not 'unsigned int'. It just isn't sign-extended as part of the promotion.
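
A small sketch of the unsigned case, assuming a platform where int can hold every uint8_t value (true on AVR and on desktop compilers). The function names are made up for illustration. One consequence worth hedging against: with a 16-bit int, as on AVR, 255 * 255 = 65025 does not fit in a signed int, so the last helper converts one operand to unsigned int first:

```cpp
#include <cstdint>
#include <type_traits>

// uint8_t operands are promoted to (signed) int, not unsigned int.
static_assert(std::is_same<decltype(uint8_t{1} * uint8_t{1}), int>::value,
              "uint8_t * uint8_t is evaluated as int");

// Promotion keeps the value: 200 stays 200, it does not become -56.
int promote_u8(uint8_t v) { return v; }

// Because the promoted type is signed, negating a uint8_t gives a negative int.
int negate_u8(uint8_t v) { return -v; }

// Hypothetical helper: converting one operand to unsigned int makes the
// multiply unsigned, so it does not depend on signed overflow behaviour
// where int is only 16 bits wide.
uint16_t mul8x8_unsigned(uint8_t a, uint8_t b) {
    return static_cast<uint16_t>(static_cast<unsigned int>(a) * b);
}
```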

Very helpful answers! Thanks to all of you. Problem solved!
