fastest Q0.7 fixed-point-multiplication (preferably without using assembly)...?

Hi,

I sometimes need fixed-point multiplications like this (the µCs are an ATmega328 and an ATmega32U4):

int8_t a = +127;
int8_t b = -127;
int8_t c = (int8_t)( ( (int16_t)a * b )>>7); // keep bits 14..7 of the 16-bit product

OK, there is some rounding error (I don't care about that) but it is quite fast... But maybe there is a faster way of multiplying two signed bytes with each other and keeping just the high byte of the result?

As far as I can see, both µCs support the FMULS instruction. Does avr-g++ use it for the ATmega328/ATmega32U4 in the above case? (I still have to find out how to activate output of the assembly files with the Arduino IDE... sic)
If so, nice... if not... well, are there any "weird" things (AT&T syntax, anyone?) to watch out for when trying inline assembly for that? Maybe there are even some compiler intrinsics?

best,
L

The compiler doesn't seem to be using FMULS: Compiler Explorer

As far as I can tell, there are no compiler intrinsics for it, but the inline assembly is simple enough:

int8_t muls07_asm(int8_t a, int8_t b) {
  int8_t c;
  asm ("fmuls %1, %2" "\n\t"
       "mov %0, r1" "\n\t"
       "clr __zero_reg__" "\n\t"
       : "=r"(c)
       : "a"(a), "a"(b)
       : "r0", "r1");
  return c;
}

FMULS writes its 16-bit result to r1:r0, so remember to restore __zero_reg__ (r1) when you're done. Both operands of FMULS must be in r16-r23 (“simple upper registers”), hence the "a" constraint for the arguments.
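To check on a PC that taking the high byte of the FMULS result gives the same value as the plain C++ expression from the first post, a host-side model along these lines may help. It is only a sketch: muls07_c, muls07_fmuls_model and check_all_pairs are names made up for illustration, the model only mimics the arithmetic of FMULS (not the carry/zero flags), and it assumes GCC's behaviour for right-shifting negative values and for narrowing to int8_t.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ version from the first post: keeps bits 14..7 of the product.
// Assumes an arithmetic right shift for negative values (as GCC does).
int8_t muls07_c(int8_t a, int8_t b) {
  return (int8_t)(((int16_t)a * b) >> 7);
}

// Hypothetical host-side model of FMULS: signed 8x8 product, shifted left
// by one bit, with the high byte of the 16-bit result used as the answer.
int8_t muls07_fmuls_model(int8_t a, int8_t b) {
  uint16_t product = (uint16_t)((int16_t)a * b); // two's complement bit pattern
  uint16_t shifted = (uint16_t)(product << 1);   // what FMULS leaves in r1:r0
  return (int8_t)(uint8_t)(shifted >> 8);        // r1, the high byte
}

// Compare both versions for all 65536 input pairs.
void check_all_pairs() {
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      assert(muls07_c((int8_t)a, (int8_t)b) ==
             muls07_fmuls_model((int8_t)a, (int8_t)b));
    }
  }
}
```

Both versions pick the same eight bits of the product, so they should agree everywhere, including the overflow case -128 * -128, which wraps around to -128 in both.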

The downside of the inline assembly is that the compiler can't optimize it as heavily. For example, the clr __zero_reg__ seems to be duplicated when the function is called multiple times, even where that's not strictly necessary. The compiler also doesn't evaluate these multiplications at compile time, which it does do for normal C++ multiplications and shifts if the arguments are compile-time constants.
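As a rough illustration of the compile-time point, the plain C++ version can be marked constexpr so that calls with constant arguments are folded by the compiler, which is not possible with an asm statement. This is a sketch, with the name muls07 chosen here for illustration, and it again assumes GCC's arithmetic right shift for negative values.

```cpp
#include <cstdint>

// Plain C++ Q0.7 multiply; constexpr allows evaluation at compile time
// when both arguments are compile-time constants.
constexpr int8_t muls07(int8_t a, int8_t b) {
  return static_cast<int8_t>((static_cast<int16_t>(a) * b) >> 7);
}

// Evaluated entirely by the compiler, no code is generated for these.
static_assert(muls07(64, 64) == 32, "0.5 * 0.5 = 0.25 in Q0.7");
static_assert(muls07(127, -127) == -127, "truncation rounds toward -infinity");
```

With run-time arguments the same function compiles to an ordinary multiply and shift, so one definition covers both cases.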

Pieter
