Another optimization question: can I speed up this 32-bit multiply?

scswift:
Hey guys,

I want to add fine volume control to the WaveHC lib and I want to optimize it as much as possible.

This is what the compiler spat out when I did what I thought would become a few multiply and shift instructions:

// dh, dl -> 12-bit tmp:
uint32_t tmp = (dh << 4) | (dl >> 4);

Those multi-bit shifts are very bad for optimization. The AVR can only shift one bit per instruction, so they get done in a loop, one bit at a time.

scswift:
Right now, I'm converting dh and dl to a 12-bit value, since that is what the DAC can handle; then I multiply that by the volume and divide by 1024.
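For reference, the 12-bit approach just described might look like the sketch below. The function names `pack12` and `scale12` are hypothetical, and the 0..1024 volume range (1024 meaning unity gain) is an assumption inferred from the divide by 1024; the original post does not show this part of the code.

```cpp
#include <assert.h>
#include <stdint.h>

// Combine the two sample bytes into the 12 bits the DAC takes:
// all 8 bits of dh followed by the top 4 bits of dl.
inline uint16_t pack12(uint8_t dh, uint8_t dl) {
  return (uint16_t)((dh << 4) | (dl >> 4));
}

// Scale a 12-bit sample by volume/1024. The 0..1024 volume range
// (1024 = unity gain) is an assumption. 4095 * 1024 does not fit in
// 16 bits, which is why the product is computed in 32 bits.
inline uint16_t scale12(uint16_t sample, uint16_t volume) {
  assert(sample <= 4095 && volume <= 1024);
  return (uint16_t)(((uint32_t)sample * volume) / 1024);
}
```

The 32-bit product is what the thread title refers to; on an 8-bit AVR that multiply and the divide both cost extra instructions.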

It may be better to convert dh and dl to a 16-bit value, which avoids the shifts.

You're doing the right thing, though: look at the disassembly and see what the compiler is up to.
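A minimal sketch of the 16-bit suggestion, under stated assumptions: the names `combine16`, `scale16` and `to_dac12` are hypothetical, and the volume range 0..256 (256 meaning unity gain) is chosen here so that the divide becomes a shift by exactly 8, which compilers usually implement by dropping the low byte rather than shifting bit by bit. The only 4-bit shift left happens once, at the very end.

```cpp
#include <assert.h>
#include <stdint.h>

// Keep the sample as a full 16-bit word: dh is the high byte, dl the low.
// A shift by exactly 8 usually compiles to a byte move, not a shift loop.
inline uint16_t combine16(uint8_t dh, uint8_t dl) {
  return (uint16_t)(((uint16_t)dh << 8) | dl);
}

// Scale by volume/256. The 0..256 range (256 = unity gain) is an
// assumption chosen so the divide is ">> 8", i.e. dropping the low
// byte of the product.
inline uint16_t scale16(uint16_t sample, uint16_t volume) {
  assert(volume <= 256);
  return (uint16_t)(((uint32_t)sample * volume) >> 8);
}

// Only at the very end, cut the result down to the DAC's 12 bits.
inline uint16_t to_dac12(uint16_t sample) {
  return (uint16_t)(sample >> 4);
}
```

At full volume, `to_dac12(combine16(dh, dl))` gives the same value as `(dh << 4) | (dl >> 4)`. The compiler may still emit a general 32-bit multiply for `(uint32_t)sample * volume`, so it is worth checking the disassembly for that line too.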

Last time I optimized something like this, I ended up creating a special union:

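#include <stdint.h>

union ByteInt16 {
  int16_t val;     // 'int' is 16 bits on AVR; int16_t makes the size explicit
  struct {
    // Access to the bytes of 'val' (little-endian, as on AVR: low byte first)
    uint8_t lo, hi;
  } bytes;
  ByteInt16& operator=(int16_t n) { val = n; return *this; }
  operator int16_t() const { return val; }
};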

That way I can address the individual bytes of an integer directly and avoid shifting by 8; the compiler can be really stupid sometimes. If you keep the useful parts of your numbers aligned on 8-bit boundaries, it can make a massive difference to the generated code.
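As an illustration of how such a union might be used, here is a sketch with two hypothetical helpers, `sample_from_bytes` and `high_byte`. It assumes a little-endian target (AVR is), and it relies on reading a union member other than the one last written. ISO C++ leaves that undefined, but GCC, and therefore avr-gcc, documents this kind of union type-punning as supported.

```cpp
#include <stdint.h>

// Same layout as ByteInt16 above; little-endian target assumed, so 'lo'
// overlays the low byte of 'val' and 'hi' the high byte.
union ByteInt16 {
  int16_t val;
  struct {
    uint8_t lo, hi;
  } bytes;
  ByteInt16& operator=(int16_t n) { val = n; return *this; }
  operator int16_t() const { return val; }
};

static_assert(sizeof(ByteInt16) == 2, "ByteInt16 must occupy exactly 2 bytes");

// Hypothetical helper: assemble a 16-bit sample from its two bytes by
// storing each byte in place, instead of computing (dh << 8) | dl.
inline uint16_t sample_from_bytes(uint8_t dh, uint8_t dl) {
  ByteInt16 s;
  s.bytes.hi = dh;
  s.bytes.lo = dl;
  return (uint16_t)s.val;
}

// Hypothetical helper: the high byte of n, i.e. (n >> 8) & 0xFF,
// read directly from memory instead of computed with a shift.
inline uint8_t high_byte(int16_t n) {
  ByteInt16 s;
  s = n;
  return s.bytes.hi;
}
```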

(I also made a similar union for 32-bit values...)
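The 32-bit version isn't shown, so the following is only a guess at what such a union could look like, with a hypothetical use: scaling a 16-bit sample by a 16-bit fixed-point volume, where 0..65535 stands for 0 up to just under unity gain (an assumption). Dividing the 32-bit product by 65536 then means reading its upper 16-bit half instead of shifting by 16. The little-endian layout and the GCC union type-punning caveat from the 16-bit case apply here as well.

```cpp
#include <stdint.h>

// Hypothetical 32-bit counterpart of ByteInt16. Little-endian layout
// assumed: b[0] is the lowest byte and words.lo the lower 16 bits.
union ByteInt32 {
  uint32_t val;
  uint8_t b[4];
  struct {
    uint16_t lo, hi;
  } words;
};

static_assert(sizeof(ByteInt32) == 4, "ByteInt32 must occupy exactly 4 bytes");

// Scale a 16-bit sample by volume/65536 (hypothetical fixed-point gain).
// The divide by 65536 is just the upper half of the product, read
// through the union rather than computed with ">> 16".
inline uint16_t scale_by_high_word(uint16_t sample, uint16_t volume) {
  ByteInt32 p;
  p.val = (uint32_t)sample * volume;
  return p.words.hi;
}
```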