Another optimization question - can I speed up this 32-bit multiply?

fungus:
If the chip has a hardware multiplier which takes two clock cycles, why doesn't the compiler use it to do shift operations instead of generating a loop of single-bit shifts?

I don't know. Possibly we don't have the latest version of the compiler, or possibly that particular optimization was simply left out of its code generator.
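
For what it's worth, here is a rough sketch of the idea being asked about: a shift by n bits is the same as a multiply by 2^n, so a fast multiplier can stand in for the shift loop. This is only an illustration in plain C, assuming an 8-bit by 8-bit multiply that yields a 16-bit product. The names `POW2`, `shl8_via_mul` and `shr8_via_mul` are made up for this example, and whether any given compiler actually turns the multiply into a single hardware instruction is not something I've verified.

```c
#include <stdint.h>

/* Powers of two indexed by shift count 0..7 (hypothetical helper table). */
static const uint8_t POW2[8] = {1, 2, 4, 8, 16, 32, 64, 128};

/* x << n for n in 0..7, done as one multiply.
   The low byte of the 16-bit product is the shifted value. */
uint8_t shl8_via_mul(uint8_t x, uint8_t n)
{
    return (uint8_t)((uint16_t)x * POW2[n & 7]);
}

/* x >> n, done as one multiply for n in 1..7.
   Multiplying by 2^(8-n) moves the wanted bits into the high byte
   of the 16-bit product. n == 0 and n >= 8 are handled separately
   because 2^8 does not fit in the 8-bit table. */
uint8_t shr8_via_mul(uint8_t x, uint8_t n)
{
    if (n == 0) {
        return x;
    }
    if (n >= 8) {
        return 0;
    }
    uint16_t product = (uint16_t)x * POW2[8 - n];
    return (uint8_t)(product >> 8);
}
```

The catch is the table lookup: for a variable shift count it costs something too, so this only pays off if the lookup plus the multiply is cheaper than the loop. For a constant shift count the multiplier is known at compile time and no table is needed.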