#define versus const and other software performance issues

Your method would be faster if you hadn't unnecessarily declared the intermediates n2, n4, n32, n64 and n102 volatile. You could make it slightly faster still with the chain 2, 3, 6, 96, 102.
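Here is a rough sketch of that chain, in case it helps. The function name and the types are my own assumptions, not taken from your code. It builds 102*n with two additions (3n = 2n + n, then 102n = 96n + 6n), whereas summing n64 + n32 + n4 + n2 takes three. The intermediates are plain locals, not volatile, so the compiler is free to keep them in registers.

```c
#include <stdint.h>

/* Hypothetical helper: computes 102 * n via the chain 2, 3, 6, 96, 102.
   Widened to 32 bits so 102 * 65535 cannot overflow. */
static uint32_t times102_chain(uint16_t n)
{
    uint32_t n2   = (uint32_t)n << 1;  /*   2n           */
    uint32_t n3   = n2 + n;            /*   3n = 2n + n  */
    uint32_t n6   = n3 << 1;           /*   6n           */
    uint32_t n96  = n6 << 4;           /*  96n = 6n * 16 */
    uint32_t n102 = n96 + n6;          /* 102n = 96n + 6n */
    return n102;
}
```

Whatever final shift you apply to n102 stays the same as in your version.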

The multiply method is fastest because the ATmega has a 2-cycle 8-bit multiply instruction built in. You really want the shift to be 8 or 16 bits, because then the shifting is effectively free (the compiler just moves the bytes in the long down by 1 or 2 places). Try (n*6554)>>16
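A minimal sketch of that suggestion, assuming n is a 16-bit unsigned value and the goal is an approximate divide by 10 (6554/65536 is just over 0.1). The function name is made up for illustration. The cast matters on the ATmega, where int is only 16 bits, so the product has to be widened to 32 bits before multiplying.

```c
#include <stdint.h>

/* Hypothetical helper: approximates n / 10 as (n * 6554) >> 16.
   The 32-bit cast keeps the product from overflowing a 16-bit int. */
static uint16_t div10_approx(uint16_t n)
{
    uint32_t product = (uint32_t)n * 6554UL;
    /* Shifting right by 16 just takes the upper two bytes of the product. */
    return (uint16_t)(product >> 16);
}
```

Because 6554 is slightly larger than 65536/10, the result can come out one too high for large inputs. By my arithmetic it matches n/10 exactly up to n = 16388 and first overshoots at n = 16389, so check your input range before relying on it.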