Another optimization question - can I speed up this 32-bit multiply?

What the hell!

I'm still getting that stupid error no matter what I do!

I copied example code from here:
http://www.nongnu.org/avr-libc/user-manual/inline_asm.html

asm volatile("mov __tmp_reg__, %A0" "\n\t"
             "mov %A0, %B0"         "\n\t"
             "mov %B0, __tmp_reg__" "\n\t"
             : "=r" (value)
             : "0" (value)
            );
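For comparison, the avr-libc manual describes that snippet as swapping the high and low bytes of a 16-bit value. Here is a rough plain-C sketch of the same operation that needs no inline asm at all. The function name `swap_bytes` is hypothetical, and I haven't checked what code avr-gcc actually generates for it.

```c
#include <stdint.h>

/* Plain C version of the asm above: exchanges the high and low bytes
   of a 16-bit value, e.g. 0x1234 -> 0x3412.
   swap_bytes is a hypothetical name, not an avr-libc function. */
static inline uint16_t swap_bytes(uint16_t value)
{
    /* value is promoted to int, so value << 8 keeps all bits;
       the cast back to uint16_t drops everything above bit 15. */
    return (uint16_t)((value << 8) | (value >> 8));
}
```

If this compiles cleanly where the asm version doesn't, that at least narrows the error down to the inline asm itself.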

And I get THE SAME ERROR!