Another optimization question - can I speed up this 32-bit multiply?

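As far as I can tell, GCC uses r1 as a "zero register", i.e. it expects r1 to always hold 0. I'm not sure why (I guess it's for speed), but that would explain why the code resets it at the end: assuming this is avr-gcc, the hardware multiply leaves its 16-bit result in r1:r0, so r1 has to be cleared again afterwards.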
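For reference, here is a rough C sketch of what a 32-bit multiply breaks down to on an 8-bit target. It assumes an AVR-style part with an 8x8 -> 16-bit hardware multiplier; the name `mul32_bytes` is made up for illustration and is not what the compiler or its runtime library actually calls. Only the partial products that land in the low 32 bits are needed, so there are 10 byte multiplies instead of 16.

```c
#include <stdint.h>

/* Hypothetical helper: 32x32 -> 32-bit multiply (result truncated to 32 bits),
 * built only from 8x8 -> 16-bit partial products. */
uint32_t mul32_bytes(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    for (unsigned i = 0; i < 4; i++) {
        uint8_t ai = (uint8_t)(a >> (8 * i));

        /* Byte pairs with i + j >= 4 only affect bits 32 and up, so skip them. */
        for (unsigned j = 0; i + j < 4; j++) {
            uint8_t bj = (uint8_t)(b >> (8 * j));

            /* One 8x8 multiply; on AVR this would be a single MUL,
             * which writes its 16-bit result to r1:r0. */
            uint16_t partial = (uint16_t)((uint16_t)ai * bj);

            /* Accumulate at the right byte offset; high bits fall off. */
            result += (uint32_t)partial << (8 * (i + j));
        }
    }
    return result;
}
```

In hand-written assembly each of those multiplies overwrites r1, which is why a single clear of r1 at the very end is enough rather than one after every MUL.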