until I considered that a*5 could be wider than 16 bits
Um, yes. That would be part of the language definition. You (normally) multiply 16 bits by 16 bits and get 16 bits, with no error handling on overflow. I can't think of a reason for homebrew ASM to try to do better unless that was specifically part of the goal.
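To make the wrap-around concrete, here's a minimal C sketch (my own example, assuming a 16-bit-int target like the AVR; unsigned types keep the wrap well-defined, since signed overflow is formally undefined behavior):

    #include <stdint.h>

    void demo(void)
    {
        uint16_t a = 50000u;
        uint16_t p = a * 5u;  /* 250000 wraps to 250000 % 65536 == 53392 */
        (void)p;              /* no trap, no error flag visible from C   */
    }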
It looks to me, from the LDI of zero, that the compiler is extending the byte variable to an int.
Also required by the language definition.
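A small sketch of that promotion (names are mine, not from the original code):

    #include <stdint.h>

    uint8_t b = 200;   /* the byte variable */
    int16_t r;

    void demo(void)
    {
        /* b is promoted to int before the multiply, so the compiler
           must supply a zero high byte -- hence the LDI of zero.
           Result: 200 * 5 = 1000, not (200 * 5) % 256 = 232.      */
        r = b * 5;
    }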
I would code x = x * 5 as x = x + (x << 2); if I wanted to optimize
I would say that is as much of a mistake as coding in assembler. Without evidence to the contrary, you should just trust the compiler to do reasonable optimization of a clear piece of source code: "x = x * 5;"
(That's (i+i)+(i+i)+i, which is what gcc generates.)
In this case, since the AVR has neither a 16-bit shift nor a multi-bit shift, the best possible implementation using shifts is going to be the same as the implementation using adds: "i+i" is left shift once; "(i+i)+(i+i)" is left shift twice. In fact, on the AVR, the "LSL Rd" (Logical Shift Left) instruction is just an alias for "ADD Rd, Rd".
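Here's the same sequence rendered in C, just to show the equivalence (hand-rolled for illustration; the compiler does this on its own):

    #include <stdint.h>

    uint16_t times5(uint16_t x)
    {
        uint16_t t = x + x;  /* x << 1: one ADD/ADC pair on the AVR */
        t = t + t;           /* x << 2: a second ADD/ADC pair       */
        return t + x;        /* (x << 2) + x == x * 5, mod 2^16     */
    }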
The nice thing about using a compiler, and the reason that 'large' programs tend to be smaller and faster when written in C rather than ASM, is that this sort of optimization will be applied ALL OVER the program, and not just in the places where you remember to optimize by hand. (Actually, the nice part about using the compiler is that you don't have to figure out how to do an 8x16 multiply given an 8x8 multiply instruction (or nothing) in the first place!)
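For the curious, here's a sketch of the kind of decomposition the compiler (or its runtime support) handles for you: a 16x8 multiply built from 8x8->16 partial products, which is the operation AVRs with a hardware MUL provide. This is illustrative only, not the actual libgcc code:

    #include <stdint.h>

    uint16_t mul16x8(uint16_t a, uint8_t b)
    {
        uint16_t lo = (uint16_t)(uint8_t)a * b;         /* low(a)  * b */
        uint16_t hi = (uint16_t)(uint8_t)(a >> 8) * b;  /* high(a) * b */
        return lo + (uint16_t)(hi << 8);  /* carry past bit 15 is lost */
    }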