The book "Beginners Introduction to the Assembly Language of ATMEL-AVR-Microprocessors" discusses multiplication.
http://www.avr-asm-tutorial.net/
The author gives an example of 16-bit x 8-bit multiplication and says it takes 10 clock cycles. However, that is for two variables, not a variable and a constant, so I suppose 10 cycles isn't too bad. It is still more than the 7 cycles shown above for multiplying by a constant 5.
The code generated by the C compiler looks like more than 10 cycles to me, but it isn't clear exactly what the example on that web page is counting (in other words, is he counting loading and storing all the variables?). In fact, glancing at it, the code he shows seems to me to take more than 10 cycles as well.
FWIW this is what I got from the C compiler:
c = a * b;
d6: 20 91 00 01 lds r18, 0x0100
da: 30 91 01 01 lds r19, 0x0101
de: 80 91 02 01 lds r24, 0x0102
e2: 90 e0 ldi r25, 0x00 ; 0
e4: ac 01 movw r20, r24
e6: 42 9f mul r20, r18
e8: c0 01 movw r24, r0
ea: 43 9f mul r20, r19
ec: 90 0d add r25, r0
ee: 52 9f mul r21, r18
f0: 90 0d add r25, r0
f2: 11 24 eor r1, r1
f4: 90 93 17 01 sts 0x0117, r25
f8: 80 93 16 01 sts 0x0116, r24
LDS, STS and MUL are 2 cycles each. LDI, MOVW, ADD and EOR are 1 cycle each. So I count 22 cycles there, of which 10 are the three loads and two stores. But again, if you let the compiler do it, it may be able to skip some of those loads and stores if it knows it already has the variable in a register.
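For anyone who wants to check that count, here is a rough sketch in plain C (meant to run on a PC, not on the Arduino) that adds up the listing using the per-instruction cycle counts quoted above. The table, the names and the include_memory_access flag are all my own invention for illustration; nothing here comes from the compiler or the book.

```c
#include <stddef.h>
#include <string.h>

/* Cycle cost per mnemonic, as quoted in the post. The touches_memory
   flag is an assumption added here so loads/stores can be excluded. */
struct op_cost {
    const char *mnemonic;
    int cycles;
    int touches_memory;
};

static const struct op_cost COSTS[] = {
    {"lds", 2, 1}, {"sts", 2, 1}, {"mul", 2, 0},
    {"ldi", 1, 0}, {"movw", 1, 0}, {"add", 1, 0}, {"eor", 1, 0},
};

/* The instructions from the compiler listing, in order. */
static const char *const LISTING[] = {
    "lds", "lds", "lds", "ldi", "movw",
    "mul", "movw", "mul", "add", "mul", "add",
    "eor", "sts", "sts",
};

/* Find the cost entry for a mnemonic, or NULL if it is not in the table. */
static const struct op_cost *lookup(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof COSTS / sizeof COSTS[0]; i++) {
        if (strcmp(COSTS[i].mnemonic, mnemonic) == 0)
            return &COSTS[i];
    }
    return NULL;
}

/* Sum the cycles for the listing. Pass 0 to leave out LDS/STS,
   i.e. to count only the arithmetic. Returns -1 on an unknown mnemonic. */
int total_cycles(int include_memory_access)
{
    int total = 0;
    for (size_t i = 0; i < sizeof LISTING / sizeof LISTING[0]; i++) {
        const struct op_cost *cost = lookup(LISTING[i]);
        if (cost == NULL)
            return -1;
        if (cost->touches_memory && !include_memory_access)
            continue;
        total += cost->cycles;
    }
    return total;
}
```

Calling total_cycles(1) counts everything, and total_cycles(0) counts only the register-to-register part, which is probably the fairer number to compare against the book's figure if the book is not counting loads and stores.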
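It looks to me, from the LDI of zero, that the compiler is extending the byte variable to an int, which is probably why the generated code is a bit longer than it needs to be. The third MUL (mul r21, r18) is multiplying by that zero high byte, so it can never contribute anything to the result.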
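To make that concrete, here is a small sketch in plain C (again for a PC, not the Arduino) that mimics what I believe the three MULs are doing, one 8 x 8 partial product at a time. The function name and the variable names are made up for illustration, and the register comments are my reading of the listing rather than anything the compiler documents.

```c
#include <stdint.h>

/* Mimic the compiler's 16 x 16 -> 16 bit multiply after the byte has
   been widened to an int. Only the low 16 bits of the product are kept. */
uint16_t mul16x8_widened(uint16_t a, uint8_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFF);   /* r18 */
    uint8_t a_hi = (uint8_t)(a >> 8);     /* r19 */
    uint8_t b_lo = b;                     /* r20 */
    uint8_t b_hi = 0;                     /* r21, from "ldi r25, 0x00" */

    /* mul r20, r18: the full 16-bit product becomes the starting result */
    uint16_t p0 = (uint16_t)((uint16_t)a_lo * b_lo);

    /* mul r20, r19: only the low byte (r0) is added to the high byte */
    uint8_t p1 = (uint8_t)((uint16_t)b_lo * a_hi);

    /* mul r21, r18: b_hi is zero, so this term is always zero */
    uint8_t p2 = (uint8_t)((uint16_t)b_hi * a_lo);

    /* add r25, r0 (twice): carries out of the high byte are discarded */
    uint8_t hi = (uint8_t)((p0 >> 8) + p1 + p2);

    return (uint16_t)(((uint16_t)hi << 8) | (p0 & 0xFF));
}
```

Since p2 is always zero, a hand-written 16 x 8 routine could drop that MUL and its ADD, saving 3 cycles on the counts given above.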
Test sketch:
volatile int a = 42;
volatile byte b = 16;
volatile int c;
void setup ()
  {
  Serial.begin (115200);
  c = a * b;
  Serial.println (c);
  }
void loop () {}