Guess what? Apparently the ATmega328 used in most Arduinos does have hardware multiplication support! From the feature list:
● Advanced RISC architecture
● 131 powerful instructions – most single clock cycle execution
● 32 × 8 general purpose working registers
● Fully static operation
● Up to 16MIPS throughput at 16MHz
● On-chip 2-cycle multiplier
But as I suspected, a multiply takes 2 cycles, whereas a subtraction takes only one.
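To put that in wall-clock terms, here is a rough sketch in C that converts cycle counts to nanoseconds. It assumes the 16 MHz clock from the feature list above and 8-bit operands; the names `F_CPU_HZ`, `MUL_CYCLES`, `SUB_CYCLES` and `cycles_to_ns` are made up for illustration and are not part of any Arduino or AVR library.

```c
/* Assumed clock: 16 MHz, taken from the quoted feature list. */
#define F_CPU_HZ 16000000UL

/* Cycle counts for 8-bit operands: 2 for the hardware multiply
   (from the feature list), 1 for a subtraction. */
enum { MUL_CYCLES = 2, SUB_CYCLES = 1 };

/* Convert a number of CPU cycles to nanoseconds at a given clock. */
static double cycles_to_ns(unsigned cycles, unsigned long f_cpu_hz)
{
    return (double)cycles * 1e9 / (double)f_cpu_hz;
}
```

Under those assumptions, `cycles_to_ns(MUL_CYCLES, F_CPU_HZ)` comes out to 125 ns and `cycles_to_ns(SUB_CYCLES, F_CPU_HZ)` to 62.5 ns, so the multiply is twice as slow as the subtraction but still very cheap in absolute terms.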