Many instructions, including multiply, execute in a single cycle. Separate data and instruction buses (Harvard architecture) allow data and instruction accesses to be performed simultaneously. Also, up to two instructions can be fetched in one cycle (they share the same memory space). Thanks to the Thumb-2 instruction set, there is no need to switch between 32-bit and 16-bit instructions: both can be used together in one operating state, so there is no state-switching overhead. Together, the savings in execution time and instruction space give the Cortex-M3 processor higher performance efficiency.
Did you ever get your answer? Seems like if different instructions take different numbers of cycles, there should be a chart of instructions vs. cycles needed... I'm new to Arduino, and I have the same question. I want to make short pulses using some Arduino thingy and need to know how fast I can turn a port on and off to do that. Guess it might be quicker to just write the code and measure it. Will let you know when I do the work.
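In the meantime, here is a rough back-of-the-envelope sketch for turning a cycle count into a pulse width. It only does the arithmetic; `cycles_to_ns` is a made-up helper name, and the 84 MHz and 16 MHz clocks are assumptions (the thread doesn't say which board is involved). Real port writes usually cost more cycles than the instruction count alone suggests, so measuring on the actual hardware is still the way to confirm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of any Arduino or ARM library):
// convert a CPU cycle count to nanoseconds for a given core clock,
// using integer math and rounding to the nearest nanosecond.
constexpr std::uint64_t cycles_to_ns(std::uint64_t cycles, std::uint64_t clock_hz) {
    return (cycles * 1000000000ULL + clock_hz / 2) / clock_hz;
}

// Assumed clocks, for illustration only.
constexpr std::uint64_t kClock84MHz = 84000000ULL;
constexpr std::uint64_t kClock16MHz = 16000000ULL;

// Compile-time sanity checks on the arithmetic.
static_assert(cycles_to_ns(84, kClock84MHz) == 1000, "84 cycles at 84 MHz is 1 us");
static_assert(cycles_to_ns(2, kClock16MHz) == 125, "2 cycles at 16 MHz is 125 ns");
```

So if a port toggle really did take one cycle at 84 MHz, the shortest pulse would be about 12 ns; at an assumed 10 cycles per toggle it would be about 119 ns. Plug in whatever cycle count the measurement actually gives.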