However, an ARM uc inserts its own wait states here and there between instructions
Heh. The ARM architecture also permits NOPs to be deleted from the pipeline without actually being executed, so it's also possible that a NOP would take LESS than a single cycle. I don't think any of the Cortex-M processors actually DO that, but ... it is allowed. It is very annoying to try to write cycle-deterministic code on most ARM chips.
On the bright side, there's the SysTick timer, which counts at the CPU frequency. You can probably use it for pretty accurate microsecond-level timing, provided you add a bit of complexity to handle the counter reloading in the middle of your delay, and the reload value is "large" compared to the number of ticks you want to delay.
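Here's a rough sketch of the reload handling, not tested on real hardware. It assumes SysTick is already running as a down-counter clocked at the CPU frequency. The names `systick_elapsed`, `us_to_ticks`, `delay_ticks` and the `read_counter` callback are made up for illustration; on a CMSIS-based Cortex-M build the callback would presumably just return `SysTick->VAL` and `reload` would be `SysTick->LOAD`.

```c
#include <stdint.h>

/* Ticks elapsed between two reads of a down-counter that reloads to
 * `reload` after reaching 0 (so one full period is reload + 1 ticks).
 * Only valid if less than one full period passed between the reads. */
static inline uint32_t systick_elapsed(uint32_t start, uint32_t now,
                                       uint32_t reload)
{
    if (now <= start) {
        return start - now;              /* no reload in between */
    }
    return start + (reload + 1u) - now;  /* counter reloaded once */
}

/* Convert microseconds to timer ticks. Assumes cpu_hz is a whole
 * number of MHz and that the product fits in 32 bits. */
static inline uint32_t us_to_ticks(uint32_t us, uint32_t cpu_hz)
{
    return us * (cpu_hz / 1000000u);
}

/* Hypothetical accessor type: returns the current counter value. */
typedef uint32_t (*read_counter_fn)(void);

/* Busy-wait for `ticks` timer ticks. `ticks` must be well below
 * reload + 1, otherwise the loop can miss its exit window. */
static inline void delay_ticks(read_counter_fn read_counter,
                               uint32_t reload, uint32_t ticks)
{
    uint32_t start = read_counter();
    while (systick_elapsed(start, read_counter(), reload) < ticks) {
        /* spin */
    }
}
```

The loop overhead and any interrupts still add a few cycles of jitter, so this gets you "pretty accurate", not cycle-exact.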