Counting processor cycles for a mathematical computation on ATmega 2560

Hello Everyone!

I have a basic question. I'm aware that millis() and micros() return the elapsed time at a particular instant.
My requirement is this:
I need to run some basic computations, say additions and multiplications on small numbers, and I want to measure how many processor cycles that particular computation uses on the ATmega 2560.

So I think I can call millis() or micros() before the computation and again after it, and take the difference to get the time (in milliseconds or microseconds) spent on that computation.
And I presume the Arduino Mega board runs at 16 MHz? That implies 16,000,000 processor cycles per second, doesn't it?
So can I just compute the number of processor cycles from these two things?

Please let me know. Thanks for your patience!

But each instruction might require a different number of processor cycles, so just counting the number of instructions won't give me the exact number of processor cycles taken, will it?

Thanks. I see your point. I am sure this will work.
But if I have a very large program, getting the processor cycle count from the asm file might be tedious. In that case, will the approach I described in this post work? Or is there an easier way?

You need to bear in mind that the number of processor cycles executed is not the same as the number of instructions executed, unless all the instructions are single cycle.
Bear in mind also that some higher level arithmetic operations implemented as multiple instructions may take different numbers of cycles depending on the operands.
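To make the difference between instruction count and cycle count concrete, here is a minimal sketch. The three-instruction sequence is purely illustrative (it is not real compiler output), and `Instr`, `kSequence` and `totalCycles` are hypothetical names. The cycle costs are the ones listed in the AVR instruction set summary: MUL takes 2 cycles, ADD and ADC take 1 each.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// One executed instruction and its cycle cost.
struct Instr {
    const char* mnemonic;
    uint8_t cycles;
};

// Illustrative sequence only, not real compiler output.
// Cycle costs per the AVR instruction set summary.
constexpr Instr kSequence[] = {
    {"MUL", 2},  // 8-bit multiply: 2 cycles
    {"ADD", 1},  // add: 1 cycle
    {"ADC", 1},  // add with carry: 1 cycle
};

constexpr size_t kInstructionCount = sizeof(kSequence) / sizeof(kSequence[0]);

// Sum the per-instruction cycle costs of a sequence.
uint32_t totalCycles(const Instr* seq, size_t count) {
    assert(seq != nullptr || count == 0);
    uint32_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += seq[i].cycles;
    }
    return total;
}
```

Here `kInstructionCount` is 3 while `totalCycles(kSequence, kInstructionCount)` is 4, so counting instructions alone would undercount.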

You can always get an estimate by timing two loops and subtracting the results.
In pseudocode:

start = micros()
for i = 1 to 1000000
    doBigMath
duration1 = micros() - start     // same clock for start and end

start = micros()
for i = 1 to 1000000
    doBigMath
    doBigMath
duration2 = micros() - start

// extra microseconds for one doBigMath per pass, times 16 clock cycles per microsecond at 16 MHz
print (duration2 - duration1) * 16 / 1000000

Doing the loop twice and subtracting the two values eliminates the loop overhead.
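The final conversion from the two loop timings to cycles can be sketched as a small helper. This is only an illustration under stated assumptions: `cyclesPerOperation` is a hypothetical name, both durations are assumed to be in microseconds (for example two differences of micros() readings), and the 16 MHz default matches the clock speed discussed in this thread.

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not part of the Arduino core.
// durationOneOpUs:  time for the loop that runs the operation once per pass
// durationTwoOpsUs: time for the loop that runs it twice per pass
// The difference is the time for `iterations` extra operations,
// with the loop overhead cancelled out.
double cyclesPerOperation(uint32_t durationOneOpUs,
                          uint32_t durationTwoOpsUs,
                          uint32_t iterations,
                          double cpuMHz = 16.0) {  // assumed clock: 16 MHz
    assert(iterations > 0);
    assert(durationTwoOpsUs >= durationOneOpUs);
    const uint32_t extraUs = durationTwoOpsUs - durationOneOpUs;
    // microseconds * (cycles per microsecond) / operations
    return static_cast<double>(extraUs) * cpuMHz / static_cast<double>(iterations);
}
```

For example, if the first loop took 1,000,000 µs and the second 1,500,000 µs over 1,000,000 passes, the helper returns 8.0 cycles per operation. Multiplying before dividing avoids losing the fractional part.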

Be aware that the compiler can optimize away loops and expressions whose result is not used.
Assigning the result to a variable of type volatile int or volatile float suppresses this optimization.
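A minimal sketch of the volatile trick, assuming a 16-bit multiplication as a stand-in for the computation under test. `repeatMultiply` is a hypothetical name. Marking the operands volatile as well stops the compiler from working out the product at compile time.

```cpp
#include <stdint.h>

// Hypothetical stand-in for the computation being measured.
uint32_t repeatMultiply(uint32_t iterations) {
    volatile uint16_t a = 123;     // volatile: re-read on every pass
    volatile uint16_t b = 45;      // volatile: value is opaque to the optimizer
    volatile uint32_t result = 0;  // volatile: every store must really happen
    for (uint32_t i = 0; i < iterations; ++i) {
        result = static_cast<uint32_t>(a) * b;
    }
    return result;
}
```

Note that volatile forces real loads and stores on every pass, so the measured time includes those memory accesses as well as the multiplication itself.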

Thank you so much Sir. I understand now.

One thing (sorry, one of the things) you need to be careful of when trying to benchmark in this way is that the compiler is very, very clever. If a result is calculable at compile time it may be precomputed, so your runtime results with real variables may be very, very different.

If your code measures the number of microseconds between two points, then assuming 16 clock cycles per microsecond (16 MHz) you will have a reasonably good indication of the number of cycles.

If you repeat the calculation 1000 or 10000 times during the time interval you will eliminate almost all of the uncertainty.
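To put a rough number on that uncertainty: the Arduino reference gives micros() a resolution of 4 microseconds on 16 MHz boards, which is 64 clock cycles for a single measurement. The sketch below shows how repetition spreads that over many operations. `cycleUncertaintyPerOp` is a hypothetical helper, and it only models timer resolution, not interrupts or other sources of jitter.

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: worst-case timer-resolution error, in clock cycles,
// attributed to each operation when `repetitions` operations are timed together.
double cycleUncertaintyPerOp(uint32_t repetitions,
                             double timerResolutionUs = 4.0,  // micros() step at 16 MHz
                             double cpuMHz = 16.0) {          // assumed clock: 16 MHz
    assert(repetitions > 0);
    return timerResolutionUs * cpuMHz / static_cast<double>(repetitions);
}
```

With one repetition the result is 64 cycles, which swamps a single addition. With 1000 repetitions it drops to well under a tenth of a cycle per operation.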

Having said that I can't see how the number of cycles can matter - and maybe if you explained that we could give more useful advice.

...R