Reading timer to determine clock cycles of a function

Hello,
This should be simple I dont know why its not working!

void setup() {
  Serial.begin(9600);

  cli();

  TCCR1A = 0;
  TCCR1B = 0;
  TCNT1  = 0;//Initialize counter value to 0
  TCCR1B |= (1 << CS10);//Set CS10 bit for no prescaler
  
  sei();
}

void loop() {
  TCNT1 = 0; //Reset
  ikOutput = DIK(ikInput);
  timer1Count = TCNT1;
  Serial.println(timer1Count);
}

But the output is always “1”. It’s a hefty function, and I want to know how many clock cycles it takes to calculate. Any help on what I’m doing wrong?

That code won't even compile. Try again.

function execution time is often so small that measuring one execution is difficult. A better approach it to measure the execution time of 1 million executions of the function.

@jremington
Yes, I cut out a lot as I thought it was clear on what I want to do. I initialize Serial so I can print to the screen. I setup Timer1 with no prescaler. I wanted to clear Timer1, run a function, then read Timer1. That would give me the number of clock cycles it took to execute the function (assuming no overflow). Then print that number to the screen. Whatever that number is, divided by 16MHz would be execution time.

@gcjr
My function DIK(ikInput) uses a lot of math. Square roots and trig. There's no way it could execute in one clock cycle. I'd expect to see some large number!

Does that make sense?

Yes, I cut out a lot as I thought it was clear on what I want to do.

Of course it is clear what you want to do, but you are doing something wrong.

If you want help debugging, post the minimum code required to reproduce the problem. That is a basic rule in all code forums.

@jreminton
Ok, because my functions are large I have simplified. This compiles but I get the same result (prints “1”) when the sin function obviously takes more than 1 clock cycle.

#include <math.h>

class JJJ {
  public:
    float j1;
};

class XXX {
  public:
    float x1;
};

XXX input;
JJJ output;
unsigned int timer1Count;

void setup() {
  Serial.begin(9600);

  cli();

  TCCR1A = 0;
  TCCR1B = 0;
  TCNT1  = 0;//Initialize counter value to 0
  TCCR1B |= (1 << CS10);//Set CS10 bit for no prescaler

  input.x1 = 1.823;
  
  sei();
}

void loop() {
  TCNT1 = 0;
  output = function(input);
  timer1Count = TCNT1;
  Serial.println(timer1Count);
}

JJJ function (XXX num) {
  JJJ result;
  float temp;

  temp = sqrt(num.x1);
  result.j1 = sin(temp);
  
  return result;
}

Which Arduino?

Why not use the built in micros() function for timing?

That way you don't have to worry about protecting 16 bit counter accesses from corruption during interrupts -- which you forgot to do in this example.

i made the following measurements using the code below

the last results, “4 195972 195968”, used micros() and i feel are unreliable. 4 usec at 16MHZ suggests only 64 cycles to do a sqrt() and sin() computation. And from measurements, i think micros is only 16 bits (???)

so i tried measuring the time of execution over a 1000 iterations using millis(). but part of the problem with repeated calculations is compiler optimization which would result in the computation for the same or an unused value simply being ignored. So the value being operated on, input.x1 = 1.823 needed to be modified (*1.01) each iteration and the result used. I summed the result and printed it

because of the above, the time to do the above needed to be measured and subtracted from the results that actually did the desired computation using “function()”.

the “above” code took 17 msec (1000 iterations).
the 3858846.25 is the printed sum

the full computation took 166 msec (1000 iterations).
time to execute “function()” is 149 msec for 1000 iterations, or 149 usec which doesn’t jive with the measurement of 4 usec using micros().

  17       17        0
3858846.25
     166      184       18
3858915.75
      149 x1000      166       17
       4   195972   195968
3858916.75
#include <math.h>

class JJJ {
  public:
    float j1;
};

class XXX {
  public:
    float x1;
};

XXX input;
JJJ output;
unsigned int timer1Count;

void setup() {
  Serial.begin(9600);

#if 0
  cli();

  TCCR1A = 0;
  TCCR1B = 0;
  TCNT1  = 0;//Initialize counter value to 0
  TCCR1B |= (1 << CS10);//Set CS10 bit for no prescaler
#endif

#define N   1000

  unsigned long msec0;
  unsigned long msec1;
  unsigned long dt0;
  unsigned long dt1;
  float    sum = 0;

  // --------------------
  input.x1 = 1.823;

  msec0 = millis();
  for (int i = N; i > 0; i--)  {
    input.x1 *= 1.01;
    sum      += input.x1;
  }
  msec1 = millis();
  dt0    = msec1 - msec0;

  char s [40];
  sprintf (s, "%8lu %8lu %8lu", dt0, msec1, msec0);
  Serial.println(s);
  Serial.println(sum);

  // --------------------
  input.x1 = 1.823;

  msec0 = millis();
  for (int n = 0; n < N; n++)  {
    output    = function(input);
    sum      += output.j1;
    input.x1 *= 1.01;
  }
  msec1 = millis();
  dt1    = msec1 - msec0;

  sprintf (s, "%8lu %8lu %8lu", dt1, msec1, msec0);
  Serial.println(s);
  Serial.println(sum);

  sprintf (s, " %8lu x%d %8lu %8lu", dt1-dt0, N, dt1, dt0);
  Serial.println(s);

  // --------------------
  input.x1 = 1.823;

  unsigned long usec0 = micros();
  output    = function(input);
  sum      += output.j1;
  unsigned long usec1 = micros();
  dt1    = usec1 - usec0;

  sprintf (s, "%8lu %8lu %8lu", dt1, usec1, usec0);
  Serial.println(s);
  Serial.println(sum);

  sei();
}

void loop() {
#if 0
  TCNT1 = 0;
  output = function(input);
  timer1Count = TCNT1;
  Serial.println(timer1Count);
#endif
}

JJJ function (XXX num) {
  JJJ result;
  float temp;

  temp = sqrt(num.x1);
  result.j1 = sin(temp);

  return result;
}

u

I was able to reproduce the “TCNT1 == 1” result with a standard Arduino Pro Mini and a made up function.

I can’t explain exactly why “1” is produced, but it is clearly an artifact of compiler optimization. The compiler may be removing the call to the function, since the result is not used.

Try the following (which incidentally shows how to selectively change the optimization level for particular parts of the program). A few trig function calls and a Serial.print consume > 10,000 clock cycles.

unsigned int count;
float x = 0;

#pragma GCC optimize ("-O0")
#pragma GCC push_options

void setup() {
  Serial.begin(9600);

  cli();

  TCCR1A = 0;
  TCCR1B = 0;
  TIMSK1 = 0;
  TCNT1  = 0;//Initialize counter value to 0
  TCCR1B = 1;//Set CS2:0 for no prescaler
 
  count = TCNT1;
 
  sei();
  Serial.print('<'); Serial.println(count);
  
  x = test(1);
  cli();
  count = TCNT1;
  sei();
  Serial.print('>');
  Serial.println(count);

}

void loop() {
}

float test(int x) {
  float y, z = 0;
  for (int i = 1; i < 4; i++) {
    y = float(x) / i;
    z += sin(y);
  }
  return cos(z);
}

#pragma GCC pop_options

As suggested, it seems like a strange problem with the compiler. While a 0 level optimization might get a number, I do really want to figure out what the calculation time is when on default optimization.

I tried executing the function in an interrupt routine which gave me a result, but it seems all the different ways of doing this give different numbers. What do you guys think would be a good next step to determine if the result is actually correct?

#include <math.h>

volatile float x;
volatile float y;
volatile unsigned int timer1Count;

void setup() {
  Serial.begin(9600);

  cli();

  TCCR1A = 0;
  TCCR1B = 0;
  TCNT1  = 0;//Initialize counter value to 0
  OCR1A = 65535;
  // turn on CTC mode
  TCCR1B |= (1 << WGM12);
  TCCR1B |= (1 << CS10);//Set CS10 bit for no prescaler
  TIMSK1 |= (1 << OCIE1A);
  
  sei();
}

void loop() {
  Serial.print(timer1Count);
  Serial.print(" ");
  
  Serial.println();
  delay(1000);
}
  
ISR(TIMER1_COMPA_vect){  //change the 0 to 1 for timer1 and 2 for timer2
  y = sin(x);
  x = y;
  timer1Count = TCNT1;
}

i thought i showed you above

It is not a "strange problem", the optimization is working exactly as designed.

I showed how to defeat or change the optimization for specific sections of the code (there are dozens of options) so study up and use that capability wisely.

Optimization is unlikely to do much with complex math operations, and please drop the idea that there is a single correct answer to your question.

@gcjr
I will try to implement what you put in and see if it compares to what I got using the interrupt counting version I posted above. At least I can try getting the same number two different ways.

@jremington
I would consider an "artifact of compiler optimization" a strange problem, you might use different verbiage. To prove your theory, could you not add something that uses the result to force the compiler to not ignore it?
I'm a bit surprised you think that I believe there is a single correct answer as I feel we have discussed more than one option at this point.

@gcjr
Doing it your way (millis() function 1000 times) and my way (reading TCNT1 within the interrupt routine) end up with the same result! I'm glad there is at least a good cross check. I'm still curious why the compiler would ignore TCNT1 in the original snippet considering if you do simply this in your main loop

void loop() {
  timer1Count = TCNT1;
  Serial.println(timer1Count);
}

Then it prints timer1Count correctly.

I would consider an "artifact of compiler optimization" a strange problem

You are certainly welcome to that opinion.

I was able to reproduce the "TCNT1 == 1" result with a standard Arduino Pro Mini and a made up function.

I can't explain exactly why "1" is produced, but it is clearly an artifact of compiler optimization. The compiler may be removing the call to the function, since the result is not used.

Using the example from #5, change the definition of JJJ to:

volatile JJJ output;

And you should get reasonable results.