Math Error

I'm working on a project and I'm trying to get some timing benchmarks on math operations on the Arduino. We're looking at doing some numerical methods and have to make sure we have enough time to do them. However I ran into this very strange problem, when the increment operator on num inside the loop is commented out, the time is around 40micro seconds. This clearly isn't possible. This would mean doing 1100 math ops in less than 400 clock cycles. However with the increment operator in the for loop takes around 3.2 mS. Can anyone explain to me what exactly is going on here? Does it just remember its previous answer and output it again?

Idk if it makes a difference but its running on Duemilanove.

//This program tests the time delay of various mathematical functions

int sum, diff, quot, prod, sqar, root, num, a, b, c, d, e;
unsigned long time, end_time;
void setup()
{
Serial.begin(9600);
}

void loop()
{
//Serial.print("Elapsed Time: ");
randomSeed(analogRead(A0)-millis());
num = random(1000,20000);

time = micros();
//performs 100 loops of 11 mathematical operations
for (int i=0; i<100; i++)
{
num++; //Increment required to prevent compiler from cheating.
sum = num+125;
diff = sum-130;
quot = diff/20;
prod = quot6;
sqar = prod
prod;
a = num+19;
b = num - 27;
c = b + a;
d = c/num;
e = d*c;
}
end_time = micros();

Serial.print(end_time-time);
Serial.println(" micro seconds");
delay(5000);
}

40 micro seconds
44 micro seconds
40 micro seconds

3244 micro seconds
3244 micro seconds
3248 micro seconds

The compiler can see that the expressions are constants, so doesn't need to keep calculating the same result.

A few "volatile"s should give more representative results.

(40microseconds would be closer to 640 cycles than to 400)

The randomSeed function should be called once, in setup, not every time through loop.

The problem with trying to obtain timing by measuring operations in a loop is that there is always some overhead that has nothing to do with the operation you are trying to measure.

One way to eliminate loop overhead is to do two loops: One loop performs the operation once each pass through the loop and the other performs the operation twice each pass. Subtract one time from the other and you have an idea of the time for the operation. Note that there is some extra time that may vary from loop to loop due to the fact that Timer 0 interrupts have to be serviced. As long as you know about it, it's nothing that should make you fraught.

Note also, that you don't actually have to measure the times, you can look at disassembled code and count cycles from that.

However...

It is sometimes "nice" (and educational) to approach the problem from two points of view. It's always "nice" when measured and calculated results agree, right?

I'll show what I mean it for addition and multiplication. I measure the time it takes to fetch two int operands, do the operation, and store the result. Using volatile variables tells the compiler not to optimize away any instructions in the sequence.

// Calculate time and number of machine cycles for addition and multiplication.
//
//
// With ATmega CPUs, times for arithmetic and logic operations doe not depend
// on the values of the operands.
// That is, an 8-bit addition always takes one machine cycle and an 8-bit
// multiplication always takes two machine cycles.
//
// This little example measures the time to perform an operation on
// two variables and store the result.
//
// For avr-gcc, you can look at the code (using avr-objdump -d) to see
// the following for the addition loops
// 
// A 16-bit operation requires the following number of cycles:
// Two load instructions (indirect mode) for each variable    8 cycles
// One 8-bit addition                                         1 cycle
// One 8-bit add-with-carry                                   1 cycle
// Two store instructions (indirect mode)                     4 cycles
//                                                           ---------
// Total                                                     14 cycles
//
//  (At least that is what I see from avr-gcc version 4.3.4 on my Linux system.)
//
// For additional enlightenment, look at the disassembled code for the
// multiplication loops.  (It turns out to be 23 cycles.)
//
// Anyhow, for people who aren't interested in manually counting cycles
// here's one way to let the machine do the work.
//
// Measurement is performed creating a loop that does one addition each
// time and another loop that does two additions each time.  By
// subtracting the first time from the second, the result is an indication
// of the time for the operation, since it eliminates the loop overhead.
//
// Multiplying the time for each instruction (in microcseconds) by
// the CPU speed in MHz gives the number of machine cycles required
// for the instruction.
//
// Note that the measured time in microseconds is actually a little bigger
// than nominal since the time to service Timer 0 interrupts adds
// a little to the execution time.  The extra is easily truncated to give
// an accurate count of the machine cycles
//
// davekw7x

// Note that if you want to use random values for operands instead
// of fixed values uncomment the following line.
//#define RANDOMIZE_THE_OPERANDS

void setup()
{

    Serial.begin(9600);
    delay(1000);
#ifdef RANDOMIZE_THE_OPERANDS
    Serial.print("Enter a character to start...");
    while (Serial.available() < 1)
        ;
    Serial.println();
    randomSeed(micros());
    Serial.println("Here we go!");
#endif
}

const float fcpuMHz = F_CPU / 1000000.0;

void loop()
{
    const int num = 10000;
    unsigned long t1, t2, tdiff1, tdiff2;
    float measured;
    int nCycles;

    volatile int localSum1;
    volatile int localSum2;
    volatile int localProd1;
    volatile int localProd2;  
    volatile int localX1, localX2;
    
#ifdef RANDOMIZE_THE_OPERANDS
    Serial.println("Variables are being randomized");
    localX1 = rand();
    localX2 = rand();
#else
    localX1 = 1234;
    localX2 = 5678;
#endif

    
    Serial.print("CPU Frequency = ");
    Serial.print(F_CPU);
    Serial.println(" Hz.");
    
    Serial.print("Number of times each instruction is executed = ");
    Serial.println(num);
    Serial.println();
    
    t1 = micros();
    for (int i = 0; i < num; i++) {
        localSum1 = localX1 + localX2;
    }
    t2 = micros();
    tdiff1 = t2 - t1;
    Serial.print("    For loop with one addition:  Elapsed time = ");
    Serial.print(tdiff1);
    Serial.println(" microseconds.");
    
   
    t1 = micros();
    for (int i = 0; i < num; i++) {
        localSum2 = localX1 + localX2;
        localSum2 = localX1 + localX2;
    }
    t2 = micros();
    tdiff2 = t2 - t1;
    Serial.print("    For loop with two additions: Elapsed time = ");
    Serial.print(tdiff2);
    Serial.println(" microseconds.");
    
    Serial.print("    Difference = ");
    Serial.print(tdiff2 - tdiff1);Serial.println(" microseconds.");
    
    // Note that measured value is a little more than the actual
    // value because servicing Timer 0 interrupts takes a small
    // amount of time.
    measured = (1.0 * (tdiff2 - tdiff1)) / num;
    Serial.print("    Time for int addition is something like ");
    Serial.print(measured, 4);
    Serial.println(" microseconds.");
    
    // Convert to an integer.  Note that it truncates the extra
    // fraction that was due to Timer 0 interrupts.
    nCycles = measured * fcpuMHz; // Will truncate to integer value
    Serial.print("    Number of machine cycles for an int addition is ");
    Serial.println(nCycles);

    Serial.println();
    
    t1 = micros();
    for (int i = 0; i < num; i++) {
        localProd1 = localX1 * localX2;
    }
    t2 = micros();
    
    tdiff1 = t2 - t1;
    Serial.print("    For loop with one multiplication:  Elapsed time = ");
    Serial.print(tdiff1);
    Serial.println(" microseconds.");
    
   
    t1 = micros();
    for (int i = 0; i < num; i++) {
        localProd2 = localX1 * localX2;
        localProd2 = localX1 * localX2;
    }
    t2 = micros();
    tdiff2 = t2 - t1;
    
    Serial.print("    For loop with two multiplications: Elapsed time = ");
    Serial.print(tdiff2);
    Serial.println(" microseconds");
    
    Serial.print("    Difference = ");
    Serial.print(tdiff2 - tdiff1);
    Serial.println(" microseconds");
    
    measured = (1.0 * (tdiff2 - tdiff1))/num;
    Serial.print("    Time for int multiplication is something like ");
    Serial.print(measured, 4);
    Serial.println(" microseconds.");
    
    nCycles = measured * fcpuMHz;
    Serial.print("    Number of machine cycles for an int multiplication is ");
    Serial.println(nCycles);
    Serial.println();

    delay(5000);
}

Output


CPU Frequency = 16000000 Hz.
Number of times each instruction is executed = 10000

    For loop with one addition:  Elapsed time = 13208 microseconds.
    For loop with two additions: Elapsed time = 22016 microseconds.
    Difference = 8808 microseconds.
    Time for int addition is something like 0.8808 microseconds.
    Number of machine cycles for an int addition is 14

    For loop with one multiplication:  Elapsed time = 18864 microseconds.
    For loop with two multiplications: Elapsed time = 33332 microseconds
    Difference = 14468 microseconds
    Time for int multiplication is something like 1.4468 microseconds.
    Number of machine cycles for an int multiplication is 23

CPU Frequency = 16000000 Hz.
Number of times each instruction is executed = 10000

    For loop with one addition:  Elapsed time = 13208 microseconds.
    For loop with two additions: Elapsed time = 22008 microseconds.
    Difference = 8800 microseconds.
    Time for int addition is something like 0.8800 microseconds.
    Number of machine cycles for an int addition is 14

    For loop with one multiplication:  Elapsed time = 18864 microseconds.
    For loop with two multiplications: Elapsed time = 33328 microseconds
    Difference = 14464 microseconds
    Time for int multiplication is something like 1.4464 microseconds.
    Number of machine cycles for an int multiplication is 23
.
.
.

Final note: The micros() function has a granularity of 4 That is, the values that it returns are always multiples of 4, so you can never get times to the nearest microsecond. Also, as I mentioned, minor variations in loop times (due to Timer 0 interrupt servicing) can show variations of a a count or two (4-8 microseconds) in the measured times for different runs through the loops.

Regards,

Dave

Footnote:
Being aware of times for operations is a good starting point for any project. Sometimes it's important to have a "best case" condition before you start the actual code so that you can get some idea of whether it is impossible to meet timing requirements. Of course many other things in your style of coding affect performance, but at least you know where to start.

Dave, awesome post!