How can I test read speed?

I want to test the digital read speed of an Arduino in two scenarios:

  1. using the built-in function, like x = digitalRead(pin)
  2. using direct port manipulation, like x = PINC & (1 << PINC0)

I can obviously test output very easily: have it toggle a pin, then put a scope on it and measure the cycle period.

Does anyone know a logical way I can test input speed? I would like my measurement accuracy to be down to the microsecond if possible.

You can do it if you already know how fast you can digitalWrite()...

void loop()
{
  digitalWrite(pin, LOW);
  (void)digitalRead(readpin);
  digitalWrite(pin, HIGH);
}

The difference between the time that the output pin stays high vs low should be very close to the time it takes for digitalRead() to happen.

Let us know your results!

Bravo! Makes perfect sense. Why didn't I think of that? :)

I'll let you know.

How about using an interrupt to increment a counter instead of polling the pin via digitalRead()?

The problem is that you're not taking the digitalWrite delay into account there (the low part), so the result isn't just the digitalRead time: it also includes the digitalWrite latency, which on Arduino is unfortunately quite high, and you don't know at which point inside the write call the output actually changes. Here's a good article about this: To use or not use digitalWrite | Bill Grundmann's Blog

westfw:
You can do it if you already know how fast you can digitalWrite()...

void loop()

{
  digitalWrite(pin, LOW);
  (void)digitalRead(readpin);
  digitalWrite(pin, HIGH);
}



The difference between the time that the output pin stays high vs low should be very close to the time it takes for digitalRead() to happen.

Let us know your results!

Eliminateur:
The problem is that you're not taking the digitalWrite delay into account there

If you have a 'scope, you can try a sketch something like this:

const int pin = 13;

void setup()
{
    pinMode(pin, OUTPUT);
}

void loop()
{
    digitalWrite(pin, HIGH);
    digitalWrite(pin, LOW);

    digitalWrite(pin, HIGH);
    digitalRead(pin);
    digitalWrite(pin, LOW);
    delay(1000);
}

You get a short pulse and a long pulse, repeated every second.

With the nominal 16 MHz clock on my board, I get about 3.8 microseconds for the short pulse width, indicating that the time from one digitalWrite to another is approximately 3.8 microseconds. That's the time to execute the digitalWrite function.

The long pulse width is about 7.6 microseconds, indicating that a call to digitalRead adds about 3.8 microseconds delay. So the digitalRead function also appears to take about 3.8 microseconds.

Don't have a 'scope? Well, how about using the Arduino micros() function to try to measure the time of a digitalRead?

Now, the time resolution of the micros() function is four microseconds (it can't report elapsed times of less than four microseconds), so we make loops that repeat the writes (and reads) some number of times and measure the elapsed times through the loops. Again, subtract the time for a loop that does two writes from the time for a loop that does two writes and a read. Divide by the number of times through the loops to get the number of microseconds:

For example

const int pin = 13;   // same output pin as in the 'scope sketch above

void setup()
{
    Serial.begin(9600);
    pinMode(pin, OUTPUT);
}

void loop()
{
    int num = 10000;

    unsigned long t1, t2, t3, t4;
    t1 = micros();
    for (int i = 0; i < num; i++) {
        digitalWrite(pin, HIGH);
        digitalWrite(pin, LOW);
    }
    t2 = micros();
    
    // Put print stuff here if you want to see actual times
    
    t3 = micros();
    for (int i = 0; i < num; i++) {
        digitalWrite(pin, HIGH);
        digitalRead(pin);
        digitalWrite(pin, LOW);
    }
    t4 = micros();
 
    // Put print stuff here if you want to see actual times
    
    unsigned long timediff = t4 - t3 - (t2 - t1);
    Serial.print("digitalRead time is approximately ");
    Serial.print(timediff / (float)num, 2);
    Serial.println(" microseconds");
    delay(1000);
}

Output:


digitalRead time is approximately 3.71 microseconds
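
As a rough check on how much the four-microsecond granularity of micros() matters in a measurement like this, here is a minimal sketch that bounds the per-call error. It assumes each of the two measured intervals can be off by at most one timer tick; the function name is made up for illustration and is not part of the Arduino core.

```cpp
// Worst-case per-call error caused by timer granularity, assuming each of
// the two measured intervals (t2 - t1 and t4 - t3) is off by at most one tick.
// worstCaseErrorMicros is a hypothetical helper, not an Arduino function.
float worstCaseErrorMicros(unsigned long tickMicros, unsigned long iterations)
{
    const unsigned long intervals = 2;   // baseline loop + loop with the read
    return (float)(intervals * tickMicros) / (float)iterations;
}
```

With a 4-microsecond tick and 10000 passes, the bound is 8 / 10000 microseconds per call, which is tiny compared with the 3.71 microseconds reported above.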

Regards,

Dave

Footnote:
I just 'eyeballed' the pulse widths with a 'scope to the nearest tenth of a microsecond rather than trying to use a timer/counter instrument for more precise measurement. Previous measurements with this particular board (an Arduino Duemilanove) indicated that the nominal 16 MHz clock was about 100 ppm high. So results should be "pretty close" to nominal. The bottom line is my free and frequent use of the word "approximate."
Maybe someone (someone else, that is) can go into code for the various functions and actually count CPU cycles of the generated machine code...

I use this macro for exactly this type of thing.

#define PULSE	asm("sbi 5,0\nnop\nnop\nnop\nnop\ncbi 5,0");

There's almost no latency to worry about here.

Obviously you have to modify the 5,0 part to match the port and pin you want to use.

The nops are just to stretch the pulse a little so it's easier to see.


Rob

void loop()
{
  digitalWrite(pin, LOW);
  (void)digitalRead(readpin);
  digitalWrite(pin, HIGH);
}

you're not taking the digitalWrite delay into account

Sure I am. The code unrolls to

 digitalWrite(pin, HIGH);  // from end of loop.
 digitalWrite(pin, LOW);
 digitalRead(readpin);
 digitalWrite(pin, HIGH);
   :

So pin stays HIGH for "digitalWrite overhead time" and goes low for "digitalWrite overhead time + digitalRead overhead time." (perhaps there's a factor of two on part of that.) All the info you need should be in the waveform displayed on a scope. (I originally thought of the two separate pulse idea, but decided it wasn't needed.)

digitalWrite has been extensively discussed (http://arduino.cc/forum/index.php/topic,4324.0.html ), though I don't think we ever went through an analysis of the machine code. (4.3 microseconds sounds about right, BTW.) The "problems" that make digitalRead/digitalWrite so much slower than direct port reads don't require machine code analysis to explain; they're all pretty obvious at the source code level. Abstraction of "pin"; abstraction of "value"; both are variables rather than constants. It makes sense that read is about the same speed as write; almost all of the time is spent doing things that they have in common.
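
To make the "abstraction of pin" point concrete, here is a simplified model of the extra work a table-driven read has to do compared with a read where the port and bit are fixed at compile time. This is only a sketch: the tables, the fake register array, and the function names are all illustrative assumptions, not the actual Arduino core source.

```cpp
#include <stdint.h>

// Stand-ins for three input registers (think PINB, PINC, PIND).
// On real hardware these would be I/O registers, not an array.
static uint8_t fakeInputRegs[3] = {0, 0, 0};

// Hypothetical lookup tables: Arduino-style pin number -> port index and bit mask.
static const uint8_t pinToPort[4] = {2, 2, 0, 1};
static const uint8_t pinToMask[4] = {0x01, 0x02, 0x01, 0x01};

// "pin" is a run-time variable, so the port and mask are looked up on every call.
int modelDigitalRead(uint8_t pin)
{
    uint8_t port = pinToPort[pin];   // table lookup 1
    uint8_t mask = pinToMask[pin];   // table lookup 2
    return (fakeInputRegs[port] & mask) ? 1 : 0;
}

// With port and bit known at compile time, the same read is a single mask test.
int modelDirectRead(void)
{
    return (fakeInputRegs[2] & 0x01) ? 1 : 0;
}
```

Both functions return the same value for pin 0 in this model; the difference is only in how much work happens before the register is actually read.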

Note that the development team has been seriously looking at implementing some of the suggestions for speeding up these functions in the cases where the arguments are constant. I'm not sure what will actually happen; for each opinion that the speedup is important, there is a conflicting opinion that predictable behavior is more important.

Back to the original question of how to test the speed of a digitalRead function and compare it with something faster like direct manipulation of port bits.

Looking at things on a 'scope is "easy" according to the Original Poster (although there might be some question about whether the observed pulse width includes loop overhead and so forth), but, in my opinion, the way to compare two operations is to use two timed loops and let the program tell you.

Now, as for a digital read statement, we don't really want to know just the execution time of the function, but the execution time for getting the value and storing it, right? And, in particular, we want to compare the speed of the assignment using some kind of "fast" method with one that uses the Arduino user-friendly-but-less-than-fast digitalRead function.

//
// A program to compare execution time for a "faster" pin
// reading function with the more convenient Arduino 
// digitalRead function.
//
// davekw7x
//

//
// If you are really interested in a "faster" way,
// don't shift, but do an "and" operation in-place.  The
// return value will be zero or non-zero, depending
// on the state.  In many cases, this is all you need
// to be able to use it in your program.
//
// Anyhow, for now, I'll just compare execution time
// for the exact replacement for digitalRead
//

//
// I make a macro here just so it won't clutter
// up the code in the loop.  If you want
// to use some other pin, then you have to look
// up the port number and pin number to see what
// to define here.
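// The macro below reads pin 6 on Port B. (Check your board's pin map
// before relying on it: on ATmega168/328 boards, Arduino pin 14 is A0,
// which is bit 0 of Port C.)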
// Here's how to get a value (0 or 1) that
// shows the state of pin 6 on Port B
//
#define readPin14 ((PINB >> PINB6) & 1)

void setup()
{
    Serial.begin(9600);
    delay(1000);   
}

void loop()
{
    volatile byte x, y;
    unsigned long num = 100000L;

    unsigned long t1, t2, t3, t4;
    t1 = micros();
    for (unsigned long i = 0; i < num; i++) {
        x = readPin14;
    }
    t2 = micros();
    
    // Put print stuff here if you want to see actual times
    
    t3 = micros();
    for (unsigned long i = 0; i < num; i++) {
        y = digitalRead(14);
    }
    t4 = micros();
 
    unsigned long loop1Time = t2 - t1;
    unsigned long loop2Time = t4 - t3;
    
    // Put print stuff here if you want to see actual times.  Note
    // that these times include loop overhead and the time to call micros().
    Serial.print("Loop 1 time = ");Serial.print(loop1Time);Serial.println(" usec");
    Serial.print("Loop 2 time = ");Serial.print(loop2Time);Serial.println(" usec");
    
    // This is what we are really after
    unsigned long diffTime = loop2Time - loop1Time;
    Serial.print("Difference between direct pin manipulation and digitalRead is approximately ");
    Serial.print(diffTime / (float)num);
    Serial.println(" usec"); Serial.println();
    Serial.print("(Ignore this junk: ");
    Serial.print(x, DEC);Serial.print(y, DEC);Serial.println(")");
    Serial.println();
 
    delay(1000);
}

Output:


Loop 1 time = 301804 usec
Loop 2 time = 471580 usec
Difference between direct pin manipulation and digitalRead is approximately 1.70 usec

In general: If we put some stuff in loop1 and some other stuff in loop 2, this shows how to compute the difference in execution time in microseconds.

So, for example, if we want to know the absolute execution time of "something," put "something" in loop 1 and put two "somethings" in loop 2. Then let the program show the difference.
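
The subtraction described above can be sketched as a small helper. The function name is hypothetical, and it assumes loop 2 is the slower loop, since the arithmetic is unsigned.

```cpp
// Per-iteration time difference between two timed loops, in microseconds.
// perIterationDiffMicros is a hypothetical helper name.
// Assumes loop2Micros >= loop1Micros (unsigned subtraction).
float perIterationDiffMicros(unsigned long loop1Micros,
                             unsigned long loop2Micros,
                             unsigned long iterations)
{
    // Loop overhead and the micros() calls appear in both totals and cancel.
    return (float)(loop2Micros - loop1Micros) / (float)iterations;
}
```

Fed the figures from the output above (301804 and 471580 microseconds over 100000 passes), it gives roughly 1.70 microseconds per pass, matching the printed result.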

Warning: Beware of compiler optimization. Here are a couple of hints:

  • If you just do something like an assignment statement in the loops and if you don't do anything with the values stored, the compiler may (or may not) optimize away the entire loop. (So do a junky print or some such thing after the loops.)

Furthermore...

  • Function calls will not be optimized away, but if you just store something in a variable in a loop and print its value after the loop, the loop may (or may not) be optimized down to one execution unless the variable is declared volatile (or unless it is extern).
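
A minimal sketch of the second hint, assuming the thing being timed is a plain byte store. The function name is made up for illustration. Because the destination is volatile, the compiler has to perform the store on every pass instead of collapsing the loop to a single assignment.

```cpp
#include <stdint.h>

// Repeats a store "iterations" times. The volatile qualifier forces the
// compiler to emit every store, so the loop body is really executed.
// timedStores is a hypothetical name used only for this illustration.
uint8_t timedStores(uint32_t iterations, uint8_t value)
{
    volatile uint8_t sink = 0;
    for (uint32_t i = 0; i < iterations; i++) {
        sink = value;    // not removable: sink is volatile
    }
    return sink;         // using the result afterwards, as the first hint suggests
}
```

Without volatile on sink, an optimizing compiler would be free to keep only the final store, and the timed loop would no longer measure what you think it does.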

A final note:
This program does not take the execution time of Timer 0 interrupt servicing into account. The ISR doesn't run often enough to skew the results in any meaningful way. Or so I claim. See the footnote.

Regards,

Dave

Footnote:
By my reckoning, for a 16 MHz Arduino board, the Timer 0 interrupt service routine takes something like 3 microseconds every millisecond or so, which I neglect in this simple program. (If my reckoning is incorrect, maybe someone can enlighten us all.)
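
To put a number on that, here is a small sketch that turns the footnote's estimate into a fraction of CPU time. The 3-microsecond service time is the estimate above; the 1024-microsecond period is an assumption based on Timer 0 overflowing every 256 ticks of 4 microseconds on a 16 MHz board. The function name is hypothetical.

```cpp
// Fraction of CPU time consumed by a periodic interrupt.
// isrOverheadFraction is a hypothetical helper; both arguments are estimates.
float isrOverheadFraction(float serviceMicros, float periodMicros)
{
    return serviceMicros / periodMicros;
}
```

With 3 microseconds every 1024 microseconds, the fraction is about 0.003, so each measured loop time would be stretched by roughly 0.3 percent.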

Try this...

#define readPin14 (PINB & (1 << PINB6))

...or this...

#define readPin14 ( (PINB & (1 << PINB6)) ? 1 : 0 )

Oh, I see: ANDing the value with constants (worked out by the compiler at compile time) should be faster than shifting the value after reading it, right? That makes sense.

On the other hand...

A little extra time for the conditional (ternary) operator in your second form might very well make it a wash when compared with the form I showed. (Those sneaky optimizing compilers really make me scratch my head, and, in fact, hand-optimizing tricks that seem to have an advantage may well not apply to other compilers or even to different versions of the same compiler.) See the footnote.

Bottom line: It's easy enough to test the differences, right? Notice that actual times may also depend on which bit we are testing. (That dang compiler has ways of taking advantage of certain numerical characteristics that we might not even suspect. Right?) The question is: For a given bit position, is one form faster than the other?

Regards,

Dave

Footnote:
I used this kind of program (with avr-gcc 4.3.4 on my Linux workstation) to test your second form against the form that I showed. I tested all bits, 0 through 7. (Note that if you run through the loops a million times each and one loop appears to be faster by a few microseconds, keep running, and you may see that sometimes one is faster by a count of four or eight microseconds and other times the other is faster by a count of four or eight microseconds. One might speculate that this is accounted for by slightly differing Timer 0 interrupt service calls for the different loops. Or maybe by the fact that the micros() clock actually ticks four microseconds at a time, and a couple of ticks of the clock over a million times through the loop is not significant. If the instruction times really differed by, say, one CPU clock cycle, a million times through the loop would show a difference of about 62,500 microseconds: one 62.5-nanosecond cycle at 16 MHz, times a million.)
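
The cycle arithmetic in that last remark can be checked with a short helper. This is just a sketch; the function name is hypothetical and the 16 MHz clock is the nominal figure used throughout this thread.

```cpp
// Extra elapsed time, in microseconds, if each loop pass costs
// extraCycles more CPU cycles. extraMicros is a hypothetical helper name.
float extraMicros(unsigned long extraCycles, unsigned long passes, unsigned long cpuHz)
{
    float microsPerCycle = 1000000.0f / (float)cpuHz;   // 0.0625 at 16 MHz
    return (float)extraCycles * microsPerCycle * (float)passes;
}
```

For one extra cycle per pass, a million passes, and a 16 MHz clock, the result is 62,500 microseconds, which is far larger than a stray four or eight microseconds of timer jitter.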

Anyhow...

Anyone want to guess the results?

davekw7x:
Oh, I see: ANDing the value with constants (worked out by the compiler at compile time) should be faster than shifting the value after reading it, right?

Yes. It should reduce to a single machine instruction for most I/O pins. I believe the other pins reduce to three machine instructions.

A little extra time for the conditional (ternary) operator in your second form might very well make it a wash when compared with the form I showed.

Nope. The code is essentially identical. Both generate a variable loop / bit shift. (Which surprised me. I expected the ternary version to expand to the same code as the "if" example below.) The performance should be identical. Even this generates a variable loop...

#define readPin14 ((PINB & (1<<PINB6)) == (1<<PINB6))

This, however, generates nice efficient code...

if ( (PINB & (1<<PINB6)) ) { x = 1; } else { x = 0; }

(Those sneaky optimizing compilers really make me scratch my head

Yeah. I have a similar bald spot.

Notice that actual times may also depend on which bit we are testing.

Right. Most of the examples generate a variable loop. Essentially the code is this...

volatile uint8_t x;
x = PINB;
for ( uint8_t i = PINB6 /* bit we want to test */; i > 0; --i )
  x = x >> 1;
x = x & 1;

So, lower bit numbers are going to have better performance.

The question is: For a given bit position, is one form faster than the other?

Yes. This one has a constant run-time no matter the bit position or value...

#define readPin14 (PINB & (1<<PINB6))

This one has a constant run-time for all bit positions but may vary by a single machine instruction depending on the value...

if ( (PINB & (1<<PINB6)) ) { x = 1; } else { x = 0; }

The others all perform worse for higher bit numbers and should perform worse than the two above.
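
For reference, here are the three styles side by side as small functions. This is only an illustration of what each form returns: the port value is passed in as a parameter (on an AVR it would be a register such as PINB) so the code runs anywhere, the function names are made up, and nothing here says which machine instructions a particular compiler will emit.

```cpp
#include <stdint.h>

// Mask in place: returns zero or non-zero (the mask value), no shifting.
static inline uint8_t bitMasked(uint8_t portValue, uint8_t bit)
{
    return portValue & (uint8_t)(1u << bit);
}

// Shift, then mask: returns exactly 0 or 1, like digitalRead.
static inline uint8_t bitShifted(uint8_t portValue, uint8_t bit)
{
    return (uint8_t)((portValue >> bit) & 1u);
}

// Branch form: also returns exactly 0 or 1.
static inline uint8_t bitBranched(uint8_t portValue, uint8_t bit)
{
    if (portValue & (uint8_t)(1u << bit)) {
        return 1;
    } else {
        return 0;
    }
}
```

If all you need is a true/false test, the masked form is enough; the other two matter only when the caller needs a strict 0 or 1.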

It's often useful to have a variable for the pin number and use digitalRead()/digitalWrite().

But also there are many times when the pins are hard coded.

Should there be a set of these macros for every pin?


Rob

Graynomad:
Should there be a set of these macros for every pin?

Like these?
http://code.google.com/p/digitalwritefast/
http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1267553811

Yep, just like those :)

Álvaro Lopes and Paul Stoffregen also have variations. Paul's version is in the Teensyduino core. I think Álvaro's version is somewhere on Github.

jrraines' version uses macros. I believe the other two use inline-functions.

I think Paul's version is highly-compatible with the 0022 core (e.g. PWM is automatically turned off). I can't remember if the other two are.