What's going on?

I recently purchased a nice new digital scope, and thought I'd look at an Arduino Nano's speed performance.

Here's the code - just toggling a pin in various ways.

//1: newfile1,2,3.png


void loop() {
  while (1) {
    PORTD |= 0x20;   // set PD5 (bit 5 of port D)
    //digitalWrite( pulsepin , HIGH);
    PORTD &= 0xcf;   // clear PD5; note 0xcf also clears PD4 (0xdf would clear PD5 alone)
    // digitalWrite( pulsepin , LOW);
  }
}


//2:newfile4.png

void loop() {
  while (1) {

    //digitalWrite( pulsepin , HIGH);
    PORTD &= 0xcf;
    // digitalWrite( pulsepin , LOW);

    PORTD |= 0x20;
  }

}



//3:newfile5.png


void loop() {
//  while (1) {

    //digitalWrite( pulsepin , HIGH);
    PORTD &= 0xcf;
    // digitalWrite( pulsepin , LOW);
    PORTD |= 0x20;
//  }

}


//4:  newfile6.png

void loop() {
  //  while (1) {
  PORTD |= 0x20;
  //digitalWrite( pulsepin , HIGH);
  PORTD &= 0xcf;
  //digitalWrite( pulsepin , LOW);

  //  }

}



//5:  newfile7.png

void loop() {
//  while (1) {
//    PORTD |= 0x20;
    digitalWrite( pulsepin , HIGH);
//    PORTD &= 0xcf;
    digitalWrite( pulsepin , LOW);

//  }

}

Note the 5 versions. They differ in whether they use direct port writes or digitalWrite(), and in whether a while(1) is embedded inside the normal loop() to test its performance.

The scope pictures are appended.

Some things become clear.

DigitalWrite is incredibly slow. It takes 6-8us to write to a pin. That's around 100 clock cycles at 16MHz (I checked that too - 15.973MHz - pretty good).

Why?
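
For reference, the cycle estimate above can be checked with a few lines of host-side C++. This is only a sketch: the helper name usToCycles is made up for illustration, and it uses the nominal 16MHz clock rather than the measured 15.973MHz.

```cpp
#include <cstdint>

// Hypothetical helper: convert a duration in microseconds to CPU clock
// cycles for a given clock frequency in Hz (integer arithmetic, rounds down).
constexpr uint32_t usToCycles(uint32_t microseconds, uint32_t clockHz) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(microseconds) * clockHz) / 1000000ULL);
}

// Assumption: nominal Nano clock, not the value measured on the scope.
constexpr uint32_t kNominalClockHz = 16000000UL;

// The two ends of the measured 6-8us range, expressed in clock cycles.
constexpr uint32_t kCyclesLow  = usToCycles(6, kNominalClockHz);
constexpr uint32_t kCyclesHigh = usToCycles(8, kNominalClockHz);
```

These are clock cycles, not instructions. Many AVR instructions take 2 cycles, so the instruction count is lower than the cycle count.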

The loop() seems to take about 6us to go round - why? - it's only a jump to a label.

The while(1) is reasonably ok - the test for 'one' seems to take 1 instruction.

There's a continuous processor stop every millisecond or so - presumably to correct the millis() count. But it takes again about 8us - it's a simple interrupt incrementing a counter - that doesn't take 100 instructions!

Just curious - and can I turn the millis() stuff off for time critical stuff which doesn't need it?

Allan

We are aware that digitalWrite() is slow - that seems a reasonable price for its convenience.

It seems to me the detailed explanations for that and for your other questions lie in a study of the Arduino source code which is included with the IDE.

A lot of the Arduino system is focused on convenience rather than absolute performance.

I presume you can modify the source code to prevent millis() from working if that is necessary.
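
A possible alternative that avoids editing the core, offered as a sketch rather than a tested recipe: on the ATmega328P the millis() tick comes from the Timer0 overflow interrupt, which is enabled by the TOIE0 bit in TIMSK0. Clearing that bit stops the interrupt, and millis(), micros() and delay() stop working properly while it is off. The two function names and the host-side stand-ins for the register are made up for illustration.

```cpp
#include <stdint.h>

#ifndef ARDUINO
// Host-side stand-ins so the snippet compiles off-target (assumption: on a
// real board these definitions come from <avr/io.h> via Arduino.h).
static volatile uint8_t TIMSK0 = 0x01;  // pretend TOIE0 is already set
#define TOIE0 0
#define _BV(bit) (1 << (bit))
#endif

// Hypothetical helper: stop the Timer0 overflow interrupt that feeds millis().
inline void pauseMillisTick() {
    TIMSK0 = (uint8_t)(TIMSK0 & ~_BV(TOIE0));
}

// Hypothetical helper: re-enable the Timer0 overflow interrupt.
inline void resumeMillisTick() {
    TIMSK0 = (uint8_t)(TIMSK0 | _BV(TOIE0));
}
```

Calling pauseMillisTick() just before the time-critical section and resumeMillisTick() just after it would keep the rest of the sketch usable, at the cost of millis() losing the time spent in between.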

...R

What's going on?

Why don't you look at the source?
Why don't you look at the compiled code?

Fair points - I'm used to dealing with 'bare' processor IDEs.

I'll tweak the source if I ever need time critical stuff - in particular where predictable accurate timing is required.

Allan

Try adding this to the one without the while() loop:

void serialEventRun() {}

There's normally overhead in the main() loop for the call to serialEventRun(), which handles the serialEvent() feature. By defining an empty serialEventRun() that overhead is removed. The only disadvantage is the loss of the serialEvent() feature but most people don't even use that.

With older versions of Arduino AVR Boards there is also overhead for the setup() and loop() calls, so you can speed things up by defining your own main(). With the recent versions there's no longer any overhead for those function calls.

DigitalWrite is incredibly slow.
Why?

Most of the low speed of digitalWrite() is due to the requirement that it handle a variable value for both the pin number and the value. The "PORTD |= 0x8" form compiles into a single instruction, but it's one that has the port and bit both built in to the instruction. As soon as the port and the bit on either side of the "|=" are variables, you're looking at perhaps 10 instructions. By the time you add translating "pin no" to port/bit and HIGH/LOW to a bit value, and dealing with the timer or A2D that might be in use on the pin, it adds up...
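
The difference can be sketched with a simplified host-side model. This is not the Arduino implementation: the tables, the fake registers and the function names are all made up for illustration, and the real function does more (for example the timer handling mentioned above).

```cpp
#include <cstdint>

// Host-side stand-ins for two output port registers (assumption: on a real
// AVR these would be memory-mapped I/O registers).
static volatile uint8_t fakePorts[2] = {0, 0};

// Hypothetical tables mapping a pin number to a port index and a bit mask.
static const uint8_t kPinToPort[4] = {0, 0, 1, 1};
static const uint8_t kPinToMask[4] = {0x01, 0x02, 0x10, 0x20};

// Generic write: the port and the bit are only known at run time, so every
// call pays for two lookups, an address computation and a read-modify-write.
void genericWrite(uint8_t pin, bool high) {
    uint8_t port = kPinToPort[pin];             // lookup 1: which port
    uint8_t mask = kPinToMask[pin];             // lookup 2: which bit
    volatile uint8_t *out = &fakePorts[port];   // register address
    if (high) {
        *out = (uint8_t)(*out | mask);          // read, set bit, write back
    } else {
        *out = (uint8_t)(*out & (uint8_t)~mask);  // read, clear bit, write back
    }
}

// Direct write: the port and the bit are compile-time constants, which is
// the shape that lets an AVR compiler emit a single bit-set instruction.
void directSetBit5() {
    fakePorts[1] = (uint8_t)(fakePorts[1] | 0x20);
}
```

In this model genericWrite(3, true) and directSetBit5() leave the fake register in the same state. The extra cost is all in how the generic version gets there.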

The loop() seems to take about 6us to go round - why? - it's only a jump to a label.

Not any more. In between loops, the code calls serialEventRun(). (6us sounds a bit excessive, though. serialEventRun() shouldn't do much if you haven't defined serialEvent().)

There's a continuous processor stop every millisecond or so - presumably to correct the millis() count. But it takes again about 8us - it's a simple interrupt incrementing a counter - that doesn't take 100 instructions!

Alas, it increments two counters. Two 32-bit counters. So, load 4 bytes, add to each byte, store 4 bytes, times 2. That's 40 cycles. Plus the save/restore of ~8 registers. Plus logic to handle the extra 24us (the timer overflows every 1024us, not every 1000us). The timer overflow ISR is ... just about 50 instructions long, and many of them are 2-cycle instructions.
Could it be better? Some, probably. You could do the 32bit adds one at a time, and probably get rid of at least 4 of the register save/restores (16 cycles.) Really not worth the loss in clarity, though... (I DID get rid of the second 32bit counter, but it got added back in for compatibility reasons. Or something. Sigh.)

ISR(TIMER0_OVF_vect)
{
 45a:   1f 92           push    r1
 45c:   0f 92           push    r0
 45e:   0f b6           in      r0, 0x3f  ;;; status
 460:   0f 92           push    r0
 462:   11 24           eor     r1, r1    ;;; known zero register
 464:   2f 93           push    r18
 466:   3f 93           push    r19
 468:   8f 93           push    r24
 46a:   9f 93           push    r25
 46c:   af 93           push    r26
 46e:   bf 93           push    r27
        unsigned long m = timer0_millis;
 470:   80 91 1d 01     lds     r24, 0x011D     ; 0x80011d <timer0_millis>
 474:   90 91 1e 01     lds     r25, 0x011E     ; 0x80011e <timer0_millis+0x1>
 478:   a0 91 1f 01     lds     r26, 0x011F     ; 0x80011f <timer0_millis+0x2>
 47c:   b0 91 20 01     lds     r27, 0x0120     ; 0x800120 <timer0_millis+0x3>
        unsigned char f = timer0_fract;
 480:   30 91 1c 01     lds     r19, 0x011C     ; 0x80011c <timer0_fract>
        m += MILLIS_INC;
        f += FRACT_INC;
 484:   23 e0           ldi     r18, 0x03
 486:   23 0f           add     r18, r19
        if (f >= FRACT_MAX) {
 488:   2d 37           cpi     r18, 0x7D
 48a:   20 f4           brcc    .+8             ; 0x494 <__vector_16+0x3a>
        m += MILLIS_INC;
 48c:   01 96           adiw    r24, 0x01
 48e:   a1 1d           adc     r26, r1
 490:   b1 1d           adc     r27, r1
 492:   05 c0           rjmp    .+10            ; 0x49e <__vector_16+0x44>
        f += FRACT_INC;
        if (f >= FRACT_MAX) {
                f -= FRACT_MAX;
 494:   26 e8           ldi     r18, 0x86
 496:   23 0f           add     r18, r19
                m += 1;
 498:   02 96           adiw    r24, 0x02
 49a:   a1 1d           adc     r26, r1
 49c:   b1 1d           adc     r27, r1
        }
        timer0_fract = f;
 49e:   20 93 1c 01     sts     0x011C, r18     ; 0x80011c <timer0_fract>
        timer0_millis = m;
 4a2:   80 93 1d 01     sts     0x011D, r24     ; 0x80011d <timer0_millis>
 4a6:   90 93 1e 01     sts     0x011E, r25     ; 0x80011e <timer0_millis+0x1>
 4aa:   a0 93 1f 01     sts     0x011F, r26     ; 0x80011f <timer0_millis+0x2>
 4ae:   b0 93 20 01     sts     0x0120, r27     ; 0x800120 <timer0_millis+0x3>
        timer0_overflow_count++;
 4b2:   80 91 21 01     lds     r24, 0x0121     ; 0x800121 <timer0_overflow_count>
 4b6:   90 91 22 01     lds     r25, 0x0122     ; 0x800122 <timer0_overflow_count+0x1>
 4ba:   a0 91 23 01     lds     r26, 0x0123     ; 0x800123 <timer0_overflow_count+0x2>
 4be:   b0 91 24 01     lds     r27, 0x0124     ; 0x800124 <timer0_overflow_count+0x3>
 4c2:   01 96           adiw    r24, 0x01       ; 1
 4c4:   a1 1d           adc     r26, r1
 4c6:   b1 1d           adc     r27, r1
 4c8:   80 93 21 01     sts     0x0121, r24     ; 0x800121 <timer0_overflow_count>
 4cc:   90 93 22 01     sts     0x0122, r25     ; 0x800122 <timer0_overflow_count+0x1>
 4d0:   a0 93 23 01     sts     0x0123, r26     ; 0x800123 <timer0_overflow_count+0x2>
 4d4:   b0 93 24 01     sts     0x0124, r27     ; 0x800124 <timer0_overflow_count+0x3>
}
 4d8:   bf 91           pop     r27
 4da:   af 91           pop     r26
 4dc:   9f 91           pop     r25
 4de:   8f 91           pop     r24
 4e0:   3f 91           pop     r19
 4e2:   2f 91           pop     r18
 4e4:   0f 90           pop     r0
 4e6:   0f be           out     0x3f, r0
 4e8:   0f 90           pop     r0
 4ea:   1f 90           pop     r1
 4ec:   18 95           reti
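
The arithmetic the ISR performs can be followed more easily in a host-side model of the C lines interleaved in the listing. This is a sketch: the constant values are read off the listing (0x03 and 0x7D), while TickState and runOverflows are made-up names for illustration.

```cpp
#include <cstdint>

// Values as they appear in the listing: 1ms per overflow, plus a fractional
// part that grows by 3 per overflow and carries into the count at 125.
constexpr uint32_t MILLIS_INC = 1;
constexpr uint8_t  FRACT_INC  = 3;
constexpr uint8_t  FRACT_MAX  = 125;

// Hypothetical container for the three variables the ISR updates.
struct TickState {
    uint32_t millisCount = 0;   // models timer0_millis
    uint8_t  fract = 0;         // models timer0_fract
    uint32_t overflows = 0;     // models timer0_overflow_count
};

// One timer overflow, following the source lines shown in the listing.
void onOverflow(TickState &s) {
    uint32_t m = s.millisCount;
    uint8_t f = s.fract;
    m += MILLIS_INC;
    f = static_cast<uint8_t>(f + FRACT_INC);
    if (f >= FRACT_MAX) {
        f = static_cast<uint8_t>(f - FRACT_MAX);
        m += 1;                 // the accumulated fraction adds a whole ms
    }
    s.fract = f;
    s.millisCount = m;
    s.overflows++;
}

// Hypothetical helper: apply n overflows to a zeroed state.
TickState runOverflows(uint32_t n) {
    TickState s;
    for (uint32_t i = 0; i < n; ++i) {
        onOverflow(s);
    }
    return s;
}
```

After 125 overflows the model reports 128ms with the fraction back at zero, which matches 125 overflows of 1024us each. That correction is the "logic to handle the extra 24us".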