# Inaccuracy of delayMicroseconds()

I have a timing-critical application, and I’m trying to use delayMicroseconds() to control a pulse width. It works wonderfully at higher values, but at around 10 us it’s off by a number of clock cycles.

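``````
#define STROBE_PORT PORTE
const int strobePin = 4;          // bit number within STROBE_PORT
unsigned int strobeTime = 10;     // requested pulse width in microseconds
bool correctSerialInput = false;  // placeholder: set when valid serial input arrives

void setup()
{
  pinMode(2, OUTPUT);
  digitalWrite(2, LOW);
}

void loop()
{
  if ( correctSerialInput )
  {
    powerPulse();
  }
}

void powerPulse()
{
  if ( strobeTime <= 10000 )
  {
    cli();                          // no interrupts while the pulse is high
    sbi(STROBE_PORT, strobePin);    // pulse starts
    delayMicroseconds(strobeTime);
    cbi(STROBE_PORT, strobePin);    // pulse ends
    sei();
  }
}
``````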

Pulse width: 9.675us

The critical code is everything between sbi() and cbi(), as that controls the width of the pulse. So I thought I’d take a look at the assembly code:

``````
  sbi(STROBE_PORT, strobePin);
36:      74 9a             sbi      0x0e, 4      ; 14
delayMicroseconds(strobeTime);
38:      c9 01             movw      r24, r18
3a:      0e 94 00 00       call      0      ; 0x0 <_Z10powerPulsev>
3a: R_AVR_CALL      delayMicroseconds
cbi(STROBE_PORT, strobePin);
3e:      74 98             cbi      0x0e, 4      ; 14
``````

There’s a movw and a call: 1 + 5 = 6 cycles. Now, taking a look at the function delayMicroseconds:

``````
#if F_CPU >= 16000000L
// for the 16 MHz clock on most Arduino boards

// for a one-microsecond delay, simply return.  the overhead
// of the function call yields a delay of approximately 1 1/8 us.
if (--us == 0)
0:      01 97             sbiw      r24, 0x01      ; 1
2:      01 f0             breq      .+0            ; 0x4 <delayMicroseconds+0x4>
2: R_AVR_7_PCREL      .text.delayMicroseconds+0x12
return;

// the following loop takes a quarter of a microsecond (4 cycles)
// per iteration, so execute it four times for each microsecond of
// delay requested.
us <<= 2;
4:      88 0f             add      r24, r24
6:      99 1f             adc      r25, r25
8:      88 0f             add      r24, r24
a:      99 1f             adc      r25, r25

// account for the time taken in the preceeding commands.
us -= 2;
c:      02 97             sbiw      r24, 0x02      ; 2
// we can't subtract any more than this or we'd overflow w/ small delays.
us--;
#endif

// busy wait
__asm__ __volatile__ (
e:      01 97             sbiw      r24, 0x01      ; 1
10:      01 f4             brne      .+0            ; 0x12 <delayMicroseconds+0x12>
10: R_AVR_7_PCREL      .text.delayMicroseconds+0xe
12:      08 95             ret
``````

Instruction - cycles
sbiw - 2
breq - 1 (condition is false)
add/adc (x4, the us <<= 2) - 4
sbiw - 2
Total before the loop: 9
(the us--; is from a different preprocessor branch and isn’t actually compiled)

Once in the loop it starts subtracting from the us variable, but it’s changed since it was passed in:
r25:r24 = 10
sbiw 1 → 9
us <<= 2 → 36
sbiw 2 → 34

While us counts down from 34 to 2, each pass through the loop costs 4 cycles: sbiw takes 2 and brne takes 2 (condition true). That’s 33 passes:

33*4 = 132

In the last pass, us reaches 0, so brne only takes 1 cycle, and then there’s the return: 2+1+5 = 8.

Summing up the cycles in delayMicroseconds: 9 + 132 + 8 = 149.
Add 6 from before: 155 clock cycles.

155/16 = 9.6875us which is approximately what I’m seeing.
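
To double-check the hand count, the tally can be written as a small compile-time model. This is only a sketch: the constant and function names are invented here, the cycle costs are simply the ones counted above (including the 5-cycle call and return), and it only covers requests of 2 us or more on the 16 MHz path.

```cpp
#include <stdint.h>

// Cycle costs copied from the tally in this post; all names are illustrative.
constexpr uint32_t kCallerSetup = 1 + 5;          // movw + call
constexpr uint32_t kPrologue    = 2 + 1 + 4 + 2;  // sbiw, breq (not taken), 4x add/adc, sbiw
constexpr uint32_t kLoopTaken   = 2 + 2;          // sbiw + brne (taken)
constexpr uint32_t kLoopExit    = 2 + 1 + 5;      // sbiw + brne (not taken) + ret

// Value of the loop counter when the busy-wait starts: ((us - 1) << 2) - 2.
constexpr uint32_t loopCount(uint32_t us) {
    return ((us - 1) << 2) - 2;
}

// Cycles tallied between sbi and cbi for a request of `us` microseconds (us >= 2).
constexpr uint32_t pulseCycles(uint32_t us) {
    return kCallerSetup + kPrologue + (loopCount(us) - 1) * kLoopTaken + kLoopExit;
}

static_assert(loopCount(10) == 34, "matches the register trace");
static_assert(pulseCycles(10) == 155, "matches the hand count");
```

Dividing pulseCycles(10) by 16 gives the same 9.6875 us as the manual sum.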

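Adding up all of the clock cycles involved with the function itself (the call, the function body, and the return, but not the caller’s movw), you get:

5 + 9 + 4( 4(us-1) - 2 - 1 ) + 3 + 5

Simplifying yields: 16us - 6 cycles, against an ideal of 16us cycles at 16 MHz. This is an error of -6 clock cycles or -0.375us for any request of 2 or more. At 10us, this is a -3.75% error, which is too high for my application, and quite possibly for others. (In my code the movw adds one cycle back, which is why I counted 155 rather than 154.)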
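
The simplification is easy to get wrong by hand, so here is a small check that the itemized sum and the closed form agree, and that the error at 10 us comes out at -3.75%. The function names are made up for illustration, and the figures assume the 16 MHz path and the cycle counts tallied above.

```cpp
#include <stdint.h>

// Itemized sum from the post: call + prologue + looping passes + exit + ret.
constexpr int32_t itemizedCycles(int32_t us) {
    return 5 + 9 + 4 * (4 * (us - 1) - 2 - 1) + 3 + 5;
}

// Closed form after simplifying.
constexpr int32_t closedFormCycles(int32_t us) {
    return 16 * us - 6;
}

// What a perfect delay would cost at 16 cycles per microsecond.
constexpr int32_t idealCycles(int32_t us) {
    return 16 * us;
}

// Relative error in hundredths of a percent, to stay in integer arithmetic.
constexpr int32_t errorCentiPercent(int32_t us) {
    return (closedFormCycles(us) - idealCycles(us)) * 10000 / idealCycles(us);
}

static_assert(itemizedCycles(2) == closedFormCycles(2), "forms agree at 2 us");
static_assert(itemizedCycles(10) == closedFormCycles(10), "forms agree at 10 us");
static_assert(itemizedCycles(1000) == closedFormCycles(1000), "forms agree at 1000 us");
static_assert(errorCentiPercent(10) == -375, "-3.75% at 10 us");
```

Because the shortfall is a constant 6 cycles, the relative error shrinks as the requested delay grows, which fits with larger values looking fine.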

A simple solution would be as follows:
Replace the “us -= 2;” with “us -= 1;”. That adds one more pass through the 4-cycle loop, which would reduce the error to -2 clock cycles, and the rest could be remedied by adding 2 nops.
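
For reference, here is roughly what that change might look like for the 16 MHz path. It is an untested sketch: the function name is hypothetical, the busy-wait loop is written out from the disassembly above with only the subtraction changed and two nops added, and it assumes the compiler emits the same instructions around them. The small model underneath just redoes the arithmetic.

```cpp
#if defined(__AVR__)
// Hypothetical patched variant of the 16 MHz path (not the shipped function).
void delayMicrosecondsPatched(unsigned int us)
{
    if (--us == 0)
        return;
    us <<= 2;   // four loop passes per microsecond
    us -= 1;    // was: us -= 2

    // two cycles of padding to cover the remaining shortfall
    __asm__ __volatile__ ("nop\n\tnop");

    // busy wait: 4 cycles per pass while looping
    __asm__ __volatile__ (
        "1: sbiw %0,1" "\n\t"   // 2 cycles
        "brne 1b"               // 2 cycles taken, 1 on exit
        : "=w" (us) : "0" (us)
    );
}
#endif

// Predicted cycles from call to return (caller setup such as movw excluded),
// using the same per-instruction costs as the tally in this post.
constexpr long patchedCycles(long us, long nops) {
    return 5 + 9 + nops + 4 * (4 * (us - 1) - 1 - 1) + 3 + 5;
}

static_assert(patchedCycles(10, 0) == 158, "us -= 1 alone: 2 cycles short of 160");
static_assert(patchedCycles(10, 2) == 160, "with two nops: exactly 16 * 10");
```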

It might be wise to just indicate the 6 cycle error in the reference. There is still some overhead in setting up the function call, and that can vary. Reducing the error to 0 means the setup overhead will cause an unavoidable positive error. If a user knows about the error, it would be easy to compensate by figuring out the setup overhead and adding nops to bring it to 6 cycles.
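
A caller-side version of that compensation might look something like this. The macro and helper names are made up for illustration, and the count assumes the setup overhead has been read off the disassembly, as with the single movw above.

```cpp
// Hypothetical helper: a one-cycle nop on AVR, a no-op elsewhere so the
// sketch still compiles off-target.
#if defined(__AVR__)
#define PAD_NOP() __asm__ __volatile__ ("nop")
#else
#define PAD_NOP() ((void)0)
#endif

// Number of one-cycle nops to add so that the caller's setup overhead
// totals 6 cycles, cancelling the 6-cycle shortfall.
constexpr int padNopsNeeded(int setupCycles) {
    return setupCycles >= 6 ? 0 : 6 - setupCycles;
}

static_assert(padNopsNeeded(1) == 5, "a lone movw leaves five cycles to pad");

// Usage, assuming the movw-only setup seen in the listing:
//   sbi(STROBE_PORT, strobePin);
//   PAD_NOP(); PAD_NOP(); PAD_NOP(); PAD_NOP(); PAD_NOP();
//   delayMicroseconds(strobeTime);
//   cbi(STROBE_PORT, strobePin);
```

The padding count would need rechecking whenever the surrounding code changes, since the compiler may set up the argument differently.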

a difference of exactly 5 cycles. What gives?