delayMicroseconds() bugs

I was recently tinkering with some things and noticed a couple of issues with delayMicroseconds(). The first is that calling it with an argument of 0 produces a delay of about 17mS. This is apparently well known and will, I suppose, either be fixed at some point or documented on the site.

The real crux of my post is that delayMicroseconds() seems to be adding ~4uS of extra delay to every call. I understand about short delays not being exceedingly accurate, but no matter what I call delayMicroseconds() with, the delay is 4uS longer than asked. I tested with values up to 20uS and the delay is always off by 4uS.

I am using 1.0.3 on an Uno R3 board. I have verified my results with an oscilloscope and am confident of the problem. I can provide a sample sketch and some captures from the scope if necessary, but it should be easy for anyone with the proper test equipment to verify.

How are you measuring the length of the delayMicroseconds() call with an oscilloscope? Are you using something like digitalWrite() before and after? Those calls to digitalWrite() might explain the extra 4 microseconds.

Try using direct port manipulation instead of digitalWrite:

byte x = 10; // delayMicroseconds amount
byte pin2 = 2; // test with D2, PORTD bit 2

void setup(){
pinMode (pin2, OUTPUT);
}

void loop(){
PORTD = PORTD & B11111011;  // clears D2
delayMicroseconds(x);  // your test value
PORTD = PORTD | B00000100; // sets D2 
delayMicroseconds(x);  // your test value
}
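The two masks in the sketch above can also be derived from the bit number instead of being typed out as binary literals. Here is a minimal host-side sketch of that arithmetic. The helper names set_bit and clear_bit are hypothetical, and a plain byte stands in for the PORTD register so the values can be checked off-board.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: a plain byte stands in for the PORTD register.
constexpr uint8_t set_bit(uint8_t reg, uint8_t bit) {
    return static_cast<uint8_t>(reg | (1u << bit));
}

constexpr uint8_t clear_bit(uint8_t reg, uint8_t bit) {
    return static_cast<uint8_t>(reg & ~(1u << bit));
}

void run_checks() {
    // D2 is PORTD bit 2, so these reproduce B00000100 and B11111011.
    assert(set_bit(0x00, 2) == 0x04);
    assert(clear_bit(0xFF, 2) == 0xFB);
    // Other bits of the register are left untouched.
    assert(clear_bit(set_bit(0xA0, 2), 2) == 0xA0);
}
```

On the board itself the same expressions would be applied to PORTD directly, for example PORTD = PORTD | (1 << 2).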

I did some tests, and the digitalWrite() calls really do take that long, which I sure wasn't expecting. The following code takes approximately 20uS to execute; that's expensive for toggling a pin.

  digitalWrite(OUTPin, HIGH);
  digitalWrite(OUTPin, LOW);
  digitalWrite(OUTPin, HIGH);
  digitalWrite(OUTPin, LOW);

Thanks, that explains that I suppose.

This runs in 250nS (4 clock cycles):

  PORTD = PORTD | B00001000; // sets D3 
  PORTD = PORTD & B11110111;  // clears D3

Much more like it. And sure enough, I can generate a 4MHz pulse rate on the pin. I'm a PIC person, can you tell me what that assembles into? Thanks.
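For anyone wanting to check the numbers in this thread, here is a rough host-side sketch of the cycle arithmetic. It assumes the 16 MHz clock of the Uno R3. The function names are made up for illustration, and the inputs are the measurements quoted above, not new ones.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 16 MHz core clock, as on the Uno R3 used in this thread.
constexpr uint32_t kCpuHz = 16000000UL;

// Duration of a given number of CPU cycles, in nanoseconds.
constexpr uint32_t cycles_to_ns(uint32_t cycles) {
    return static_cast<uint32_t>(cycles * 1000000000ULL / kCpuHz);
}

// Number of CPU cycles that fit in a given number of nanoseconds.
constexpr uint32_t ns_to_cycles(uint32_t ns) {
    return static_cast<uint32_t>(static_cast<uint64_t>(ns) * kCpuHz / 1000000000ULL);
}

void run_checks() {
    // Set plus clear measured at 4 cycles: 250 ns per pulse, i.e. 4 MHz.
    assert(cycles_to_ns(4) == 250);
    // Four digitalWrite() calls measured at about 20 us: about 5 us each,
    // which is about 80 cycles per call.
    assert(ns_to_cycles(20000 / 4) == 80);
}
```

So each direct port write costs roughly 2 cycles, against roughly 80 for a digitalWrite() call, going by the scope readings reported here.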

I use direct port manipulation a lot with SPI for fast external interfacing.

PORTD = PORTD & B11111011;  // clears D2 - used for Slave Select for example
SPI.transfer(dataArray[x]);
PORTD = PORTD | B00000100; // sets D2 - used for Slave Select

For instance, I was able to shift out 41 bytes in 46uS - and doing it in a loop that runs every 58uS.

If you look at the code for digitalWrite, you'll see there are safeguards to put the hardware in the correct state for an output change.
Direct port manipulation does not.

The main reason that functions like digitalRead() and digitalWrite() are so much slower than direct port register control is that they abstract the Arduino pin numbers, so the same sketch command works on Arduino boards that use different AVR chip types with different port/pin mappings. This makes code portable across the various Arduino AVR board types, a desirable quality in most cases. Where you can't afford the slower pin manipulation, you are free to use the PORTx and PINx method, as long as you are aware that the resulting sketch will not run properly on a Mega board if written for an Uno board, or vice versa.

The choice lies with you, the programmer: pick the method that best fits what your application requires.

Lefty

Thanks Lefty, I guess it's a mindset kinda thing. I've almost always been an assembly language programmer so I am fairly aware of "what goes on down there". Even when I used C on a micro, you only dreamed of floating point math and you didn't think about calling any kind of printf, sprintf etc.. I'm still getting caught by the level of abstraction that is used by the Arduino libraries to get this stuff to run on so much different hardware in a generic fashion. When I see things like digitalWrite(), my brain is thinking MACRO that dissolves into a couple of instructions, not a real function call that takes 80 cycles. I guess I'm old and kinda set in my ways. :wink:

Thanks Crossroads. That's pretty fast on the SPI, about 7MHz. 12uS to spare, what did you do with all that leftover time? :wink:

When I take time to think, it makes perfect sense that setting or clearing a pin would take that long using the library. Considering all that takes place, it's really not all that bad.

Yes, I used the 8 MHz SPI rate to send the data out.
Well, I plan to update those 41 bytes at a 2 kHz rate, so I have a little bit of time for other stuff between bursts 8)
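The SPI figures above can be cross-checked with a little arithmetic. This is only a back-of-the-envelope sketch using the numbers quoted in the thread (41 bytes, 46uS per burst, a 58uS loop, an 8 MHz SPI clock). The constant and function names are invented for the example.

```cpp
#include <cassert>

// Figures quoted in the thread, not new measurements.
constexpr double kBytes = 41.0;
constexpr double kBurstUs = 46.0;     // time to shift out all 41 bytes
constexpr double kLoopUs = 58.0;      // period of the enclosing loop
constexpr double kSpiClockMHz = 8.0;  // SPI clock rate

// Effective throughput of the burst, in Mbit/s.
constexpr double effective_mbit_per_s() { return kBytes * 8.0 / kBurstUs; }

// Time the bits alone would take at the SPI clock rate, in microseconds.
constexpr double ideal_burst_us() { return kBytes * 8.0 / kSpiClockMHz; }

// Time left in each loop pass once the burst is done, in microseconds.
constexpr double spare_us() { return kLoopUs - kBurstUs; }

void run_checks() {
    assert(effective_mbit_per_s() > 7.1 && effective_mbit_per_s() < 7.2);
    assert(ideal_burst_us() == 41.0);
    assert(spare_us() == 12.0);
}
```

That matches the "about 7MHz" and "12uS to spare" estimates, and suggests only about 5uS of the 46uS burst is overhead beyond clocking out the bits.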

@afremont

The issue is known. I reported it, along with a possible fix, on the Google Code issue tracker (now the Google Code Archive).

patch for wiring.c

void delayMicroseconds(unsigned int us)
{
	// calling avrlib's delay_us() function with low values (e.g. 1 or
	// 2 microseconds) gives delays longer than desired.
	//delay_us(us);
#if F_CPU >= 20000000L
	// for the 20 MHz clock on rare Arduino boards

	// for a one-microsecond delay, simply wait 2 cycles and return. The overhead
	// of the function call yields a delay of exactly one microsecond.
	__asm__ __volatile__ (
		"nop" "\n\t"
		"nop"); // just waiting 2 cycles
	if (us <= 1)
		return;
	us--;

	// the following loop takes 1/5 of a microsecond (4 cycles)
	// per iteration, so execute it five times for each microsecond of
	// delay requested.
	us = (us<<2) + us; // x5 us

	// account for the time taken in the preceding commands.
	us -= 2;

#elif F_CPU >= 16000000L
	// for the 16 MHz clock on most Arduino boards

	// for a one-microsecond delay, simply return.  the overhead
	// of the function call yields a delay of approximately 1 1/8 us.
	
//  FIX
//	if (--us == 0)
//		return;
	if (us < 2) return;
	us--;

	// the following loop takes a quarter of a microsecond (4 cycles)
	// per iteration, so execute it four times for each microsecond of
	// delay requested.
	us <<= 2;

	// account for the time taken in the preceding commands.
	us -= 2;
#else
	// for the 8 MHz internal clock on the ATmega168

	// for a one- or two-microsecond delay, simply return.  the overhead of
	// the function calls takes more than two microseconds.  can't just
	// subtract two, since us is unsigned; we'd overflow.
	
	
//	if (--us == 0)
//		return;
//	if (--us == 0)
//		return;
	if (us < 3) return;
	us -= 2;
	

	// the following loop takes half of a microsecond (4 cycles)
	// per iteration, so execute it twice for each microsecond of
	// delay requested.
	us <<= 1;
    
	// partially compensate for the time taken by the preceding commands.
	// we can't subtract any more than this or we'd overflow w/ small delays.
	us--;
#endif

	// busy wait
	__asm__ __volatile__ (
		"1: sbiw %0,1" "\n\t" // 2 cycles
		"brne 1b" : "=w" (us) : "0" (us) // 2 cycles
	);
}

Oops, the repo has moved. The bug and fix are now reported here: delayMicroseconds(0) delays far longer than expected. [imported] · Issue #576 · arduino/Arduino · GitHub
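To see why an argument of 0 produces such a long delay, the 16 MHz branch can be modelled on a PC. The sketch below is an illustration, not the library code: loop_count_original follows the commented-out lines in the patch, loop_count_patched follows the replacement, and uint16_t stands in for the 16-bit unsigned int of the AVR. The function names are made up for the example, and the timing assumes 4 cycles per loop iteration, as the comments in the patch state.

```cpp
#include <cassert>
#include <cstdint>

// Busy-wait iterations computed by the original 16 MHz branch.
// A return value of 0 means the function returns before reaching the loop.
uint16_t loop_count_original(uint16_t us) {
    if (--us == 0) return 0;  // us == 0 wraps to 65535 here and is not caught
    us <<= 2;                 // 4 iterations per microsecond, truncated to 16 bits
    us -= 2;                  // compensate for the call overhead
    return us;
}

// Busy-wait iterations computed by the patched 16 MHz branch.
uint16_t loop_count_patched(uint16_t us) {
    if (us < 2) return 0;     // catches both 0 and 1 before any subtraction
    us--;
    us <<= 2;
    us -= 2;
    return us;
}

// Loop time in microseconds at 4 cycles per iteration and 16 cycles per us.
double loop_time_us(uint16_t iterations) { return iterations * 4.0 / 16.0; }

void run_checks() {
    assert(loop_count_original(0) == 65530);        // wrapped, not zero
    assert(loop_time_us(65530) == 16382.5);         // about 16.4 ms
    assert(loop_count_patched(0) == 0);             // returns immediately
    assert(loop_count_patched(10) == loop_count_original(10));
}
```

Under these assumptions the original code spins for about 16.4 ms on delayMicroseconds(0), in the same range as the ~17mS reported in the first post, while the patched version returns straight away and leaves every other argument unchanged.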