Sketches running slower on 0013?

I’ve been working on an RFID entry system, and I’ve noticed something odd with Arduino-0013. The same code seems to run slower when built with 0013 than with 0012 (or 0011). My application has an asynchronous processing loop that handles user feedback (a status LED and a piezo buzzer) without blocking the main program. What originally alerted me to the speed difference was that my status beeps (3 short beeps) played noticeably slower, so I made a simple test case that demonstrates the difference.

The following sketch simply blinks an LED on pin 13 by calling a “service” routine that manages the state of the LED. The main loop does nothing but call the service routine and then a delay(1). In my real program there are other service routines that are called as well.

byte ledPin = 13;        // LED control pin
int ledDelay = 75;       // the duration of the on/off time (in loop passes)
int ledDelayCounter = 0;
boolean ledState = false;

void setup()
{
  pinMode(ledPin, OUTPUT);
}

void SetLedState(boolean currentState)
{
  digitalWrite(ledPin, currentState ? HIGH : LOW);
}

void ServiceLed()
{
  if (ledDelayCounter <= 0) {
    ledState = !ledState;
    ledDelayCounter = ledDelay;
    SetLedState(ledState);
  }
  ledDelayCounter--;
}

void loop()
{
  ServiceLed();
  delay(1);
}

The LED will blink significantly slower if the code is built under 0013 (it’s really noticeable when you replace the LED with a beeper). We’re not talking a small difference here. 0013 seems to be blinking about 33% slower (or more).

So is the delay() function taking longer in 0013? Or is there something else going on? Changing the loop to have delay(3) in 0012 seems to match delay(1) in 0013.

I realize I can shift to a millis() based interval rather than using a count-down method, and I might do that at some point. But the real question is what changed between 0012 and 0013 that caused the speed difference?

For reference, this was all built and tested using the OS X version of the IDE with both an Arduino NG and a Diecimila – both with ATmega168s at 16 MHz.

(1) Sleep or delay routines can never be exact. The delay(n) routine now ensures that at least n milliseconds pass; previously it could return after less than n milliseconds, which is unlike the behavior of similar routines on most other computing platforms.

(2) I think the code is now being compiled with optimization set to minimize code size rather than to maximize the speed of the generated instructions. In tight loops where you are generating tones, this can make an audible difference, but for most programs it doesn't matter, and minimizing RAM and program-memory usage is important. Where timing is tight, you can likely get the speed focus back with compiler settings.

Thanks for the reply.

It looks like the main difference is your first point. Comparing the delay() source between 0012 and 0013 shows the slight difference of “<” vs. “<=”.


// Arduino 0012:
void delay(unsigned long ms)
{
      unsigned long start = millis();
      while (millis() - start < ms)
            ;
}

// Arduino 0013:
void delay(unsigned long ms)
{
      unsigned long start = millis();
      while (millis() - start <= ms)
            ;
}

This explains the (significant) difference I was seeing. I guess I’ll have to shift to an elapsed-time rather than a countdown method. At least then it won’t be affected by any other future timing changes.


I guess I'll have to shift to an elapsed-time rather than a countdown method. At least then it won't be affected by any other future timing changes.

Or use a more accurate delay function. delay(1) is never likely to be very accurate, given that it's based on an approximately 1 ms interrupt. If you CAN use delay() here, you should be able to get better results with delayMicroseconds(1000). And it's no less a waste of cycles, either...

If you CAN use delay() here, you should be able to get better results with delayMicroseconds(1000)

I'm not sure that would be a good idea in my case, because the delayMicroseconds() function disables interrupts for the entire duration of the delay. delay(), by contrast, only disables interrupts during the short calls to millis() inside its loop.

Another part of my application involves "listening" for RFID tags at 9600 baud with the hardware UART. Having interrupts disabled for a full millisecond might cause me to drop a bit or miss a start bit and lose an entire byte. Or maybe I'm misunderstanding something and the hardware UART wouldn't be affected by disabled interrupts?

I guess I could simply make a non-interrupt-blocking clone of delayMicroseconds() for my use since I'm not concerned if my timing is off by a microsecond or two.

The following excerpt is from page 16 of the Atmega 168 datasheet:

There are basically two types of interrupts. The first type is triggered by an event that sets the Interrupt Flag. For these interrupts, the Program Counter is vectored to the actual Interrupt Vec- tor in order to execute the interrupt handling routine, and hardware clears the corresponding Interrupt Flag. Interrupt Flags can also be cleared by writing a logic one to the flag bit position(s) to be cleared. If an interrupt condition occurs while the corresponding interrupt enable bit is cleared, the Interrupt Flag will be set and remembered until the interrupt is enabled, or the flag is cleared by software. Similarly, if one or more interrupt conditions occur while the Global Interrupt Enable bit is cleared, the corresponding Interrupt Flag(s) will be set and remembered until the Global Interrupt Enable bit is set, and will then be executed by order of priority.

What this means is that if the UART receives a byte while interrupts are disabled, the flag stays set and the interrupt executes as soon as interrupts are re-enabled. HOWEVER, the receiver can only buffer so much: the ATmega168's USART has a two-level receive FIFO, so if enough back-to-back bytes arrive while interrupts are disabled, data is lost for good.

In your case, at 9600 baud a character can arrive roughly every millisecond (about 1042 microseconds with the usual 8N1 framing). This means you risk losing data.

To solve this, I would just use two 500 microsecond delays, one at the beginning and one at the end of your loop. That guarantees at least a couple of instructions with interrupts enabled between the two "cli" sections, allowing time for both the Timer0 interrupt and the UART RX interrupt to execute. (This is because the AVR always executes at least one instruction of the main program between interrupts.)

Best of luck with your project.


P.S. I, at least, was very gratified to see the optimization improvements. I am seeing code size reductions of over 10%. (Of course, being able to change the optimization level passed to avr-gcc would be a nice feature.)

I agree with etracer about not using delayMicroseconds() as a “more accurate” delay(). My personal rule is never to pass delayMicroseconds() more than a two-digit argument. LiquidCrystal::home() and LiquidCrystal::clear() use delayMicroseconds(2000), for example, and this makes them almost unusable in certain applications – like mine. :slight_smile:


Good point. I should probably change those to delay(2).

I agree – now that delay(2) guarantees at least 2 ms. I suspect the delayMicroseconds(2000) was put in back when delay(2) didn’t quite cover it.