Arduino 20MHz, delay() accuracy

Hi Everyone,
I'm running an Arduino Duemilanove-328 at 20MHz (with a 20MHz crystal and AVR_FREQ = 20000000L) and testing the following code to check delay accuracy:

void setup() {                
  pinMode(13, OUTPUT);     
}
void loop() {
  digitalWrite(13, HIGH);   // turn the LED on
  delay(100);               // wait for 100 milliseconds
  digitalWrite(13, LOW);    // turn the LED off
  delay(1000);              // wait for one second
}

But the delay is not accurate! A 100ms delay actually takes 106ms, and a 1s delay takes 1.07s.

Why is that?
How can I rectify it?
Does wiring.c need any modification when running at a frequency above 16MHz?

What's the analyzer? That looks cool.

@Coding Badly
I went to bed thinking about the
#define FRACT_INC ((MICROSECONDS_PER_TIMER0_OVERFLOW % 1000) >> 3)
part and had a sleepless night!

I agree that #define FRACT_INC ((MICROSECONDS_PER_TIMER0_OVERFLOW % 1000) >> 3) will result in an integer for 16MHz (or a multiple of 8) and a fraction for 20MHz, but is it possible to tweak this code so it works with other frequencies too?
For example, if the MCU is running at 20MHz, one cycle (or one single-cycle instruction) takes 1/20 microsecond, so 20 single-cycle instructions such as "nop" would create a 1 microsecond delay.
Similarly:
16 instructions give a 1 microsecond delay at 16MHz
18 instructions give a 1 microsecond delay at 18MHz
19 instructions give a 1 microsecond delay at 19MHz, and so on.

Or am I missing something here?
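
To put numbers on the cycle-counting idea above, here is a minimal sketch in plain C that can be checked on a PC. The helper names cycles_for_us and loop_iterations are made up for illustration and are not part of the Arduino core. It assumes the delay is built from a busy loop with a known cost per iteration plus a fixed call overhead, both of which you would have to measure for your own compiler output.

```c
#include <stdint.h>

/* Hypothetical helper: CPU cycles that elapse in `us` microseconds
 * at a clock of f_cpu_hz. A 64-bit intermediate avoids overflow. */
uint32_t cycles_for_us(uint32_t f_cpu_hz, uint32_t us)
{
    return (uint32_t)(((uint64_t)f_cpu_hz * us) / 1000000UL);
}

/* Hypothetical helper: how many iterations of a busy loop are needed,
 * given the cost of one iteration and a fixed overhead (function call,
 * setup) that has already burned some cycles. Both costs are
 * assumptions the caller must supply. */
uint32_t loop_iterations(uint32_t f_cpu_hz, uint32_t us,
                         uint32_t cycles_per_iter, uint32_t overhead_cycles)
{
    uint32_t total = cycles_for_us(f_cpu_hz, us);
    if (cycles_per_iter == 0 || total <= overhead_cycles) {
        return 0;   /* requested delay is shorter than the overhead */
    }
    return (total - overhead_cycles) / cycles_per_iter;
}
```

cycles_for_us(20000000UL, 1) gives 20 and cycles_for_us(16000000UL, 1) gives 16, matching the list above. For a clock that is not a whole number of MHz the result is truncated, so short delays come out slightly short, and the loop overhead matters more than the nop count for delays of only a few microseconds.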

SirNickity:
What's the analyzer? That looks cool.

8)

DirtyBits:
I agree that #define FRACT_INC ((MICROSECONDS_PER_TIMER0_OVERFLOW % 1000) >> 3) will result in an integer for 16MHz (or a multiple of 8) and a fraction for 20MHz, but is it possible to tweak this code so it works with other frequencies too?

Tweak which code? The original code or the alternative I proposed?
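
For anyone following along, the arithmetic behind the quoted macro can be reproduced on a PC. This is only a sketch: it assumes the stock timer0 setup (prescaler 64, 256 counts per overflow) and the usual wiring.c definitions of MICROSECONDS_PER_TIMER0_OVERFLOW, MILLIS_INC and FRACT_INC, rewritten as functions of the clock so different frequencies can be compared. The function names are made up for this example.

```c
#include <stdint.h>

#define TIMER0_PRESCALER          64UL   /* assumed: stock Arduino timer0 setup */
#define TIMER0_COUNTS_PER_OVERFLOW 256UL /* 8-bit timer */

/* Integer microseconds per timer0 overflow, truncated the same way
 * the macro is (cycles / whole cycles-per-microsecond). */
uint32_t us_per_overflow(uint32_t f_cpu_hz)
{
    return (TIMER0_PRESCALER * TIMER0_COUNTS_PER_OVERFLOW) / (f_cpu_hz / 1000000UL);
}

/* Whole milliseconds added to millis() on every overflow. */
uint32_t millis_inc(uint32_t f_cpu_hz)
{
    return us_per_overflow(f_cpu_hz) / 1000UL;
}

/* Fractional part, in units of 8 microseconds (hence the >> 3). */
uint32_t fract_inc(uint32_t f_cpu_hz)
{
    return (us_per_overflow(f_cpu_hz) % 1000UL) >> 3;
}

/* Microseconds that millis() is effectively credited per overflow. */
uint32_t credited_us_per_overflow(uint32_t f_cpu_hz)
{
    return millis_inc(f_cpu_hz) * 1000UL + fract_inc(f_cpu_hz) * 8UL;
}
```

At 16MHz an overflow lasts exactly 1024 microseconds and the macros credit 1 ms plus 3 units of 8 microseconds, which is also 1024. At 20MHz an overflow really lasts 819.2 microseconds, the integer version gives 819, and the shift drops that to 102 units, so only 816 microseconds are credited per overflow.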

For example, if the MCU is running at 20MHz, one cycle (or one single-cycle instruction) takes 1/20 microsecond, so 20 single-cycle instructions such as "nop" would create a 1 microsecond delay.

That's a good way to approach delayMicroseconds but it doesn't help with millis.

The topic subject and the initial code posted refer to delay(), which is not directly related to millis(). delay() uses the micros() function, so it needs to be addressed separately to improve accuracy when running at 20MHz.
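
As a rough check of why micros() matters here, the sketch below models the scaling on a PC. It assumes micros() converts timer0 ticks to microseconds with an integer factor of 64 / clockCyclesPerMicrosecond(), as in the stock wiring.c of that era. Treat that as an assumption and compare it against your own copy of the core. The function names are invented for this example.

```c
#include <stdint.h>

/* Assumed micros() scaling: ticks * (64 / cycles-per-microsecond),
 * all in integer arithmetic. At 20MHz, 64 / 20 truncates to 3
 * although one tick really lasts 3.2 microseconds. */
uint32_t reported_us_per_tick(uint32_t f_cpu_hz)
{
    return 64UL / (f_cpu_hz / 1000000UL);
}

/* Real elapsed microseconds before a delay(requested_ms) built on
 * that micros() would return. */
uint32_t real_us_for_delay(uint32_t requested_ms, uint32_t f_cpu_hz)
{
    /* ticks needed until the reported time reaches the request */
    uint64_t ticks = ((uint64_t)requested_ms * 1000UL) / reported_us_per_tick(f_cpu_hz);
    /* each tick really lasts 64 CPU cycles */
    return (uint32_t)((ticks * 64ULL * 1000000ULL) / f_cpu_hz);
}
```

With these assumptions, real_us_for_delay(100, 20000000UL) comes out at about 106.7 ms and real_us_for_delay(1000, 20000000UL) at about 1.067 s, which lines up with the measurements in the first post, while 16MHz gives exactly 100 ms. One possible direction, untested here, is to multiply the tick count by 64 before dividing by the cycles per microsecond, at the cost of the intermediate value overflowing sooner.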