Tinrik:
Am I doing something wrong here or is this just how it's supposed to be?
I find a time of 0.3 microseconds per access, using this code:
#include <avr/pgmspace.h>

#define LUT_SIZE 10

// Lookup table stored in flash (program memory)
const byte lut[LUT_SIZE] PROGMEM = {0,1,2,3,4,5,6,7,8,9};

int nIters = 1000/LUT_SIZE;   // 100 passes x 10 reads = 1000 reads in total
unsigned long tStart, tEnd;

void setup() {
  Serial.begin(115200);
}

void loop() {
  tStart = micros();
  for(int i = 0; i < nIters; ++i)
  {
    // Unrolled reads at constant addresses: no index arithmetic per access
    pgm_read_byte(lut+0);
    pgm_read_byte(lut+1);
    pgm_read_byte(lut+2);
    pgm_read_byte(lut+3);
    pgm_read_byte(lut+4);
    pgm_read_byte(lut+5);
    pgm_read_byte(lut+6);
    pgm_read_byte(lut+7);
    pgm_read_byte(lut+8);
    pgm_read_byte(lut+9);
  }
  tEnd = micros();
  // Average time per read, in microseconds
  Serial.println((float)(tEnd - tStart)/(nIters*LUT_SIZE));
  delay(1000);
}
At 16 MHz, 0.3 µs = 16 * 0.3 = 4.8 clock cycles.
Some of that time is loop overhead: even my for-loop has to increment the index and jump back to the first instruction in the loop.
What you are doing wrong is surely this division:
pgm_read_byte(lut + i%LUT_SIZE);
The operation "i%LUT_SIZE" is a modulo, which requires a division, and ATmega controllers do NOT HAVE ANY HARDWARE DIVISION. The division is done by a software routine, which is VERY COSTLY when counted in clock cycles.
Many instructions on ATmega controllers execute in a single clock cycle, BUT NOT DIVISION!
So if you want fast-running code on ATmegas, AVOID DIVIDING!
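As a minimal sketch of what avoiding the division can look like: instead of computing i%LUT_SIZE on every pass, keep a separate index and reset it when it reaches LUT_SIZE. The function name sumLutWrapped is made up for illustration, and the table is a plain array here (an assumption, so the sketch also compiles on a PC). On the Arduino the table would stay in PROGMEM and be read with pgm_read_byte(lut + idx).

```cpp
#include <stdint.h>

#define LUT_SIZE 10

// Plain array so the sketch also compiles off-target (assumption);
// on the AVR this would be PROGMEM and read with pgm_read_byte(lut + idx).
static const uint8_t lut[LUT_SIZE] = {0,1,2,3,4,5,6,7,8,9};

// Hypothetical helper: reads the table nReads times and sums the values.
// The index wraps with a compare-and-reset instead of i % LUT_SIZE.
uint32_t sumLutWrapped(uint16_t nReads) {
  uint32_t sum = 0;
  uint8_t idx = 0;
  for (uint16_t i = 0; i < nReads; ++i) {
    sum += lut[idx];
    if (++idx == LUT_SIZE) idx = 0;   // wrap around, no division needed
  }
  return sum;
}
```

The compare-and-reset visits the same table entries in the same order as the modulo version, but needs only an increment, a compare and a branch per access.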
I agree, you are timing the division there, not the program memory read. The datasheet states that a read from program memory takes one more clock cycle than a read from SRAM (and they should know). At 16 MHz one clock cycle is 62.5 ns. You don't need to write code to prove it. If your code does not give you that result, look at your code. The division by 10 would be an obvious problem area.