Timing question

Hello!

I have two pieces of code that read a 512-byte buffer from an SD card over SPI. The first one is mine and the second is taken from sdfatlib. They appear to be equivalent; however:

In the first one, the pause between SPI reads is 2.75 µs:

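for (uint16_t i = 0; i < 512; i++) {
    SPDR = 0xFF;                      // start a transfer (0xFF is a filler byte)
    while (!(SPSR & (1 << SPIF)));    // wait for the transfer to complete
    buffer[i] = SPDR;                 // store the received byte
}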

In the second one, the pause drops to 1.75 µs, which is 8 fewer clock cycles on my 8 MHz Arduino Pro (at 8 MHz, 1 µs is 8 cycles). That is quite a difference!
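As a quick check of that arithmetic: at 8 MHz one clock cycle lasts 125 ns, so converting the two pauses to cycles is a single multiplication. The helper below is a hypothetical sketch, not part of sdfatlib or the Arduino core; it works in nanoseconds so the math stays in integers.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a duration in nanoseconds to CPU clock cycles.
   Integer arithmetic avoids floating-point rounding surprises. */
static uint64_t ns_to_cycles(uint64_t ns, uint64_t f_cpu_hz)
{
    /* Guard against a zero clock and against overflow of ns * f_cpu_hz. */
    assert(f_cpu_hz > 0 && ns <= UINT64_MAX / f_cpu_hz);
    return ns * f_cpu_hz / 1000000000ULL;
}
```

With `f_cpu_hz = 8000000`, `ns_to_cycles(2750, ...)` gives 22 and `ns_to_cycles(1750, ...)` gives 14, a difference of 8 cycles.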

SPDR = 0xFF;                          // start the first transfer

for (uint16_t i = 0; i < 511; i++) {
    while (!(SPSR & (1 << SPIF)));    // wait for byte i
    buffer[i] = SPDR;                 // store it
    SPDR = 0xFF;                      // immediately start byte i + 1
}

// wait for the last byte
while (!(SPSR & (1 << SPIF)));
buffer[511] = SPDR;
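The second version is a software-pipelined loop: the next transfer is started before the loop bookkeeping runs. As a host-runnable sketch, here is the same pattern for an arbitrary length `n`, with the register accesses hidden behind hypothetical helpers (`spi_start`, `spi_wait`, `spi_data`) that are mocked so the ordering can be checked off-target. On the AVR they would wrap `SPDR` and the `SPIF` poll; none of these names come from sdfatlib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side mock of the SPI registers (hypothetical, illustration only). */
static uint8_t  mock_next_byte;   /* value the "card" returns next */
static uint8_t  mock_spdr;        /* last received byte */
static int      mock_busy;        /* 1 while a transfer is in progress */
static unsigned mock_starts;      /* number of transfers started */

static void spi_start(uint8_t out)        /* AVR: SPDR = out; */
{
    (void)out;                    /* the mock ignores the 0xFF filler */
    assert(!mock_busy);           /* never start while a byte is in flight */
    mock_busy = 1;
    mock_starts++;
}

static void spi_wait(void)        /* AVR: while (!(SPSR & (1 << SPIF))); */
{
    assert(mock_busy);
    mock_spdr = mock_next_byte++;
    mock_busy = 0;
}

static uint8_t spi_data(void) { return mock_spdr; }   /* AVR: SPDR */

/* Pipelined block read: transfer i + 1 is started before the loop
   test runs, so the loop overhead overlaps the transfer. */
static void spi_read_block(uint8_t *buf, size_t n)
{
    if (n == 0)
        return;
    spi_start(0xFF);                  /* prime the pipeline */
    for (size_t i = 0; i + 1 < n; i++) {
        spi_wait();                   /* byte i has arrived */
        buf[i] = spi_data();
        spi_start(0xFF);              /* start byte i + 1 right away */
    }
    spi_wait();                       /* last byte: nothing left to start */
    buf[n - 1] = spi_data();
}
```

The mock's asserts check the key invariant of the pattern: exactly `n` transfers are started, and a new one never begins before the previous byte has been read.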

Still, I cannot explain this difference, so I would really appreciate any help. :)

It's a good question, and I can reproduce your results. I checked the generated machine code, and it contains the same instructions, albeit in a different order. So we ask ourselves: how can the same instructions take longer in one case than in the other?

I measured a difference of 6 clock cycles.

The answer is here:

    SPDR = 0xFF;   // <---- send
    while (!(SPSR & (1 << SPIF)));  // <---- immediately wait for the transfer to complete

Since the default SPI transfer rate is 1/4 of the system clock, we expect it to take 4 clock cycles to send that byte. So the while loop will wait for at least 4 cycles.

(Edit: see below. It is actually at least 32 cycles: 8 bits x 4 cycles per bit.)

Compare to:

for (uint16_t i = 0; i < 511; i++) {
    while (!(SPSR & (1 << SPIF)));
    buffer[i] = SPDR;
    SPDR = 0xFF;
}

Extra work is done here after setting SPDR, namely:

 11e:	82 e0       	ldi	r24, 0x02	; 2
 120:	ef 3f       	cpi	r30, 0xFF	; 255
 122:	f8 07       	cpc	r31, r24
 124:	b1 f7       	brne	.-20     	; 0x112 <loop+0x12>

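In other words, the "end of loop" test. That costs 5 cycles here (1 cycle each for ldi, cpi and cpc, plus 2 for the taken brne), and those 5 cycles now run while the SPI transfer is in progress rather than after it. So by the time we get back to the start of the loop, 5 cycles of the transfer have already elapsed, and there is that much less to wait for.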

This accounts for 5 cycles, not 6. The extra cycle probably comes from the fact that the loop here:

    while (!(SPSR & (1 << SPIF)));

... takes at least 4 cycles per pass, so the SPI transfer probably finishes partway through one of those passes.

My explanation above is slightly wrong. The SPI clock is (by default) 1/4 of the system clock, but that is the rate per bit.

So an SPI transfer will actually take 32 (4 x 8) clock cycles, not just 4.
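The per-byte cost for any divider follows the same arithmetic: 8 bits, each lasting `divider` CPU cycles. A minimal sketch (the helper name is hypothetical; the set of dividers assumes the ATmega168/328P found on the Arduino Pro, where the SPI clock can be F_CPU/2 up to F_CPU/128):

```c
#include <assert.h>

/* CPU cycles needed to shift one byte over SPI when the SPI clock
   is F_CPU / divider: 8 bits x `divider` cycles per bit. */
static unsigned spi_byte_cycles(unsigned divider)
{
    /* Only powers of two from 2 to 128 are valid prescaler values. */
    assert(divider >= 2 && divider <= 128 && (divider & (divider - 1)) == 0);
    return 8u * divider;
}
```

The default divider of 4 gives 32 cycles per byte; a divider of 2 gives 16.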

However, my conclusion stands: of those 32 cycles, 5 are now spent on the "end of for loop" test, which accounts for a reduction of 5 cycles.

The 6th cycle would be accounted for by the fact that each pass of the SPIF test takes 4 cycles, so the wait can only end on a 4-cycle boundary (which means we won't save exactly 5 cycles).
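The 4-cycle granularity can be illustrated with a crude host-side model. Everything here is an assumption for illustration, not a cycle-exact AVR simulation: the function names are hypothetical, the transfer is taken to complete a fixed number of cycles after the `SPDR` write, and the polling loop is taken to sample `SPIF` once every `poll` cycles, exiting at the first sample at or after completion.

```c
#include <assert.h>

/* Cycle (counted from the SPDR write) at which the polling loop first
   sees SPIF set, given when it takes its first sample. */
static unsigned detect_time(unsigned transfer, unsigned poll,
                            unsigned first_sample)
{
    assert(poll > 0);
    if (first_sample >= transfer)
        return first_sample;
    /* Round the remaining wait up to a whole number of polling passes. */
    unsigned passes = (transfer - first_sample + poll - 1) / poll;
    return first_sample + passes * poll;
}

/* Cycles saved per byte by moving `overlap` cycles of loop bookkeeping
   from after the wait (first loop) to between the SPDR write and the
   wait (second loop). Work common to both loops cancels out. */
static int cycles_saved(unsigned transfer, unsigned poll,
                        unsigned first_sample, unsigned overlap)
{
    int naive = (int)detect_time(transfer, poll, first_sample) + (int)overlap;
    int piped = (int)detect_time(transfer, poll, first_sample + overlap);
    return naive - piped;
}
```

With `transfer = 32`, `poll = 4` and `overlap = 5`, `cycles_saved` returns 4 for `first_sample` values 0, 1 and 2, and 8 for 3: never exactly 5, although the four alignments average out to 5. Because the model ignores where inside the polling loop `SPSR` is actually read, it only illustrates the quantization; it does not attempt to reproduce the measured 6 or 8 cycles.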

Dear Nick, your explanation makes perfect sense, and I thank you very much indeed. :)

Just noting that I've set the SPI clock to the highest possible frequency, i.e. 1/2 of the system clock (16 cycles per byte). Still, the difference lies only in the fact that the while loop takes advantage of the for-loop overhead.

Thanks a lot again !