speed test

Yep, that's how I debugged it.

As I said earlier, you don't need to drop into assembler: first, the generated C code is pretty good, and second, you are adding NOPs anyway.

Where am I dropping into assembler? I followed the example you provided a few replies ago. If there's something I can leave out, let me know.

You said:

Used the unlooped asm version of SPI ...

However, if you used my loop with the NOPs, that should be fine.

That's what I've done:

    SPDR = (testArray[fakestartPoint + 0]); nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop; nop;

Repeated 41 times (one line per byte), it takes 46 µs.

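To save a lot of repetition, you should be able to do this:

    // assumes nop is defined as: #define nop asm volatile ("nop")
    SPDR = testArray [0];
    
    for (byte x = 1; x < 41; x++)
      {
      nop; nop; nop; nop; nop; nop; nop;   // wait for the previous byte to shift out
      SPDR = testArray [x];
      }

    nop; nop; nop; nop; nop; nop; nop; nop; nop; nop;   // let the last byte finish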

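There are fewer NOPs because the loop itself (loading the next byte, incrementing, comparing and branching) already uses up some of the clock cycles the SPI needs per byte.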

Thanks.
What I have is written and tested, all set up for 324 rows. Gonna use a 1284 for a big SRAM array (41 bytes x 324 rows = 13,284 bytes, which fits in its 16 KB of SRAM) that users can change, so code space will not be an issue.