I think accessing from SRAM is faster. I made a 14,625-byte array in a '1284P (16K SRAM) and can read from it and send the data out via SPI to shift registers at nearly 1 µs/byte. I think it works out to 17 clocks/byte, so about 48 µs for 45 bytes.
Had to do some tricks though: no looping, just 45 lines like this, where the nops pad out the time the SPI hardware needs to shift each byte out:
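To show where the timing figures come from, here is a rough sketch of the arithmetic. It assumes a 16 MHz system clock, which the post doesn't state but which matches "nearly 1 µs/byte" at 17 clocks. The names kClockHz and clocksToNs are made up for illustration.

```cpp
#include <stdint.h>

// Assumption: 16 MHz system clock (not stated in the post).
constexpr uint32_t kClockHz       = 16000000UL;
constexpr uint32_t kClocksPerByte = 17;  // the post's estimate
constexpr uint32_t kBytesPerPass  = 45;  // bytes sent per pass

// Convert a number of CPU clocks to nanoseconds (rounded down).
constexpr uint32_t clocksToNs(uint32_t clocks) {
  return static_cast<uint32_t>(
      static_cast<uint64_t>(clocks) * 1000000000ULL / kClockHz);
}

// One byte: 17 clocks, a little over 1 µs at the assumed clock.
constexpr uint32_t kByteTimeNs = clocksToNs(kClocksPerByte);

// One pass: 45 bytes * 17 clocks = 765 clocks, a little under 48 µs.
constexpr uint32_t kPassTimeNs = clocksToNs(kClocksPerByte * kBytesPerPass);
```

At a different clock speed the per-byte time scales accordingly; only the 17 clocks/byte figure comes from the post.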
SPDR = dataArray[startPoint+0]; then nop; 15 times
SPDR = dataArray[startPoint+1]; then nop; 15 times
SPDR = dataArray[startPoint+2]; then nop; 15 times
:
:
SPDR = dataArray[startPoint+44]; then nop; 15 times
Then startPoint was incremented by 325 for the next pass.
There's an entry you have to add to the top of the sketch so that nop; gets processed as an assembly instruction.
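A minimal sketch of the whole pattern, as I understand it from the description above. The #define for nop is a common way to do this and is my assumption about what the "entry" looks like. NOP15, SEND, SEND5 and sendPass are hypothetical helper names used only to avoid typing out all 45 lines; the preprocessor still expands them to straight-line code with no loop. SPI is assumed to be already set up as master (e.g. via SPI.begin()), and the host-side SPDR stand-in is there only so the code compiles off-target.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>            // real SPDR (SPI data register)
#else
static volatile uint8_t SPDR;  // stand-in so this compiles on a PC
#endif

// Assumed form of the "entry at the top of the sketch":
// makes `nop;` emit a single NOP instruction.
#define nop asm volatile ("nop")

// Hypothetical helper: 15 NOPs back to back.
#define NOP15 nop; nop; nop; nop; nop; \
              nop; nop; nop; nop; nop; \
              nop; nop; nop; nop; nop

uint8_t dataArray[14625];      // fits in the '1284P's 16K SRAM

// One "line" from the post: write a byte to SPI, then wait 15 NOPs.
#define SEND(i)  SPDR = dataArray[startPoint + (i)]; NOP15;
// Five consecutive lines, so nine of these make the 45.
#define SEND5(b) SEND((b) + 0) SEND((b) + 1) SEND((b) + 2) \
                 SEND((b) + 3) SEND((b) + 4)

// Sends dataArray[startPoint] .. dataArray[startPoint + 44], fully unrolled.
void sendPass(uint16_t startPoint) {
  SEND5(0)  SEND5(5)  SEND5(10)
  SEND5(15) SEND5(20) SEND5(25)
  SEND5(30) SEND5(35) SEND5(40)
}
```

Usage would be something like sendPass(startPoint); followed by startPoint += 325; for the next pass. The fixed NOP count replaces polling the SPI status flag, so the timing only holds for the SPI clock setting it was tuned for.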