Your code has two for loops, which can be time-consuming. How about this:
static unsigned char col_starting=0; //starting column
spiTransfer(addr, 0, a[(col_starting + 0) & 0x07]);
spiTransfer(addr, 1, a[(col_starting + 1) & 0x07]);
spiTransfer(addr, 2, a[(col_starting + 2) & 0x07]);
...
spiTransfer(addr, 7, a[(col_starting + 7) & 0x07]);
col_starting = (col_starting + 1) & 0x07; //advance the starting column, wrapping 7 back to 0
No loop, just eight transfers. You can modify it to suit your requirements.
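If you want to check the index wrapping on a PC before flashing it, here is a minimal sketch of the same idea wrapped in a function. It makes some assumptions that are not in your code: the name `send_rotated_frame`, the `sent` buffer, and the stub `spiTransfer` (including its parameter types) are hypothetical stand-ins, so swap in your real transfer routine and drop the stub.

```c
#include <stdint.h>

#define NUM_COLS 8

/* Hypothetical test buffer: holds the value last "sent" to each column. */
static uint8_t sent[NUM_COLS];

/* Hypothetical stand-in for the real spiTransfer(); it only records the
   value so the rotation can be inspected without any hardware. */
static void spiTransfer(int addr, uint8_t col, uint8_t value)
{
    (void)addr;            /* device address is unused in this stub */
    sent[col] = value;
}

/* Sends all eight columns of a[], starting at a rotating offset, then
   advances the offset by one so the next call is shifted by one column. */
static void send_rotated_frame(int addr, const uint8_t a[NUM_COLS])
{
    static uint8_t col_starting = 0;   /* kept between calls */

    /* Unrolled on purpose: no loop, eight transfers.
       "& 0x07" wraps the index back into the range 0..7. */
    spiTransfer(addr, 0, a[(col_starting + 0) & 0x07]);
    spiTransfer(addr, 1, a[(col_starting + 1) & 0x07]);
    spiTransfer(addr, 2, a[(col_starting + 2) & 0x07]);
    spiTransfer(addr, 3, a[(col_starting + 3) & 0x07]);
    spiTransfer(addr, 4, a[(col_starting + 4) & 0x07]);
    spiTransfer(addr, 5, a[(col_starting + 5) & 0x07]);
    spiTransfer(addr, 6, a[(col_starting + 6) & 0x07]);
    spiTransfer(addr, 7, a[(col_starting + 7) & 0x07]);

    col_starting = (col_starting + 1) & 0x07;   /* 7 wraps back to 0 */
}
```

Call `send_rotated_frame(addr, a)` once per display update. Each call shifts the pattern by one column, and after eight calls it is back where it started. The mask trick only works because 8 is a power of two. For another width you would need `% width` instead.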