Faster Routine

I saw in the SPI reference that it is possible to use three different pins (4, 10, 52) as outputs. So I wondered: is it possible to write 1152 bits to different pins from an array holding 12 bits per int, like this:

for (int i = 0; i < 48; i++) { // 48 * 2 * 12 = 1152 bits
    SPI.transfer(4, array[i], sizeof(array[i]));        // size should be 12 bits
    SPI.transfer(10, array[48 + i], sizeof(array[i]));  // second half of the 96 values; size should be 12 bits
}

and using the last SPI pin as a clock, like this:

for (int i = 0; i < Clockticks; i++) {
    SPI.transfer(52, 0b01, sizeof(2 * bit)); // I don't know which type is right; I need the buffer size here
}

So, just in theory, is this possible, or is it better to use the PMC or a timer without an ISR to get a simple square wave?
The most important thing would be the parallel data shift.
And when using TurboSPI, can I also use different pins?

EDIT: I already see the problem with the first approach: it is not real parallel data output. There might also be the option of putting the data out on only one pin, but this needs to happen fast enough, so I may use TurboSPI for this.
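
For the single-pin option, a byte-oriented SPI transfer would need the 12-bit values packed without gaps first. Below is a minimal sketch of that packing in plain C++, with two 12-bit values going into three bytes, MSB first. The name `pack12` is a placeholder of mine, and I am assuming the receiving side expects the bits back to back in this order; the actual SPI or TurboSPI call that sends the bytes is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs pairs of 12-bit values into 3 bytes each,
// MSB first, so the bit stream has no padding between values.
std::vector<uint8_t> pack12(const uint16_t* values, std::size_t count) {
    assert(count % 2 == 0);  // this sketch only handles complete pairs
    std::vector<uint8_t> bytes;
    bytes.reserve(count / 2 * 3);
    for (std::size_t i = 0; i < count; i += 2) {
        const uint16_t v0 = values[i] & 0x0FFF;      // keep the low 12 bits
        const uint16_t v1 = values[i + 1] & 0x0FFF;
        bytes.push_back(static_cast<uint8_t>(v0 >> 4));                        // v0 bits 11..4
        bytes.push_back(static_cast<uint8_t>(((v0 & 0x0F) << 4) | (v1 >> 8))); // v0 bits 3..0, v1 bits 11..8
        bytes.push_back(static_cast<uint8_t>(v1 & 0xFF));                      // v1 bits 7..0
    }
    return bytes;
}
```

All 96 values then fit into 144 bytes, which is exactly the 1152 bits from above.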