SPI between two Arduinos, one as master and one as slave

You don't need all this stuff:

  // Put SCK, MOSI, SS pins into output mode
  // also put SCK, MOSI into LOW state, and SS into HIGH state.
  // Then put SPI hardware into Master mode and turn SPI on
  pinMode(MOSI_PIN, OUTPUT);
  pinMode(MISO_PIN, INPUT);
  pinMode(SCK_PIN, OUTPUT);
  pinMode(SS_PIN, OUTPUT);
 
...

  SPCR = (1<<SPE)|(1<<MSTR)|(1<<SPR1);
  byte clr;
  clr=SPSR;
  clr=SPDR;

SPI.begin() does all that for you: it sets up the pin modes and turns on the SPI hardware in master mode.

The fundamental mistake is this: SPI.transfer (or your own code, if you insist on driving the hardware yourself) sends and receives at the same time. So it is simply impossible to send a byte and receive the reply to that same byte in one transfer, delay or no delay.
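To see why, it can help to model what the hardware does. The following is a host-side simulation in plain C++, not Arduino code, and the names `SpiSide` and `spiExchange` are hypothetical. It assumes MSB-first transmission and ignores clock polarity and phase: on each of 8 clock pulses the master shifts its top bit out on MOSI while the slave shifts its top bit out on MISO, and each side shifts the incoming bit in at the bottom.

```cpp
#include <cstdint>

// Hypothetical model of the 8-bit SPI data register on each side
// (SPDR on an AVR-based Arduino).
struct SpiSide {
  uint8_t data;
};

// Simulate one byte transfer, MSB first (an assumption for this sketch).
// On every clock pulse each side puts its top bit on the wire (MOSI for the
// master, MISO for the slave) and shifts in the bit coming from the other side.
void spiExchange(SpiSide& master, SpiSide& slave) {
  for (int clock = 0; clock < 8; ++clock) {
    uint8_t mosi = (master.data >> 7) & 1;  // bit travelling master -> slave
    uint8_t miso = (slave.data >> 7) & 1;   // bit travelling slave -> master
    master.data = static_cast<uint8_t>((master.data << 1) | miso);
    slave.data = static_cast<uint8_t>((slave.data << 1) | mosi);
  }
}
```

Starting with the master holding `'a'` and the slave holding `'x'`, one call to `spiExchange` leaves the master with `'x'` and the slave with `'a'`: the two registers simply swap. The byte the master gets back is whatever the slave had loaded before the transfer began, so the slave never had a chance to react to the `'a'`.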

You always have to send something, and then receive the response on the next transfer, e.g.:

  SPI.transfer ('a');          // send something
  byte x = SPI.transfer (0);   // get a response

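You can send and receive as many bytes as you like this way, but you always have to be "out by one": each byte you receive is the slave's reply to the byte you sent on the previous transfer.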
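A minimal host-side sketch of the "out by one" pattern, again in plain C++ rather than Arduino code. The slave behaviour here is a hypothetical protocol chosen for illustration: after receiving a byte, the slave loads its upper-case version as the reply for the next transfer. The function names and the assumption that the slave starts with 0 loaded are also illustrative.

```cpp
#include <cctype>
#include <cstdint>
#include <vector>

// Hypothetical slave behaviour: after receiving a byte, load the upper-case
// version into the data register, ready to go out on the *next* transfer.
uint8_t slaveResponse(uint8_t received) {
  return static_cast<uint8_t>(std::toupper(received));
}

// Simulate a series of full-duplex transfers as seen from the master.
// Returns the byte the master got back on each transfer.
std::vector<uint8_t> masterTransfers(const std::vector<uint8_t>& toSend) {
  uint8_t slaveRegister = 0;  // assumption: the slave starts with 0 loaded
  std::vector<uint8_t> received;
  for (uint8_t out : toSend) {
    // Both directions happen at once: the master gets what the slave had
    // loaded *before* this byte arrived...
    received.push_back(slaveRegister);
    // ...and only afterwards can the slave prepare its reply to `out`.
    slaveRegister = slaveResponse(out);
  }
  return received;
}
```

Sending `'a', 'b', 'c', 0` returns `0, 'A', 'B', 'C'`. The reply to each byte arrives one transfer later, and the trailing 0 is a dummy byte sent only to clock out the last reply, just like `SPI.transfer (0)` in the example above.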