I would daisy-chain 12 parallel-in, serial-out shift registers.
Clock all the inputs together to sample the pins, shift the 12 bytes in with SPI.transfer, and act on them.
For example:
// ignoring the declarations & void setup stuff for now ...
void loop(){
if (millis() >= previousMillis){ // see if sample time elapsed
previousMillis = previousMillis + 10; // set up for next sample time, 10 ms in the future
digitalWrite (shiftReg_slaveSelect, LOW);
digitalWrite (shiftReg_slaveSelect, HIGH); // make a clock pulse to capture the data on the parallel inputs
for (x=0; x<12; x=x+1){ //loop to read in the shift registers
shiftReg[x] = SPI.transfer(0); // bring in the data, put it in an array - might need 1 dummy transfer before the real data shows up
}
// now have 12 bytes in the shiftReg[] array, process them
for (arrayLevel = 0; arrayLevel < 12; arrayLevel = arrayLevel + 1){ // go thru the 12 bytes
for (byteBit = 0; byteBit < 8; byteBit = byteBit + 1){ // go thru each byte, mask off each bit
if ((shiftReg[arrayLevel] & (1 << byteBit)) != 0){ // example: shiftReg[2] = B00110110, byteBit is 2 so the mask is B00000100, then result is B00000100
//send midinote for arrayLevel * 8 + byteAdder[byteBit]
// where byteAdder[] = {7,6,5,4,3,2,1,0}; is previously defined
// thus (0 to 11) * 8 + (0-7) represents all 96 notes
} // end processing each note
} // end cycling thru 8 bits
} // end cycling thru 12 registers
} // end sample time check
} // end void loop
When the 12 bytes are sampled, I would use that same clock to latch the keypress status into 12 octal latches to drive the LEDs.
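To check the bit-to-note arithmetic away from the hardware, here is a rough sketch of the decode step as plain desktop C++. The helper name activeNotes and the constant kNumRegisters are made up for illustration, and std::vector is only used so the result is easy to inspect on a PC (it is not available on a stock AVR Arduino core). It assumes the same byteAdder[] table as in the comments above, so which bit maps to which note still depends on how the keys are actually wired.

```cpp
#include <cstdint>
#include <vector>

// Assumed values taken from the sketch above: 12 registers, 8 keys each.
const int kNumRegisters = 12;                            // hypothetical name
const uint8_t byteAdder[8] = {7, 6, 5, 4, 3, 2, 1, 0};   // bit index -> note offset

// Hypothetical helper: returns the note index (0..95) for every bit that is
// set in the 12 sampled bytes, in the order the nested loops visit them.
std::vector<int> activeNotes(const uint8_t shiftReg[kNumRegisters]) {
    std::vector<int> notes;
    for (int arrayLevel = 0; arrayLevel < kNumRegisters; ++arrayLevel) {  // each register
        for (int byteBit = 0; byteBit < 8; ++byteBit) {                   // each bit in it
            if ((shiftReg[arrayLevel] & (1 << byteBit)) != 0) {           // key is active
                notes.push_back(arrayLevel * 8 + byteAdder[byteBit]);
            }
        }
    }
    return notes;
}
```

With shiftReg[2] = B00110110 and every other byte zero, bits 1, 2, 4 and 5 are set, so this returns notes 22, 21, 19 and 18. In the real sketch you would send the MIDI note at the point where this pushes into the vector.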