Just thinking off the top of my head here, but this code should be a lot faster. It only implements the shift register part, not LOAD, which AFAICT did nothing yet anyway because there was no code to output to the parallel pins. With no outputs to the parallel pins, how are you testing this?
// S88 pins setup
#define S88DATAIN 5
#define S88DATAOUT 4
#define S88CLOCK 2
#define S88LOAD 3
#define N_REGS 3
#define N_BITS (N_REGS * 8)
byte S88DataOutShiftRegister[N_BITS]; // One byte per bit, used as a circular FIFO
int rd_index = 0;
int wr_index = 0;
void setup() {
  attachInterrupt(0, CLOCK, CHANGE); // Interrupt stuff for the CLOCK and LOAD
  attachInterrupt(1, LOAD, RISING);
  pinMode(S88DATAOUT, OUTPUT);
  for (int i = 0; i < N_BITS; i++) {
    S88DataOutShiftRegister[i] = 0;
  }
}
void loop() { // No loop stuff because it's all interrupt driven.
}
void CLOCK() {
  byte dataBit = (PIND >> S88DATAIN) & 1; // Bitwise AND isolates the single pin bit
  byte clockBit = (PIND >> S88CLOCK) & 1;
  if (clockBit) { // Rising edge: write the oldest FIFO bit to S88DATAOUT
    // Note: as written, a stored 1 drives the pin low and a stored 0 drives it high.
    if (S88DataOutShiftRegister[rd_index++]) {
      PORTD &= ~(1 << S88DATAOUT);
    } else {
      PORTD |= (1 << S88DATAOUT);
    }
    rd_index = rd_index < N_BITS ? rd_index : 0; // Wrap the read index
  } else { // Falling edge: read S88DATAIN and put the bit into the FIFO
    S88DataOutShiftRegister[wr_index++] = dataBit;
    wr_index = wr_index < N_BITS ? wr_index : 0; // Wrap the write index
  }
}
void LOAD() // Set up the S88 data arrays / shift registers... Now it's ready to be sent!
{
}
I'm pretty sure it will be faster to represent the bits in a byte array, one bit per element, so that nothing has to be shifted every time a new bit arrives. And by making that array a FIFO, there is nothing to do on each clock edge but update the read and write index variables.
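To check the FIFO idea without hardware, here is a rough host-side sketch in plain C++ (not Arduino code). The names BitFifo, onRisingEdge, onFallingEdge and clockOnce are made up for illustration, and it assumes each clock cycle is a rising edge followed by a falling edge. It models the stored bit only, not the actual pin level that CLOCK() drives.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of the FIFO; same sizes as the sketch above.
constexpr int N_REGS = 3;
constexpr int N_BITS = N_REGS * 8;

struct BitFifo {
    uint8_t bits[N_BITS] = {};  // one byte per bit, so nothing is ever shifted
    int rd_index = 0;
    int wr_index = 0;

    // Rising clock edge: return the oldest stored bit and advance the read index.
    uint8_t onRisingEdge() {
        assert(rd_index >= 0 && rd_index < N_BITS);
        uint8_t out = bits[rd_index++];
        if (rd_index >= N_BITS) rd_index = 0;  // wrap around
        return out;
    }

    // Falling clock edge: store the sampled input bit and advance the write index.
    void onFallingEdge(uint8_t dataBit) {
        assert(wr_index >= 0 && wr_index < N_BITS);
        bits[wr_index++] = dataBit & 1;
        if (wr_index >= N_BITS) wr_index = 0;  // wrap around
    }
};

// One full clock cycle: rising edge (shift out), then falling edge (shift in).
uint8_t clockOnce(BitFifo &fifo, uint8_t dataBit) {
    uint8_t out = fifo.onRisingEdge();
    fifo.onFallingEdge(dataBit);
    return out;
}
```

With this model, a bit clocked in on one cycle comes back out N_BITS cycles later, which is what a 24-bit shift register would do, but each edge only touches one array element and one index.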
I'm sure there is still much room for improvement; this is just a first pass.
Rob