Argh! Couldn't they have spared the few dozen gates for a one-byte FIFO on that SPI interface?
In my target system, the controller decodes IR input across a wide range of carrier frequencies (16 kHz through 60 kHz), and sends the result out the serial port.
At the same time, it needs to also receive commands from a serial port (where I can control the command rate, so I can manage the interrupt load) and hard-wired buttons.
While sending commands with high precision, I can let the hardware serial back up, and delay processing button presses (but I'll probably want to OR together the input bits received during that time).
The end result is automation of a number of different IR remote control protocols.
And it's funny you should mention high-speed PCs -- ten to fifteen years ago, I spent years working on an operating system that drove interrupt latencies on then-standard PC hardware with a general-purpose GUI down below a millisecond (for media production). Even with modern hardware, neither Windows, nor Linux, nor MacOS will get to those levels. That's because they do many things at once, and use "cheapest possible" design instead of dedicated circuits for many things.
In effect, I want to use a microcontroller as the dedicated circuit for what I want to do. If that SPI interface had at least one byte of FIFO, then I probably could do it just fine (mashing in another output byte when the FIFO runs dry) but as it is, I have to take a 4 us interrupt every 8 bits, which means that every 8th bit I send out essentially gets extended by 4 us or so.
The reason I have such strict tolerances is that I need to generate the carrier wave for the IR modulation, at between 16 and 60 kHz. If I want to stick with an ATmega328P, I may have to use a separately programmable timer to generate the carrier, and then use the Arduino only as a gate for that carrier (it's all Manchester coded AM -- at least I don't have to do FM in software).
So, a 555, with a variable timing resistor, might get me there. But then the external circuitry is looking a lot hairier, and maybe I should go with some of the bigger boys that have DMA to SPI for seamless modulation.
I can build a state machine to do exactly what I need, and count cycles from interrupt handlers to figure out what my budget is -- for example, I can bang a byte to the SPI, then enable interrupts, then immediately disable interrupts; as long as the longest interrupt handler is shorter than the time to send one byte out the SPI, I'm good. With 4 us pulse width (not ideal), this means < 30 us interrupt latency, which can be done on the current board. However, that's 4 us pulse width, not 1 us. With a device that runs faster, and has better circuitry for generating the pulse forms I care about (DMA, say), my target would probably be easier to reach.
Maybe the solution really is an LPC1768 for communications and smarts, and a 328P that just does pulse generation, using SPI for receive (which has a one-byte buffer).