Fast digital I/O and software SPI with C++ templates

Hi fat16lib,

I have used your code and I have had great success.

I was initially trying to re-create the fastDigitalWrite macros as a template library so I could add extra features. But I hit a wall: however much code I can write, I really don't know that much about the Arduino's internals, and that's where your code comes in.
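For anyone following along, here is a minimal sketch of the "fast write as a template" idea. It is not fat16lib's code: the pin map is a hypothetical Uno-style layout (pins 0-7 on one port, 8-13 on another), and `fakePortD` / `fakePortB` are plain variables standing in for the real port registers so the sketch also compiles on a desktop.

```cpp
#include <cstdint>

// Stand-ins for AVR port registers (assumption: on real hardware these
// would be the PORTD / PORTB registers, not ordinary variables).
static volatile uint8_t fakePortD = 0;
static volatile uint8_t fakePortB = 0;

// Hypothetical compile-time pin map: pins 0-7 -> port D, pins 8-13 -> port B.
template <uint8_t Pin>
struct PinMap {
    static_assert(Pin < 14, "pin out of range for this sketch");
    static volatile uint8_t& port() { return Pin < 8 ? fakePortD : fakePortB; }
    static constexpr uint8_t mask() {
        return uint8_t(1u << (Pin < 8 ? Pin : Pin - 8));
    }
};

// The pin is a template argument, so the port and mask are known at
// compile time and no run-time table lookup is needed.
template <uint8_t Pin>
inline void fastWrite(bool level) {
    volatile uint8_t& p = PinMap<Pin>::port();
    const uint8_t m = PinMap<Pin>::mask();
    p = level ? uint8_t(p | m) : uint8_t(p & ~m);
}
```

Usage would look like `fastWrite<13>(true);`. The speed-up comes from the compiler being able to reduce each call to a single bit operation on a known register.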

My original test code is an ST7920 128x64 library running in 8-bit parallel mode, clearing and setting the entire screen in a loop.

3666 bytes using digitalWrite (shiftOut) to control shift registers: 0.7 ~ 0.9 frames per second.
3450 bytes using digitalWrite through my parallel interface: 1.2 ~ 1.4 frames per second.

So far I have only implemented the fast specialisations in my shift-out class.

3666 down to 3152 bytes once your fast I/O was implemented,
2562 bytes once I fully specialised my shift library.
Slightly more than 6 frames per second.
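To give a rough idea of what a specialised shift-out class can look like, here is a minimal sketch. It is not my actual library or fat16lib's code: `FastPin`, `ShiftOut` and `SimClock` are hypothetical names, `fakePort` stands in for a real port register, and `SimClock` / `received` exist only to simulate a shift register sampling the data line on each rising clock edge.

```cpp
#include <cstdint>

// Stand-in for one 8-bit output port register (assumption for desktop use).
static volatile uint8_t fakePort = 0;
// Simulation only: the byte a shift register would have clocked in.
static uint8_t received = 0;

// One pin, fixed at compile time to a bit of fakePort.
template <uint8_t Bit>
struct FastPin {
    static_assert(Bit < 8, "bit index must be 0..7");
    static void write(bool level) {
        const uint8_t v = fakePort;
        fakePort = level ? uint8_t(v | (1u << Bit)) : uint8_t(v & ~(1u << Bit));
    }
    static bool read() { return ((fakePort >> Bit) & 1u) != 0; }
};

// Simulation helper: wraps a clock pin and samples Data on each rising edge.
template <typename Data, typename Clock>
struct SimClock {
    static void write(bool level) {
        if (level && !Clock::read()) {
            received = uint8_t((received << 1) | (Data::read() ? 1u : 0u));
        }
        Clock::write(level);
    }
};

// Software shift-out, most significant bit first. Data and Clock are types,
// so every pin access can be resolved at compile time.
template <typename Data, typename Clock>
struct ShiftOut {
    static void writeMsbFirst(uint8_t value) {
        for (uint8_t i = 0; i < 8; ++i) {
            Data::write((value & 0x80u) != 0);  // present the current top bit
            Clock::write(true);                 // rising edge clocks it in
            Clock::write(false);
            value = uint8_t(value << 1);
        }
    }
};
```

On real hardware the `Clock` argument would simply be another `FastPin`. The simulated clock is only there so the sketch can be checked without a board.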

I imagine the parallel interface would be considerably faster still.

Your code has been designed in such a way that I can implement my ideas straight into it.
I'm in the middle of adding specialisations for 2 to 8 pins that take several pins at once and write those on the same port together, minimising the total number of write operations needed. This would be another huge speed increase for my library, as it can then drive pins that share a port as a group.
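A minimal sketch of the grouped write, assuming all listed pins sit on one port. `PinGroup` and `Scatter` are hypothetical names, `fakePort` stands in for the real port register, and `writeCount` is only a test hook that counts register writes. It uses C++11 variadic templates, which may not match what my final specialisations look like.

```cpp
#include <cstdint>

// Stand-in for one 8-bit port register (assumption for desktop use).
static volatile uint8_t fakePort = 0;
// Test hook: number of writes made to the port.
static unsigned writeCount = 0;

// Moves bit i of `levels` to the i-th listed bit position, at compile time
// where possible. Scatter<...>::apply(0xFF) therefore yields the group mask.
template <uint8_t... Bits> struct Scatter;

template <>
struct Scatter<> {
    static constexpr uint8_t apply(uint8_t) { return 0; }
};

template <uint8_t First, uint8_t... Rest>
struct Scatter<First, Rest...> {
    static_assert(First < 8, "bit index must be 0..7");
    static constexpr uint8_t apply(uint8_t levels) {
        return uint8_t(((levels & 1u) << First) |
                       Scatter<Rest...>::apply(uint8_t(levels >> 1)));
    }
};

// Up to 8 pins on the same port, updated with one read-modify-write.
template <uint8_t... Bits>
struct PinGroup {
    static_assert(sizeof...(Bits) >= 1 && sizeof...(Bits) <= 8,
                  "a group holds 1 to 8 pins");
    // Bit i of `levels` drives the i-th pin in the template argument list.
    static void write(uint8_t levels) {
        const uint8_t mask = Scatter<Bits...>::apply(0xFF);
        const uint8_t bits = Scatter<Bits...>::apply(levels);
        fakePort = uint8_t((fakePort & ~mask) | bits);  // other pins untouched
        ++writeCount;
    }
};
```

For example, `PinGroup<0, 2, 5>::write(0x05)` sets bits 0 and 5 and clears bit 2 with a single write, where three separate pin writes would otherwise be needed.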

Your code at the bottom of my hardware access system has already given me a 5- to 7-fold increase in speed, and I hope to squeeze even more out of it.

Great Stuff!