I posted new fast I/O libraries as DigitalPinBeta20120804.zip on Google Code.
The libraries support standard 168/328 Arduino, Mega, Leonardo, Teensy, Teensy++, and Sanguino.
The DigitalPin class provides very fast inline functions. DigitalPin is a template class, so pin numbers must be specified at compile time.
For 328 pins and low-address Mega pins, read(), toggle(), and write() execute in two cycles, or 125 ns on a 16 MHz CPU. This is about thirty times faster than digitalWrite(), which takes about 4 µs.
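As a quick check of those numbers, here is a small sketch of the arithmetic. The names kCpuHz, kDigitalWriteNs, cyclesToNs, and speedup are made up for this example and are not part of the library.

```cpp
// CPU clock in Hz for a 16 MHz Arduino (figure taken from the post).
constexpr double kCpuHz = 16e6;

// Approximate digitalWrite() time in nanoseconds (about 4 us, from the post).
constexpr double kDigitalWriteNs = 4000.0;

// Time in nanoseconds taken by the given number of CPU cycles.
constexpr double cyclesToNs(unsigned cycles) {
  return cycles * 1e9 / kCpuHz;
}

// Ratio of the digitalWrite() time to a two-cycle access.
constexpr double speedup() {
  return kDigitalWriteNs / cyclesToNs(2);
}

// Both checks are evaluated at compile time.
static_assert(cyclesToNs(2) == 125.0, "two cycles at 16 MHz take 125 ns");
static_assert(speedup() == 32.0, "4 us / 125 ns = 32, about thirty times");
```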
The main member functions for the DigitalPin class are:
void config (bool mode, bool level);
void high ();
void low ();
void mode (bool pinMode);
bool read ();
void toggle ();
void write (bool value);
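To show why a compile-time pin number helps, here is a minimal sketch of a template pin class that runs on a PC. It is not the library source: SimPort, simPortB, SimDigitalPin, and demoPin5 are hypothetical names, and the port is a plain struct standing in for the AVR registers. Only the member names mirror the list above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one 8-bit AVR port so the sketch runs on a PC.
// On real hardware these would be the memory-mapped DDRx and PORTx registers.
struct SimPort {
  uint8_t ddr;   // direction bits, 1 = output
  uint8_t port;  // output latch
};

static SimPort simPortB = {0, 0};

// The pin number is a template parameter, so the bit mask is a constant the
// compiler can fold into each access.
template <uint8_t PinNumber>
class SimDigitalPin {
  static_assert(PinNumber < 8, "this sketch models a single 8-bit port");

  static constexpr uint8_t mask() {
    return static_cast<uint8_t>(1u << PinNumber);
  }

  static void setBit(uint8_t& reg, bool level) {
    if (level) {
      reg |= mask();
    } else {
      reg &= static_cast<uint8_t>(~mask());
    }
  }

 public:
  void mode(bool pinMode) const { setBit(simPortB.ddr, pinMode); }
  void write(bool value) const { setBit(simPortB.port, value); }
  void high() const { write(true); }
  void low() const { write(false); }
  void toggle() const { simPortB.port ^= mask(); }
  // Simplification: reads back the output latch. Real hardware would sample
  // the input register instead.
  bool read() const { return (simPortB.port & mask()) != 0; }
  void config(bool pinMode, bool level) const {
    mode(pinMode);
    write(level);
  }
};

// Usage: drive simulated pin 5 and return the final latch value.
inline uint8_t demoPin5() {
  SimDigitalPin<5> led;
  led.config(true, false);  // output, initially low
  led.high();
  led.toggle();             // back to low
  led.write(true);
  assert(led.read());
  return simPortB.port;
}
```

Because the mask is known at compile time, each call can reduce to a single bit operation on one register, which is where the speed comes from.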
The library also contains these static inline functions, which are similar to digitalRead()/digitalWrite(). The pin number must be a compile-time constant.
static bool fastDigitalRead (uint8_t pin);
static void fastDigitalToggle (uint8_t pin);
static void fastDigitalWrite (uint8_t pin, bool level);
static void fastPinConfig (uint8_t pin, bool mode, bool level);
static void fastPinMode (uint8_t pin, bool mode);
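The constant-pin requirement can be illustrated with a PC-side sketch like the one below. It is an assumption about the general technique, not the library code: simLatch, PinMap, pinMap, fastWriteSketch, fastReadSketch, fastToggleSketch, and demoPin13 are hypothetical names, and the registers are simulated with an array. The pin table follows the standard 328 layout (D0-D7 on PORTD, D8-D13 on PORTB, 14-19 on PORTC).

```cpp
#include <cassert>
#include <cstdint>

// Simulated output latches for ports B, C, D (index 0, 1, 2).
// On an AVR these would be the PORTB, PORTC, and PORTD registers.
static uint8_t simLatch[3] = {0, 0, 0};

struct PinMap {
  uint8_t port;  // index into simLatch
  uint8_t bit;   // bit position within the port
};

// Standard 328 mapping. Pins above 19 are not handled in this sketch.
constexpr PinMap pinMap(uint8_t pin) {
  return pin < 8    ? PinMap{2, pin}
         : pin < 14 ? PinMap{0, static_cast<uint8_t>(pin - 8)}
                    : PinMap{1, static_cast<uint8_t>(pin - 14)};
}

// With a constant pin argument the compiler can fold the lookup away and
// leave one bit operation on one register.
static inline void fastWriteSketch(uint8_t pin, bool level) {
  const PinMap m = pinMap(pin);
  const uint8_t mask = static_cast<uint8_t>(1u << m.bit);
  if (level) {
    simLatch[m.port] |= mask;
  } else {
    simLatch[m.port] &= static_cast<uint8_t>(~mask);
  }
}

static inline bool fastReadSketch(uint8_t pin) {
  const PinMap m = pinMap(pin);
  return (simLatch[m.port] & (1u << m.bit)) != 0;
}

static inline void fastToggleSketch(uint8_t pin) {
  const PinMap m = pinMap(pin);
  simLatch[m.port] ^= static_cast<uint8_t>(1u << m.bit);
}

// Usage: set pin 13 (PORTB bit 5) and return the PORTB latch.
inline uint8_t demoPin13() {
  fastWriteSketch(13, true);
  assert(fastReadSketch(13));
  return simLatch[0];
}
```

If the pin were a run-time variable, the lookup could not be folded and the function would lose most of its advantage over digitalWrite().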
There is also a software SPI class that runs at about 2 MHz. It is a template class with compile-time pin numbers and SPI mode. Modes 0 to 3 are supported, MSB first. LSB first would be easy to implement.
The member functions are:
void begin ();
uint8_t receive ();
void send (uint8_t data);
uint8_t transfer (uint8_t txData);
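Here is a rough sketch of how a bit-banged SPI transfer with compile-time pins and mode can work, MSB first. It runs on a PC against a simulated pin array and is not the library source. simPin, SoftSpiSketch, and demoLoopback are hypothetical names, the mode is decoded the usual way (CPOL is bit 1, CPHA is bit 0), and having receive() send 0xFF is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Simulated pin levels. A static array starts out all false (low).
static bool simPin[20];

template <uint8_t MisoPin, uint8_t MosiPin, uint8_t SckPin, uint8_t Mode>
class SoftSpiSketch {
  static_assert(Mode < 4, "SPI modes are 0 to 3");
  static constexpr bool cpol() { return (Mode & 2) != 0; }  // clock idle level
  static constexpr bool cpha() { return (Mode & 1) != 0; }  // sample edge

  // Shift one sampled MISO bit into the low end of rx.
  static uint8_t shiftIn(uint8_t rx) {
    return static_cast<uint8_t>((rx << 1) | (simPin[MisoPin] ? 1 : 0));
  }

 public:
  void begin() const {
    simPin[SckPin] = cpol();  // park the clock at its idle level
    simPin[MosiPin] = false;
  }

  uint8_t transfer(uint8_t txData) const {
    uint8_t rxData = 0;
    for (uint8_t i = 0; i < 8; ++i) {
      const bool txBit = (txData & 0x80) != 0;  // MSB first
      txData = static_cast<uint8_t>(txData << 1);
      if (!cpha()) {
        simPin[MosiPin] = txBit;   // data is set up before the leading edge
      }
      simPin[SckPin] = !cpol();    // leading clock edge
      if (cpha()) {
        simPin[MosiPin] = txBit;   // data changes on the leading edge
      } else {
        rxData = shiftIn(rxData);  // sample on the leading edge
      }
      simPin[SckPin] = cpol();     // trailing clock edge, back to idle
      if (cpha()) {
        rxData = shiftIn(rxData);  // sample on the trailing edge
      }
    }
    return rxData;
  }

  // Assumption: receive() clocks out 0xFF while reading.
  uint8_t receive() const { return transfer(0xFF); }
  void send(uint8_t data) const { transfer(data); }
};

// Usage: MISO and MOSI share simulated pin 12, which acts as a loopback wire,
// so the byte sent is the byte received.
inline uint8_t demoLoopback() {
  SoftSpiSketch<12, 12, 13, 0> spi;
  spi.begin();
  return spi.transfer(0xA5);
}
```

Since the mode is a template parameter, the cpol() and cpha() tests are constants and the compiler can drop the unused branches from the loop.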
There is also a class, DigitalIO, with run-time pin numbers. It is much slower than the DigitalPin class above but is 3 to 4 times faster than the Arduino digitalRead()/digitalWrite() functions.
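One plausible way to speed up run-time pins is to look up the port and bit mask once and reuse them on every access. The sketch below shows that idea on a PC with simulated registers. It is only an assumption about the technique, not how DigitalIO is actually written, and simPorts, RuntimePinSketch, and demoRuntimePin are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Simulated output latches for ports B, C, D (index 0, 1, 2).
static uint8_t simPorts[3] = {0, 0, 0};

class RuntimePinSketch {
 public:
  explicit RuntimePinSketch(uint8_t pin) { begin(pin); }

  // Resolve the pin once. Standard 328 layout: D0-D7 on PORTD, D8-D13 on
  // PORTB, 14-19 on PORTC. Other pins are not handled in this sketch.
  void begin(uint8_t pin) {
    assert(pin < 20);
    if (pin < 8) {
      reg_ = &simPorts[2];
      mask_ = static_cast<uint8_t>(1u << pin);
    } else if (pin < 14) {
      reg_ = &simPorts[0];
      mask_ = static_cast<uint8_t>(1u << (pin - 8));
    } else {
      reg_ = &simPorts[1];
      mask_ = static_cast<uint8_t>(1u << (pin - 14));
    }
  }

  // Each access uses the cached pointer and mask, with no table lookup.
  void write(bool level) const {
    if (level) {
      *reg_ |= mask_;
    } else {
      *reg_ &= static_cast<uint8_t>(~mask_);
    }
  }

  bool read() const { return (*reg_ & mask_) != 0; }

 private:
  uint8_t* reg_;
  uint8_t mask_;
};

// Usage: the pin number is an ordinary run-time value here.
inline bool demoRuntimePin(uint8_t pin) {
  RuntimePinSketch p(pin);
  p.write(true);
  return p.read();
}
```

The pointer and mask still have to be loaded on each call, which is why this kind of class cannot match the two-cycle template version.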