I posted a new version of the fast I/O libraries, now with toggle(), as DigitalPinBeta20120804.zip at
http://code.google.com/p/beta-lib/downloads/list
The libraries support standard 168/328 Arduino, Mega, Leonardo, Teensy, Teensy++, and Sanguino.
The DigitalPin class provides very fast inline functions. DigitalPin is a template class, so pin numbers must be specified at compile time.
For 328 pins and low-address Mega pins, read(), toggle(), and write() execute in two cycles, or 125 ns on a 16 MHz CPU.
The main member functions for the DigitalPin class are:
void config (bool mode, bool level);
void high ();
void low ();
void mode (bool pinMode);
bool read ();
void toggle ();
void write (bool value);
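To illustrate why a compile-time pin number helps, here is a rough host-side sketch. It is not the library's code: SimPort, simPort(), and SimDigitalPin are hypothetical names, and the "registers" are plain variables so the example compiles on a PC. The member names mirror the list above. Because the pin is a template parameter, the bit mask is a constant and each call reduces to a single read-modify-write on one byte.

```cpp
#include <cstdint>

// Host-side model only: SimPort stands in for one AVR port's registers.
struct SimPort {
  uint8_t ddr = 0;   // data direction bits, 1 = output
  uint8_t port = 0;  // output latch bits
};

// One simulated 8-bit port shared by every pin in this sketch.
inline SimPort& simPort() {
  static SimPort p;
  return p;
}

// Hypothetical stand-in for a DigitalPin-style class (not the real library).
template <uint8_t Pin>
class SimDigitalPin {
  static_assert(Pin < 8, "pin number must be a compile-time constant below 8");
  static constexpr uint8_t kMask = static_cast<uint8_t>(1u << Pin);

 public:
  // Set the direction: true = output, false = input.
  void mode(bool pinMode) {
    if (pinMode) {
      simPort().ddr = static_cast<uint8_t>(simPort().ddr | kMask);
    } else {
      simPort().ddr = static_cast<uint8_t>(simPort().ddr & ~kMask);
    }
  }
  // Set the output level.
  void write(bool value) {
    if (value) {
      simPort().port = static_cast<uint8_t>(simPort().port | kMask);
    } else {
      simPort().port = static_cast<uint8_t>(simPort().port & ~kMask);
    }
  }
  void high() { write(true); }
  void low() { write(false); }
  // Flip the output level. The model uses XOR on the latch.
  void toggle() { simPort().port = static_cast<uint8_t>(simPort().port ^ kMask); }
  // The model reads back the output latch; real hardware reads the input register.
  bool read() const { return (simPort().port & kMask) != 0; }
  // Set direction and level in one call.
  void config(bool pinMode, bool level) {
    mode(pinMode);
    write(level);
  }
};
```

With this model, `SimDigitalPin<5> led; led.config(true, false); led.toggle();` leaves bit 5 of the simulated output latch set.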
The library also contains these static inline functions, which are similar to digitalRead()/digitalWrite(). The pin number must be a constant.
static bool fastDigitalRead (uint8_t pin);
static void fastDigitalToggle (uint8_t pin);
static void fastDigitalWrite (uint8_t pin, bool level);
static void fastPinConfig (uint8_t pin, bool mode, bool level);
static void fastPinMode (uint8_t pin, bool mode);
There is also a software SPI class that runs at about 2 MHz. It is also a template class, with the pin numbers and SPI mode fixed at compile time. Modes 0 through 3 are supported, MSB first. LSB first would be easy to implement.
The member functions are:
void begin ();
uint8_t receive ();
void send (uint8_t data);
uint8_t transfer (uint8_t txData);
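For readers unfamiliar with bit-banged SPI, the following is a minimal host-side sketch of an MSB-first transfer with the mode fixed at compile time. It is an illustration under stated assumptions, not the library's implementation: SimBus and SimSoftSpi are hypothetical names, the wires are plain variables, MISO is looped back to MOSI, and receive() is assumed to clock out 0xFF. Mode bit 1 is the clock idle level (CPOL) and mode bit 0 selects which clock edge samples data (CPHA).

```cpp
#include <cstdint>

// Simulated wires; on real hardware these would be three digital pins.
struct SimBus {
  bool mosi = false;
  bool sck = false;
  unsigned edges = 0;                 // clock transitions generated so far
  bool miso() const { return mosi; }  // loopback: MISO is wired to MOSI
};

// Hypothetical stand-in for a software SPI class (not the real library).
template <uint8_t Mode>
class SimSoftSpi {
  static_assert(Mode <= 3, "SPI mode must be 0..3");
  static constexpr bool kCpol = (Mode & 2) != 0;  // clock idle level
  static constexpr bool kCpha = (Mode & 1) != 0;  // sample on trailing edge

 public:
  explicit SimSoftSpi(SimBus& bus) : bus_(bus) {}

  // Put the clock at its idle level.
  void begin() {
    bus_.sck = kCpol;
    bus_.mosi = false;
  }

  // Shift one byte out and one byte in, most significant bit first.
  uint8_t transfer(uint8_t txData) {
    uint8_t rxData = 0;
    for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
      if (kCpha) clock(!kCpol);          // CPHA=1: leading edge comes first
      bus_.mosi = (txData & mask) != 0;  // present the next data bit
      clock(kCpha ? kCpol : !kCpol);     // the edge on which data is sampled
      if (bus_.miso()) rxData |= mask;   // sample the incoming bit
      if (!kCpha) clock(kCpol);          // CPHA=0: return clock to idle
    }
    return rxData;
  }

  // Assumption: receive clocks out 0xFF while reading.
  uint8_t receive() { return transfer(0xFF); }
  void send(uint8_t data) { transfer(data); }

 private:
  // Drive the clock line and count the transition.
  void clock(bool level) {
    bus_.sck = level;
    ++bus_.edges;
  }
  SimBus& bus_;
};
```

In every mode one byte takes 16 clock transitions and the clock ends at its idle level, so with the loopback wiring transfer() returns the byte it was given.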