I saw today that the "twi.c" file, used to talk to the TWI interface of the ATmega, contains no fewer than three full-size buffers for TWI data packets. In addition, the "Wire.cpp" file, which wraps twi.c in a class interface, adds another two buffers!
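I think the buffers in Wire could pretty easily be eliminated by moving the state machine from twi.c into the Wire class, so the class and the interrupt handler work on the same storage instead of copying between their own buffers.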
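To make the idea concrete, here is a rough sketch of one buffer shared between the caller-facing write function and the state machine step the interrupt handler would run. This is not the actual twi.c or Wire.cpp code: every name here (bus_t, bus_write, bus_start, bus_isr_step, BUS_BUFFER_LENGTH) is made up for illustration, it only covers the transmit direction, and the hardware access is left out so it compiles anywhere.

```c
#include <stdint.h>

#define BUS_BUFFER_LENGTH 32  /* hypothetical size, not taken from twi.c */

typedef enum { BUS_IDLE, BUS_SENDING, BUS_DONE } bus_state_t;

typedef struct {
    uint8_t buf[BUS_BUFFER_LENGTH]; /* the only packet buffer */
    volatile uint8_t len;           /* bytes queued by the caller */
    volatile uint8_t pos;           /* next byte the state machine sends */
    volatile bus_state_t state;
} bus_t;

/* Caller side: append a byte straight into the shared buffer.
 * Returns 1 on success, 0 if a transfer is in progress or the buffer is full. */
int bus_write(bus_t *b, uint8_t byte)
{
    if (b->state != BUS_IDLE || b->len >= BUS_BUFFER_LENGTH)
        return 0;
    b->buf[b->len] = byte;
    b->len = (uint8_t)(b->len + 1);
    return 1;
}

/* Caller side: hand the buffer over to the state machine. */
void bus_start(bus_t *b)
{
    b->pos = 0;
    b->state = BUS_SENDING;
}

/* Interrupt side: one step of the state machine. Returns 1 and stores the
 * next byte in *out while data remains; returns 0 once the packet is sent. */
int bus_isr_step(bus_t *b, uint8_t *out)
{
    if (b->state != BUS_SENDING)
        return 0;
    if (b->pos < b->len) {
        *out = b->buf[b->pos];
        b->pos = (uint8_t)(b->pos + 1);
        return 1;
    }
    b->state = BUS_DONE;
    return 0;
}
```

The point of the sketch is that bus_write refuses to touch the buffer while the state machine owns it, so no second copy is needed. A real version would also have to handle receive, slave mode, and bus errors.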
Additionally, the code has comments along the lines of "I have to do this in this order, because I may take an interrupt at any time." I'm sure that's true, but it could also just turn off interrupts for the duration of the critical section, which would possibly be a little clearer. Dunno if it would actually save any instructions, so that might be of secondary importance.
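For what I mean by turning off interrupts around the critical section, here is a minimal sketch. The xfer_t struct, xfer_begin, and the two IRQ_* macros are hypothetical names of mine, not anything from twi.c. On AVR the macros use SREG and cli() from avr-libc; on any other target they are empty stubs so the example still compiles.

```c
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#include <avr/interrupt.h>
/* Save the status register (including the global interrupt flag), then
 * disable interrupts. */
#define IRQ_SAVE_AND_DISABLE(saved) do { (saved) = SREG; cli(); } while (0)
/* Restore the status register, re-enabling interrupts only if they were
 * enabled before. */
#define IRQ_RESTORE(saved)          do { SREG = (saved); } while (0)
#else
/* Off-target stubs (assumption: no interrupts to mask on the host). */
#define IRQ_SAVE_AND_DISABLE(saved) do { (saved) = 0; } while (0)
#define IRQ_RESTORE(saved)          do { (void)(saved); } while (0)
#endif

/* Hypothetical transfer state shared with an interrupt handler. */
typedef struct {
    volatile uint8_t state;
    volatile uint8_t length;
} xfer_t;

/* Update both fields as one unit. With interrupts masked, the order of the
 * two stores no longer matters to the handler. */
void xfer_begin(xfer_t *x, uint8_t new_state, uint8_t length)
{
    uint8_t saved;
    IRQ_SAVE_AND_DISABLE(saved);
    x->length = length;
    x->state = new_state;
    IRQ_RESTORE(saved);
}
```

avr-libc also provides ATOMIC_BLOCK(ATOMIC_RESTORESTATE) in <util/atomic.h>, which wraps the same save/disable/restore pattern. Either way, this most likely adds a few instructions rather than saving any, so the gain would be clarity, not size.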