Independent of SD libraries, I am interested in a replacement for HardwareSerial with the same API.
I have written enough code to know that I can produce a library that is smaller than HardwareSerial in 1.0 beta 4, has all of its functionality, and lets you independently specify the size of every buffer at compile time. This is a transparent replacement.
I think I can make a version that is smaller than 1.0 beta 4 with run time sizes if the buffers are user defined static arrays.
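To make the two sizing options concrete, here is a minimal sketch of a ring buffer that works on a user-supplied static array, plus a template wrapper that fixes the size at compile time. This is only an illustration of the idea, not the code in my library; the names RingBuffer and StaticRingBuffer are made up for this example.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical ring buffer, for illustration only.
// The storage is a caller-supplied array, so no heap is needed.
// size must be at least 2; one slot is always kept empty.
class RingBuffer {
 public:
  RingBuffer(uint8_t* storage, size_t size)
      : buf_(storage), size_(size), head_(0), tail_(0) {}

  // Store one byte; returns false if the buffer is full.
  bool put(uint8_t b) {
    size_t next = (head_ + 1) % size_;
    if (next == tail_) return false;  // full
    buf_[head_] = b;
    head_ = next;
    return true;
  }

  // Fetch one byte; returns -1 if the buffer is empty.
  int get() {
    if (head_ == tail_) return -1;
    uint8_t b = buf_[tail_];
    tail_ = (tail_ + 1) % size_;
    return b;
  }

  // Number of bytes waiting to be read.
  size_t available() const { return (head_ + size_ - tail_) % size_; }

 private:
  uint8_t* buf_;
  size_t size_;
  volatile size_t head_;  // volatile: an ISR would update these
  volatile size_t tail_;
};

// Compile-time sizing: the array is part of the object itself.
template <size_t N>
class StaticRingBuffer : public RingBuffer {
 public:
  StaticRingBuffer() : RingBuffer(data_, N) {}

 private:
  uint8_t data_[N];
};
```

With this layout a sketch could declare something like `static uint8_t rxStorage[64];` and pass it to `RingBuffer`, or use `StaticRingBuffer<64>` when the size is known at compile time. Either way no heap is involved.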
I am working on a version that uses malloc/free, and it looks like it will be 200-500 flash bytes larger than 1.0 beta 4 if you are not using malloc/free for other purposes in your sketch.
I have made this library start in unbuffered mode. It starts with 0022 style output. It has just the three characters of hardware input buffering: the two-level receive data register and the receive shift register. Overflow happens when the start bit for the fourth character is detected.
In this mode it is very small and works with most simple sketches that do limited input.
You can call a buffer allocation function to allocate rx and tx buffers and use interrupts. Calling this function causes malloc/free to be linked in. It looks like the total flash size will then be 200-500 bytes larger than 1.0 beta 4.
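Roughly, the start-unbuffered-then-allocate pattern looks like the sketch below. The class name SerialSketch and the functions allocBuffers and freeBuffers are hypothetical names for this example, not the API of my library, and the hardware interrupt setup is only indicated by comments.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical illustration; names are not the real library's API.
class SerialSketch {
 public:
  // Starts unbuffered: no heap use until allocBuffers() is called.
  SerialSketch() : rx_(nullptr), tx_(nullptr), rxSize_(0), txSize_(0) {}
  ~SerialSketch() { freeBuffers(); }

  SerialSketch(const SerialSketch&) = delete;
  SerialSketch& operator=(const SerialSketch&) = delete;

  bool buffered() const { return rx_ != nullptr; }

  // Allocate both buffers once. Returns false and stays unbuffered
  // if already buffered, if a size is zero, or if malloc fails.
  bool allocBuffers(size_t rxSize, size_t txSize) {
    if (buffered() || rxSize == 0 || txSize == 0) return false;
    uint8_t* rx = static_cast<uint8_t*>(malloc(rxSize));
    uint8_t* tx = static_cast<uint8_t*>(malloc(txSize));
    if (rx == nullptr || tx == nullptr) {
      free(rx);  // free(nullptr) is a no-op
      free(tx);
      return false;
    }
    rx_ = rx;
    tx_ = tx;
    rxSize_ = rxSize;
    txSize_ = txSize;
    // On real hardware the rx/tx interrupts would be enabled here.
    return true;
  }

  // Return to unbuffered mode and give the memory back.
  void freeBuffers() {
    // On real hardware the interrupts would be disabled first.
    free(rx_);
    free(tx_);
    rx_ = nullptr;
    tx_ = nullptr;
    rxSize_ = 0;
    txSize_ = 0;
  }

  size_t rxSize() const { return rxSize_; }
  size_t txSize() const { return txSize_; }

 private:
  uint8_t* rx_;
  uint8_t* tx_;
  size_t rxSize_;
  size_t txSize_;
};
```

A sketch that never calls `allocBuffers` never references malloc/free, which is why the unbuffered mode stays small.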
My SdFat library does not use malloc/free.
I really don't like using the heap in embedded systems except at system start-up.
I have worked with critical systems where the programming standards forbid using the heap after system start-up.
The Joint Strike Fighter standard is typical:
http://www2.research.att.com/~bs/JSF-AV-rules.pdf
Heap fragmentation can cause a stack overflow crash even when there is lots of free memory, because that memory is scattered in small chunks through the heap.