Serial latency (again)...on Mega 2560

I recently ran into the same issue while developing a remote procedure call library. The 4 ms delay comes from the firmware on the ATmega8U2, which is based on a by now outdated example from the LUFA USB library. Basically, it only sends serial data to the PC every 4 ms or after receiving 96 bytes. You can upload new firmware to the chip with dfu-programmer or Atmel's FLIP. I did it on an Uno; search Google for how to do it on the Mega. You need to make the 8U2 go into DFU mode.
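
To make the effect of that rule concrete, here is a rough host-side model in C. It is not the actual firmware code: the function names are made up for illustration, and the assumptions are that an empty buffer is never flushed and that a blocking call has to wait for one timed flush before its short reply reaches the PC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FLUSH_INTERVAL_MS 4u   /* flush period described in the post */
#define FLUSH_THRESHOLD   96u  /* buffered-byte threshold described in the post */

/* Hypothetical model: true when buffered serial data would be handed to USB. */
bool should_flush(uint32_t ms_since_last_flush, uint32_t buffered_bytes)
{
    if (buffered_bytes == 0u) {
        return false;  /* assumption: nothing buffered, nothing to send */
    }
    return ms_since_last_flush >= FLUSH_INTERVAL_MS
        || buffered_bytes >= FLUSH_THRESHOLD;
}

/* Upper bound on blocking request/reply round trips per second when every
 * short reply has to wait for the next timed flush. */
uint32_t max_round_trips_per_second(uint32_t flush_interval_ms)
{
    assert(flush_interval_ms > 0u);
    return 1000u / flush_interval_ms;
}
```

With a 4 ms interval, max_round_trips_per_second(4) gives 250, which is close to the 244 calls per second measured with the original firmware.
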

As for new firmware, I haven't found any on the internet, so I tried modifying the existing one. By changing the original arduino-usbserial firmware to pass serial data on to the PC on every pass of the loop in the main function, I went from 244 remote function calls per second to 400-500. These are function calls with return data. For functions without return data, the number of calls per second corresponds to the baud rate divided by the amount of data transmitted (with both the original and the modified firmware). But looking at just the amount of data, I should have been able to do 2000+ calls per second if there were no latency. So the situation had improved, but it still wasn't impressive.
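
The "no latency" figure is plain arithmetic on the baud rate. A minimal sketch in C, assuming 8N1 framing (10 bits on the wire per byte) and a call that blocks until all of its request and reply bytes have been transferred; the function name and the example numbers are hypothetical, since the post does not state the baud rate or the message size:

```c
#include <assert.h>
#include <stdint.h>

/* 8N1 framing: 1 start bit + 8 data bits + 1 stop bit per byte. */
#define BITS_PER_BYTE_8N1 10u

/* Hypothetical helper: latency-free upper bound on calls per second, given
 * the baud rate and the total number of bytes each call puts on the wire. */
uint32_t max_calls_per_second(uint32_t baud, uint32_t bytes_per_call)
{
    assert(bytes_per_call > 0u);
    uint32_t bytes_per_second = baud / BITS_PER_BYTE_8N1;
    return bytes_per_second / bytes_per_call;
}
```

For example, at an assumed 115200 baud with 5 bytes per call, max_calls_per_second(115200, 5) gives 2304. Anything measured well below such a bound points to latency rather than bandwidth.
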
The original arduino-usbserial is built against a LUFA library release from 2010, but there have been numerous releases since. Unfortunately, the code doesn't build against the newer versions right away. But the LUFA library contains its own example of a USB-to-serial converter, which has evolved a bit since the one the Arduino version was based on back in 2010.
I modified the LUFA example to work with the Uno board. This improved function calls to 700-900 per second. Increasing the baud rate brings the performance of calls with less and more return data closer together, but they never get much above 900 per second. So there is latency somewhere that I still can't get around.
The new firmwares have been acting a bit strange right after the COM port is opened. Once the connection is up it seems to run without problems, but when calling remote functions immediately after opening the COM port I've been getting unexpected data or not getting expected data. Not sure what the problem is.

EDIT:
The weird behaviour turned out to be due to noise on the line when the main AVR is reset (it is reset when the connection is opened). I don't know why this didn't happen with the old firmware, as the code in this part is fairly similar. Probably the timing changed somewhere in LUFA. Anyway, checking the framing error flag of the USART before passing data on to USB solved the problem. I think this is good practice in any case.
The new firmware works great. When boosting the baud rate, remote function calls get very close to 1000/sec, typically 950.
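
The framing error check could look roughly like the sketch below. This is a host-side model in C, not the actual firmware code: byte_buffer_t and forward_if_valid are hypothetical stand-ins for the firmware's USART-to-USB buffer and receive handler. On the real chip the status value would come from UCSR1A, which has to be read before UDR1, because reading the data register changes the error flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit position of the framing error flag (FEn) in the AVR UCSRnA register. */
#define USART_FE_BIT 4u

#define BUF_SIZE 128u  /* arbitrary size chosen for this sketch */

/* Hypothetical stand-in for the firmware's USART-to-USB buffer. */
typedef struct {
    uint8_t data[BUF_SIZE];
    size_t  count;
} byte_buffer_t;

/* status: USART status register value, read BEFORE the data register.
 * data:   the received byte.
 * Returns true if the byte was queued for USB, false if it was dropped. */
bool forward_if_valid(byte_buffer_t *buf, uint8_t status, uint8_t data)
{
    assert(buf != NULL);
    if (status & (1u << USART_FE_BIT)) {
        return false;  /* framing error, e.g. line noise during reset: drop */
    }
    if (buf->count >= BUF_SIZE) {
        return false;  /* buffer full: drop rather than overflow */
    }
    buf->data[buf->count++] = data;
    return true;
}
```

In this model a byte received with status 0x80 (receive complete, no error) is queued, while one received with status 0x90 (receive complete plus framing error) is silently discarded.
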