pito,
I was able to try 16-bit frames. The write rate increased from 1776.44 KB/sec to 2013.34 KB/sec, about 13% faster.
The CPU overhead per word is higher, since a byte swap is required. I form each 16-bit word to be sent like this:
uint16_t w = *src++ << 8;  // first byte goes into the high half
w |= *src++;               // second byte goes into the low half
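
For anyone who wants to try the same thing, here is a minimal sketch of those two lines wrapped in a loop. The function name pack_be16 and the separate dst buffer are my own illustration, not part of the actual driver, and I assume src points to bytes and the byte count is even. The swap matters if the SPI hardware shifts a 16-bit frame out most significant bit first while the CPU is little-endian, which is what I assume here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack n bytes from src into 16-bit words in dst,
 * with the first byte of each pair in the high half of the word.
 * Assumes n is even; a trailing odd byte is ignored. */
void pack_be16(const uint8_t *src, uint16_t *dst, size_t n)
{
    for (size_t i = 0; i + 1 < n; i += 2) {
        uint16_t w = (uint16_t)(src[i] << 8);  /* first byte -> high half */
        w |= src[i + 1];                       /* second byte -> low half */
        dst[i / 2] = w;
    }
}
```

In a real transfer loop you would probably write each word straight to the SPI data register instead of into dst, which is where the extra shift and OR per word show up as overhead.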