@westfw: Sorry, I wasn't careful enough before. For me, too, % gets optimized to & even without typecasting, but the compiler then works with 16-bit words instead of 8-bit ones. Your suggestion works best.
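A minimal sketch of the % versus & point, assuming a power-of-two buffer size. The helper names next_mod, next_mask and check_equivalence are hypothetical and not part of wiring_serial.c, and this is not necessarily the form westfw suggested. In C the operands of % are promoted to int, which is 16 bits on AVR, so that is probably where the 16-bit arithmetic comes from unless the value is forced back to 8 bits first. It can be checked on a PC:

```c
#include <assert.h>
#include <stdint.h>

#define TX_BUFFER_SIZE 32  /* must be a power of two for the mask form */

/* Advance an index with %, as in the code in this post.
 * (i + 1) is computed as int because of integer promotion. */
static uint8_t next_mod(uint8_t i) {
    return (uint8_t)((i + 1) % TX_BUFFER_SIZE);
}

/* Advance an index with an explicit mask; the cast narrows the
 * incremented value back to 8 bits before masking. */
static uint8_t next_mask(uint8_t i) {
    return (uint8_t)((uint8_t)(i + 1) & (uint8_t)(TX_BUFFER_SIZE - 1));
}

/* Both forms agree for every possible 8-bit index. */
static void check_equivalence(void) {
    for (unsigned int i = 0; i < 256; i++) {
        assert(next_mod((uint8_t)i) == next_mask((uint8_t)i));
    }
}
```

Whether the mask form actually produces shorter code depends on the compiler version and flags, so it is worth comparing the generated assembly.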
For the TX buffer, the following code works for me (on an ATmega168). You can simply replace the serialWrite function in wiring_serial.c with it (it is named myserialWrite here):
#define TX_BUFFER_SIZE 32
unsigned char tx_buffer[TX_BUFFER_SIZE];
unsigned char tx_buffer_head = 0;
volatile unsigned char tx_buffer_tail = 0;
SIGNAL(USART_UDRE_vect) {
    // work on a local copy of tx_buffer_tail: it is volatile, and no other
    // interrupt runs inside this routine, so one read and one write are enough
    unsigned char tail = tx_buffer_tail;
    // get a byte from the buffer
    unsigned char c = tx_buffer[tail];
    // send the byte
    UDR0 = c;
    // update tail position
    tail++;
    tail %= TX_BUFFER_SIZE;
    // if the buffer is empty, disable the interrupt
    if (tail == tx_buffer_head) {
        UCSR0B &= ~(1 << UDRIE0);
    }
    tx_buffer_tail = tail;
}
void myserialWrite(unsigned char c) {
    if ((!(UCSR0A & (1 << UDRE0))) || (tx_buffer_head != tx_buffer_tail)) {
        // checking whether the buffer is empty may not be necessary:
        // I'm not sure the data register empty flag can be read as set here
        // without the interrupt having been executed first
        // (well, it shouldn't happen, right?)
        // data register is not empty, use the buffer
        unsigned char i = tx_buffer_head + 1;
        i %= TX_BUFFER_SIZE;
        // wait until there is space in the buffer
        while (i == tx_buffer_tail) ;
        tx_buffer[tx_buffer_head] = c;
        tx_buffer_head = i;
        // enable the Data Register Empty interrupt
        UCSR0B |= (1 << UDRIE0);
    }
    else {
        // no need to wait
        UDR0 = c;
    }
}
This is useful for lower baud rates (lower than 1 Mbaud ;), because the interrupt itself takes 4 µs to execute, while at 1 Mbaud it takes only 10 µs to send or receive a byte. At 1 Mbaud, the RX and TX interrupts might interfere with each other, causing the receiver to miss incoming bytes.
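The timing argument can be checked with a little arithmetic. The sketch below assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits per byte), which matches the 10 µs per byte at 1 Mbaud quoted above. The function names byte_time_ns and isr_load_percent are hypothetical helpers for illustration only, and the code is meant to run on a PC, not on the AVR:

```c
#include <assert.h>
#include <stdint.h>

/* Bits on the wire per byte, assuming 8N1 framing: start + 8 data + stop. */
#define BITS_PER_FRAME 10ULL

/* Time to transfer one byte at the given baud rate, in nanoseconds
 * (rounded down). */
static uint64_t byte_time_ns(uint32_t baud) {
    assert(baud > 0);
    return (BITS_PER_FRAME * 1000000000ULL) / baud;
}

/* Share of each byte time spent in one interrupt of length isr_ns,
 * in percent (rounded down). */
static uint64_t isr_load_percent(uint32_t baud, uint32_t isr_ns) {
    return (100ULL * isr_ns) / byte_time_ns(baud);
}
```

With the 4 µs figure from above, isr_load_percent(1000000, 4000) gives 40, so a single interrupt per byte already uses 40% of each 10 µs byte time at 1 Mbaud. At 115200 baud the same interrupt takes under 5% of a byte time.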