use & RX_BUFFER_MASK instead of % RX_BUFFER_SIZE in the whole file
:
The compiler compiles % RX_BUFFER_SIZE literally as a module operation (essentially division) for some reason, instead of the obvious optimized & (RX_BUFFER_SIZE - 1) (probably because int is a signed number).
The current scheme allows a user to redefine the buffer size if they're low on ram, without having to understand or worry about having to pick a power of two. I'm not sure it would be a good idea to lose that capability.
I'm very unimpressed that the compiler did not manage to optimize this, even with rx_buffer_head and rx_buffer_tail redfined to UNSIGNED char as per your first suggestion
I did manage to get it to generate andi instructions with some pretty heavy handed casting (that I don't like.) I want to do some more experiments with avoiding expressions that generate intermediate results that the compiler is unable to optimized as unsigned bytes.
unsigned char i = ((unsigned char)(rx_buffer_head + 1)) % RX_BUFFER_SIZE;
6f6: e0 91 08 01 lds r30, 0x0108
6fa: ef 5f subi r30, 0xFF ; 255
6fc: 9e 2f mov r25, r30
6fe: 9f 77 andi r25, 0x7F ; 127
700: e1 50 subi r30, 0x01 ; 1
Hmm. How about:
SIGNAL(SIG_UART_RECV)
#endif
{
6de: 1f 92 push r1
6e0: 0f 92 push r0
6e2: 0f b6 in r0, 0x3f ; 63
6e4: 0f 92 push r0
6e6: 11 24 eor r1, r1
:
#if defined(__AVR_ATmega168__)
unsigned char c = UDR0;
6f2: 20 91 c6 00 lds r18, 0x00C6
#else
unsigned char c = UDR;
#endif
register unsigned char newhead;
newhead = rx_buffer_head + 1;
6f6: e0 91 08 01 lds r30, 0x0108
6fa: ef 5f subi r30, 0xFF ; 255
newhead %= RX_BUFFER_SIZE;
6fc: 9e 2f mov r25, r30
6fe: 9f 77 andi r25, 0x7F ; 127
700: e1 50 subi r30, 0x01 ; 1
// if we should be storing the received character into the location
// just before the tail (meaning that the head would advance to the
// current location of the tail), we're about to overflow the buffer
// and so we don't write the character or advance the head.
if (newhead != rx_buffer_tail) {
702: 80 91 07 01 lds r24, 0x0107
706: 98 17 cp r25, r24
708: 31 f0 breq .+12 ; 0x716 <__vector_18+0x38>
rx_buffer[rx_buffer_head] = c;
70a: ff 27 eor r31, r31
70c: e3 5f subi r30, 0xF3 ; 243
70e: fe 4f sbci r31, 0xFE ; 254
710: 20 83 st Z, r18
rx_buffer_head = newhead;
712: 90 93 08 01 sts 0x0108, r25
:
720: 0f 90 pop r0
722: 0f be out 0x3f, r0 ; 63
724: 0f 90 pop r0
726: 1f 90 pop r1
728: 18 95 reti
Wish I understood WTF it's doing saving/restoring R0/R1; looks pretty pointless (makes one of them a known zero, that isn't used. Uses the other to save SREG, but it could have used one of the other registers it had to save anyway. (is this part of the calling sequence definition, that subroutines (except for ISR) can trash r0/r1? That'd make a little more sense, but they still oughta be optimized away!))