wiring_serial.c - some suggestions for the Serial

Huh. That's neat. The compiler/linker included with Arduino0013 eliminates C functions that aren't called, even if they're part of the same source code as C functions that ARE used, and apparently even with some level of indirection.

In the current case, my little serial test case didn't use serialRead, and neither HardwareSerial::read or serialRead show up in the object file. This held true for other functions too. For example, my sketch got 200 bytes shorter without the call to Serial.begin (which I THINK will leave the serial port at the 19200bps from the bootloader?)

Does anyone have a pointer to the specifics of how this dead function elimination works?

PS: Even with code that's fast enough, running async serial comm at 1Mbps does not strike me as a good idea...

I guess it wouldn't be a big deal to note next to
#define RX_BUFFER_SIZE 128 something like
// use only 16, 32, 64, 128 or 256 !!!
But I agree that % RX_BUFFER_SIZE is fool-proof.

To me, it seems to me that the compiler optimizes % to & when I use unsigned char instead of int. But writing it explicitly would force it.

Pushing and popping R0 and R1 from the stack is weird, especially when it doesn't do the same with the registers that it's actually using in the interrupt.

Why is 1Mbaud a bad idea? It works great for me. Obviously I'm not sending data to Arduino, but from Arduino to PC. I need a real time info about what it is doing, and that produces 50kB/s of data. That's easy to handle by my PC =) But I needed the Arduino's side to work to be able to decode short commands sent from PC. The little hack from my first post just did the trick. I think that everybody would benefit from a smaller faster core library.

To me, it seems to me that the compiler optimizes % to & when I use unsigned char instead of int. But writing it explicitly would force it.

It's rarely a good idea to play games like this: to pretend to know what the compiler will produce in machine-code for a given high-level code, and to then try to write high-level code that will produce a specific machine-code result. Sometimes, the compiler will be better than you.

Write what you intend from a functional point of view, and if you don't like the compiler's optimizations, then write in machine-code or improve the compiler. Source code is for the humans.

@halley: I cannot agree more. But I'm just proposing the simplest fix I can think of to make the core library routine around 5 times faster. For me, % gets compiled as & when I change int to unsigned char. So I would just keep it. But westfw had a problem with that and had to do some typecasting. If someone prefers to write it in assembly instead, sure, go ahead, but that'll be less accessible for even more people.

EDIT: So as westfw proposed, the code that appears to do exactly what is expected is:

134:            unsigned char i = ((unsigned char)(rx_buffer_head + 1)) % RX_BUFFER_SIZE;
+00000166:   91E00110    LDS       R30,0x0110     Load direct from data space
+00000168:   5FEF        SUBI      R30,0xFF       Subtract immediate
+00000169:   2F9E        MOV       R25,R30        Copy register
+0000016A:   779F        ANDI      R25,0x7F       Logical AND with immediate
+0000016B:   50E1        SUBI      R30,0x01       Subtract immediate

I like my second patch better (the one that avoids having the c compiler have to deal with intermediate results. Casts are bad, especially for this sort of purpose, as Halley says. Whereas C's "promote to integer" "issue" is reasonably well known.)

  unsigned char i = rx_buffer_head +1;
  i %= RX_BUFFER_SIZE;

It's interesting that Mekon didn't need anything more than the change to unsigned char; perhaps that's a 11 vs 13 difference? My initial experiments were with a v11 library.

Although, if we're going to change the serial library, I want an optional tx buffer too. It's obnoxious that Serial.print() is as slow as it is (and this isn't the code being slow, it's the wait-for-the-UART algorithm.)

@westfw: Sorry, I wasn't careful enough before. Even for me, % gets optimized to & even without typecasting, but the compiler then works with 16-bit words instead of 8-bits. Your suggestion works the best.

For the TX buffer, the following code works for me (for ATmega168). You can simply replace the serialWrite function in wiring_serial.c:

#define TX_BUFFER_SIZE 32

unsigned char tx_buffer[TX_BUFFER_SIZE];

unsigned char tx_buffer_head = 0;
volatile unsigned char tx_buffer_tail = 0;


SIGNAL(USART_UDRE_vect) {

  // temporary tx_buffer_tail 
  // (to optimize for volatile, there are no interrupts inside an interrupt routine)
  unsigned char tail = tx_buffer_tail;

  // get a byte from the buffer      
  unsigned char c = tx_buffer[tail];
  // send the byte
  UDR0 = c;

  // update tail position
  tail ++;
  tail %= TX_BUFFER_SIZE;

  // if the buffer is empty,  disable the interrupt
  if (tail == tx_buffer_head) {
    UCSR0B &=  ~(1 << UDRIE0);
  }

  tx_buffer_tail = tail;

}


void myserialWrite(unsigned char c) {

  if ((!(UCSR0A & (1 << UDRE0))) || (tx_buffer_head != tx_buffer_tail)) {
    // maybe checking if buffer is empty is not necessary, 
    // not sure if there can be a state when the data register empty flag is set
    // and read here without the interrupt being executed
    // well, it shouldn't happen, right?

    // data register is not empty, use the buffer
    unsigned char i = tx_buffer_head + 1;
    i %= TX_BUFFER_SIZE;

    // wait until there's a space in the buffer
    while (i == tx_buffer_tail) ;

    tx_buffer[tx_buffer_head] = c;
    tx_buffer_head = i;

    // enable the Data Register Empty Interrupt
    UCSR0B |=  (1 << UDRIE0);

  } 
  else {
    // no need to wait
    UDR0 = c;
  }
}

It is useful for lower baud rates (lower than 1Mbaud ;), because the interrupt itself takes 4us to execute, while with 1Mbaud it takes only 10us to send or receive a byte. With 1Mbaud, there might be an interference of the RX and TX interrupts causing the receiver to miss incoming bytes.

Thanks for the code; that'll be helpful.
I had in mind modifying wiring_serial.c so that it was "smart" and didn't take up any more space if you had "#define TX_BUFFER_SIZE 0", although I have mixed feelings about conditional compilation. (The compiler will nicely optimize away C code that can never execute:

if (TX_BUFFER_SIZE > 0) {
  // buffered output code that is not included if 
  //    TX_BUFFER_SIZE is a constant zero
} else {
  //  Unbuffered output code
}

But I don't think that works for the ISR (Hmm. Need to check, especially with the newer compiler/linker. Bet it won't eliminate even empty, never-called, ISR routines...)

For that, the preprocessor can be used. It is used heavily in wiring_serial.c anyway. Something like:

#define TX_BUFFERED

#if defined(TX_BUFFERED) 

// buffered code here

#define TX_BUFFER_SIZE 32

unsigned char tx_buffer[TX_BUFFER_SIZE];

unsigned char tx_buffer_head = 0;
volatile unsigned char tx_buffer_tail = 0;


SIGNAL(USART_UDRE_vect) {
//...
// buffered code, with all variables etc.

#else 

// the unbuffered code
// the original serialWrite()

#endif

and the user can turn on or off the buffering just by commenting out the line #define TX_BUFFERED.

Or even better

#define TX_BUFFER_SIZE 32

#if (TX_BUFFER_SIZE > 0) 

// buffered code here

#else 

// the unbuffered code
// the original serialWrite()

#endif

The advantage of the preprocessor is that also the unused interrupt routine doesn't get compiled if the buffer is not used.

Ok, here's the final optimized wiring_serial.c with output buffer that can be completely disabled by setting #define TX_BUFFER_SIZE 0:

http://www.math.ucla.edu/~npozar/arduino/wiring_serial.c

It allows for 1Mbaud transfers and saves a couple hundred bytes of space.

I tested it with ATmega168. I should work with ATmega8 too, it compiles, but I can't test it.

Hi!!!

I would know how to set 1Mbaud serial comunication in the arduino with this library... what should I have to do?

thanks

Ok, here's the final optimized wiring_serial.c with output buffer that can be completely disabled by setting #define TX_BUFFER_SIZE 0

I had earlier pondered how this sort of #define trick could set the RX_BUFFER_SIZE also, since it's fairly large for most of the sketches that we see on the boards, but necessary for real workhorse applications that require constant serial attention.

The downside to this situation is that it actually depends on the LIBRARY to be rebuilt when the SKETCH uses it. It's no longer a shared library implementation, and a makefile might not establish that relationship. I would think you'd have to keep deleting the library .o files whenever you want to readjust the buffer size.

Am I wrong on this?

@mtz: just call Serial.begin(1000000L); This version uses exactly the same functions as the original one. It's transparent for the programmer.

@halley: I'm not sure how the makefile is written, but in my case, whenever I change the wiring_serial.c file it is automatically recompiled. It'd be awesome if one could put the

#define RX_BUFFER_SIZE yourvalue

at the beginning of the sketch file so there's no need to edit the library. With a smart code in the library:

#ifndef RX_BUFFER_SIZE
#define RX_BUFFER_SIZE 128
#endif

After this great size and speed improvement to the serial library, why not upgrade the Serial.begin() function as well?

How about

Serial.begin(BAUD,RX_BUFF_SIZE,TX_BUFF_SIZE)

The buffer sizes could be optional and default to sensible values.

Thanks mekon83...

But I have another question: in the code you put, there is a "L" after the speed, is that right?

Yep, it tells the compiler that 1000000L is a long constant, not an int. Maybe 1000000UL, unsigned long, that's even better.

@madworm: Unfortunately, this is not possible. The buffers are static arrays, compiler needs to know their size when compiling. It could be done with dynamic arrays but that introduces so many problems that the static solution is the best, in my opinion. It's simply too much to ask with 1kB SRAM.

Does this modification work with ATmega328?

Ironically I'm using code that @westfw helped develop for the HT1632 chip and am running up against the limits to which I can feed the Arduino data to push into the HT1632 chips. (currently at 115,200, I can push about 10KB/sec into the chips.)

I'd love to jump it up to 20 or 40KB/sec which would allow me to double or quadruple the number of Sure Electronics 2416 boards I can feed from a single arduino.

I'm a bit of a newbie, so I'm hesitant to overwrite my existing serial library without a little guidance. I'm using Arduino 0017