Serial write time

Hi,
I am trying to reduce the time taken by the Serial.write(...) function. Theoretically, the time taken by this function should be 9 / baudrate to send one byte. I used the following code to verify that:

void setup(){
  Serial.begin(115200);
  Serial2.begin(300);
}

unsigned long t;
int i=0;

void loop(){
  if(i==0){
    t=micros(); // line A
    Serial2.write(byte(5));
    Serial2.flush(); // line B
    t=micros()-t; // line C
    delay(50);
    Serial.println(t);
    i++;
  }
  else{
    if(i==1){
      t=micros();
      i++;
    }
    else{
      if(micros()-t > 1000000){
        i=0;
      }
    }
  }
}

This code sends a byte roughly every second and prints the time taken between line A and line C. The problem is that when I comment out line B, the time taken by the instruction "Serial2.write(byte(5));" is always 20µs, whatever the baud rate... When I uncomment line B, the time taken is approximately 20µs + 9/baudrate seconds, for every baud rate.
I think this is because the write() line only copies the byte into the transmit buffer, while the flush() line actually waits for it to be sent.

  1. Is my assumption right?

  2. Why does it take so long to copy a byte (20µs!)? I want to make it quicker because this constant time is a real problem when sending data at high speed: I would like to send data as quickly as possible, and at the 460800 baud I am using, I lose about half of the time just copying into the buffer.

Any help appreciated :~

Theoretically, the time taken by this function should be 9 / baudrate to send one byte.

Start + stop + eight data bits = 10 bits per byte.
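To put numbers on that: assuming the common 8N1 framing (no parity, one stop bit), each byte is 10 bit times on the wire, so one byte takes 10 / baud seconds. A small host-side C++ helper makes the arithmetic explicit; the function name is made up for illustration:

```cpp
#include <cassert>

// Wire time for one byte in microseconds, assuming 8N1 framing:
// 1 start bit + 8 data bits + 1 stop bit = 10 bit times per byte.
double byteTimeMicros(double baud) {
    assert(baud > 0);
    const double bitsPerByte = 10.0;
    return bitsPerByte * 1e6 / baud;
}
```

At 300 baud this gives about 33333µs (33.3ms) per byte; at 460800 baud about 21.7µs.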

Oops, I forgot about the start bit, thank you. However, there is still this 20µs issue.

The serial system consists of transmit and receive buffers. By default they are (I think) 64 bytes in size each.

When you do a Serial.write it places the character into the buffer, and then an interrupt routine sends the contents of the buffer out as and when it can.

If you fill the buffer, Serial.write waits until the interrupt routine has made enough space in the buffer for you.

To test the serial transmission times, you need to flood the buffer until it fills, then time how long it takes before you are able to write another character into the buffer.
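The buffer mechanism described above can be modelled on a PC. This is a simplified sketch, assuming a head/tail ring buffer in the style of the AVR core's HardwareSerial (the disassembly later in this thread shows the same `(head + 1) % SERIAL_BUFFER_SIZE` logic). `TxRing`, `tryWrite` and `isrSendOne` are hypothetical names, and unlike the real write(), `tryWrite` returns false instead of blocking when the buffer is full:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side model of a transmit ring buffer: write() stores at head,
// the UART interrupt removes from tail. One slot is always left empty so
// that "full" (next == tail) can be told apart from "empty" (head == tail).
template <std::size_t N>
struct TxRing {
    std::uint8_t buf[N] = {};
    std::size_t head = 0;  // next slot write() fills
    std::size_t tail = 0;  // next slot the ISR sends

    bool full() const { return (head + 1) % N == tail; }
    bool empty() const { return head == tail; }

    // What Serial.write() does when there is room. The real one spins
    // while full(); this model returns false so the state can be observed.
    bool tryWrite(std::uint8_t c) {
        if (full()) return false;
        buf[head] = c;
        head = (head + 1) % N;
        return true;
    }

    // What the "data register empty" interrupt does once per byte time.
    bool isrSendOne(std::uint8_t& out) {
        if (empty()) return false;
        out = buf[tail];
        tail = (tail + 1) % N;
        return true;
    }
};
```

With this layout a 64-entry buffer holds at most 63 pending bytes, so a test has to write 63 bytes before write() starts waiting on the interrupt.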

When you do a Serial.write it places the character into the buffer, and then an interrupt routine sends the contents of the buffer out as and when it can.

The Serial.flush() method blocks, waiting for the buffer to be empty. THAT is where your time is being wasted. Keep doing other things. Serial.write() will block on its own when the buffer is full.

Also, micros() is only approximate anyway.

At 16MHz the clock period is 1/16,000,000 s, i.e. one clock every 62.5ns (~63ns). 20µs is just 320 clocks. Some of those are used for getting the micros(), some by the serial ISR, etc.
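The clock arithmetic above, written as tiny conversion helpers (hypothetical names; 16MHz matches the Mega used here). The last check anticipates the ~128-cycle estimate for Serial.write() discussed below:

```cpp
#include <cassert>

// Time <-> clock-cycle conversions for a given CPU clock frequency.
double clockPeriodNs(double cpuHz) {
    assert(cpuHz > 0);
    return 1e9 / cpuHz;
}

double cyclesIn(double us, double cpuHz) {
    assert(cpuHz > 0);
    return us * cpuHz / 1e6;
}

double microsFor(double cycles, double cpuHz) {
    assert(cpuHz > 0);
    return cycles * 1e6 / cpuHz;
}
```

At 16MHz: a 62.5ns period, 320 cycles in 20µs, and 128 cycles take 8µs.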

HardwareSerial::write is quite a heavyweight function when seen at the assembly level:

00000562 <_ZN14HardwareSerial5writeEh>:

size_t HardwareSerial::write(uint8_t c)
 562:   cf 93           push    r28
 564:   df 93           push    r29
 566:   ec 01           movw    r28, r24
{
  int i = (_tx_buffer->head + 1) % SERIAL_BUFFER_SIZE;
 568:   ee 85           ldd r30, Y+14   ; 0x0e
 56a:   ff 85           ldd r31, Y+15   ; 0x0f
 56c:   e0 5c           subi    r30, 0xC0   ; 192
 56e:   ff 4f           sbci    r31, 0xFF   ; 255
 570:   20 81           ld  r18, Z
 572:   31 81           ldd r19, Z+1    ; 0x01
 574:   e0 54           subi    r30, 0x40   ; 64
 576:   f0 40           sbci    r31, 0x00   ; 0
 578:   2f 5f           subi    r18, 0xFF   ; 255
 57a:   3f 4f           sbci    r19, 0xFF   ; 255
 57c:   2f 73           andi    r18, 0x3F   ; 63
 57e:   30 70           andi    r19, 0x00   ; 0
   
  // If the output buffer is full, there's nothing for it other than to
  // wait for the interrupt handler to empty it a bit
  // ???: return 0 here instead?
  while (i == _tx_buffer->tail)
 580:   df 01           movw    r26, r30
 582:   ae 5b           subi    r26, 0xBE   ; 190
 584:   bf 4f           sbci    r27, 0xFF   ; 255
 586:   8d 91           ld  r24, X+
 588:   9c 91           ld  r25, X
 58a:   11 97           sbiw    r26, 0x01   ; 1
 58c:   28 17           cp  r18, r24
 58e:   39 07           cpc r19, r25
 590:   d1 f3           breq    .-12        ; 0x586 <_ZN14HardwareSerial5writeEh+0x24>
    ;
   
  _tx_buffer->buffer[_tx_buffer->head] = c;
 592:   e0 5c           subi    r30, 0xC0   ; 192
 594:   ff 4f           sbci    r31, 0xFF   ; 255
 596:   80 81           ld  r24, Z
 598:   91 81           ldd r25, Z+1    ; 0x01
 59a:   e0 54           subi    r30, 0x40   ; 64
 59c:   f0 40           sbci    r31, 0x00   ; 0
 59e:   e8 0f           add r30, r24
 5a0:   f9 1f           adc r31, r25
 5a2:   60 83           st  Z, r22

  _tx_buffer->head = i;
 5a4:   ee 85           ldd r30, Y+14   ; 0x0e
 5a6:   ff 85           ldd r31, Y+15   ; 0x0f
 5a8:   e0 5c           subi    r30, 0xC0   ; 192
 5aa:   ff 4f           sbci    r31, 0xFF   ; 255
 5ac:   31 83           std Z+1, r19    ; 0x01
 5ae:   20 83           st  Z, r18

  sbi(*_ucsrb, _udrie);
 5b0:   ee 89           ldd r30, Y+22   ; 0x16
 5b2:   ff 89           ldd r31, Y+23   ; 0x17
 5b4:   20 81           ld  r18, Z
 5b6:   81 e0           ldi r24, 0x01   ; 1
 5b8:   90 e0           ldi r25, 0x00   ; 0
 5ba:   0f 8c           ldd r0, Y+31    ; 0x1f
 5bc:   02 c0           rjmp    .+4         ; 0x5c2 <_ZN14HardwareSerial5writeEh+0x60>
 5be:   88 0f           add r24, r24
 5c0:   99 1f           adc r25, r25
 5c2:   0a 94           dec r0
 5c4:   e2 f7           brpl    .-8         ; 0x5be <_ZN14HardwareSerial5writeEh+0x5c>
 5c6:   28 2b           or  r18, r24
 5c8:   20 83           st  Z, r18
  // clear the TXC bit -- "can be cleared by writing a one to its bit location"
  transmitting = true;
 5ca:   81 e0           ldi r24, 0x01   ; 1
 5cc:   89 a3           std Y+33, r24   ; 0x21
  sbi(*_ucsra, TXC0);
 5ce:   ec 89           ldd r30, Y+20   ; 0x14
 5d0:   fd 89           ldd r31, Y+21   ; 0x15
 5d2:   80 81           ld  r24, Z
 5d4:   80 64           ori r24, 0x40   ; 64
 5d6:   80 83           st  Z, r24

  return 1;
}
 5d8:   81 e0           ldi r24, 0x01   ; 1
 5da:   90 e0           ldi r25, 0x00   ; 0
 5dc:   df 91           pop r29
 5de:   cf 91           pop r28
 5e0:   08 95           ret

That's 64 instructions, each 2 bytes long. Most AVR instructions take 1 or 2 clock cycles (the loads, stores, push and pop take 2, ret a few more), and the bit-shift loop near the end runs several times, so one call works out to roughly 128 clock cycles: 128 × 62.5ns ≈ 8µs, and that's just the Serial.write(). Add to that the Serial.flush() and micros() calls and assignments, plus the overhead of the ISR that becomes active the moment you do the Serial.write(), and you can easily fill 20µs.

The Serial.flush() method blocks, waiting for the buffer to be empty. THAT is where your time is being wasted.

Yes, I know; maybe I wasn't clear, but I commented out this line to get the 20µs. I only added it afterwards to measure the "true" transmit time depending on the baud rate. I'm not going to use Serial.flush() in my code. :wink:

I guess majenko's last answer explains why there is 20µs of constant time (even with Serial.flush() commented out). So I guess the "real" writing time on the Arduino will always be at least 20µs, whatever baud rate you choose... :~

I filled the buffer first, then measured the time taken by Serial.write() on its own while the buffer is full:

  • 115200 baud: t = 85µs
  • 460800 baud: t = 20µs
  • 921600 baud: t = 18.5µs (above that, it seems to stay stuck at 18.5µs)

As expected, there will always be this constant time (it looks like 18.5µs was being rounded up to 20µs before; I got this precision by timing several Serial.write(...) calls in a row). I guess I will have to live with it... :~
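One way to read these numbers, as a hedged interpretation rather than something measured: with a full buffer, each write() has to wait for the interrupt to free a slot, which happens once per byte time, so the per-write time is roughly the larger of the wire time per byte (8N1 assumed) and write()'s own CPU overhead. Plugging in the 18.5µs plateau as the overhead, the model predicts about 86.8µs at 115200 baud and 21.7µs at 460800, close to the 85µs and 20µs measured, and only at 921600 does the overhead dominate. The function names are illustrative:

```cpp
#include <algorithm>
#include <cassert>

// Simple model (an assumption): with the TX buffer already full, the
// steady-state cost of one write() is whichever is larger, the wire time
// of one byte (8N1, 10 bits) or the fixed CPU overhead of write() itself.
double wireMicrosPerByte(double baud) {
    assert(baud > 0);
    return 10.0 * 1e6 / baud;
}

double modelWriteMicros(double baud, double cpuOverheadMicros) {
    return std::max(wireMicrosPerByte(baud), cpuOverheadMicros);
}

// True if the wire, not the CPU, is the bottleneck at this baud rate.
bool wireBound(double baud, double cpuOverheadMicros) {
    return wireMicrosPerByte(baud) > cpuOverheadMicros;
}
```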
Thanks a lot for your answers.

PS: my code for the last experiment:

void setup(){
  Serial.begin(115200);
  Serial2.begin(115200); // change this value for each test
}

unsigned long t;
int i=0;

void loop(){
  if(i==0){
    // Flood the TX buffer so that every following write() has to wait
    // for the interrupt routine to free a slot.
    for (int k=0;k<1000; k++){
      Serial2.write(byte(1));
    }
    // t covers all 8 writes below.
    t=micros();
    Serial2.write(byte(5));
    Serial2.write(byte(5));
    Serial2.write(byte(5));
    Serial2.write(byte(5));
    Serial2.write(byte(5));
    Serial2.write(byte(5));
    Serial2.write(byte(5));
    Serial2.write(byte(5));
    t=micros()-t;
    delay(50);
    Serial.println(t);
    i++;
  }
  else{
    if(i==1){
      t=micros();
      i++;
    }
    else{
      if(micros()-t > 1000000){
        i=0;
      }
    }
  }
}

So I guess the "real" writing time taken by the Arduino will always be at least 20µs, whatever baudrate you choose...

Sure, because writing to the buffer has nothing to do with sending data to the serial port, where timing between the pulses (aka baud rate) matters.

I thought that time would be much lower... Do you know any hack to improve it?

Any hack you know to improve it?

The general solution is to not send novels via the serial port. Enough with the handwaving. What are you trying to do, and why do you have to send so much data?

I have a device which must send data each time a rotary encoder senses a rotation. Each piece of data is 8 bytes long, and the encoder moves fast enough that I have only 333µs before the next piece of data must be sent...
Since I need 20*8 = 160µs to send the data, I have only 173µs left for everything else (reading multiple sensors, reading incoming serial data to check whether an order was received, ...).
It works for now, but I may have to add a sensor and send 9 bytes of data, and then I will have a problem... It is impossible to pause or slow the movement to let the transmit buffer empty, and I have already heavily optimized everything else (even each bit of data sent).
I was only wondering if there was a way to cut some time from the sending, since it is the only part that takes so much time (the other parts each take around 10-20µs).
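The budget above can be written down so that both constraints are visible. This is a sketch using the figures from this post (333µs per packet, ~20µs of CPU per write() call) plus an assumed 8N1 framing at 460800 baud; `Budget` and `packetBudget` are hypothetical names. Besides the CPU time, the UART itself needs about 21.7µs per byte on the wire, so the link has to keep up too:

```cpp
#include <cassert>

// Per-packet time budget: CPU time spent in write() calls, wire time the
// UART needs to shift the bytes out, and CPU time left for other work.
struct Budget {
    double cpuMicros;
    double wireMicros;
    double spareCpu;
};

Budget packetBudget(int bytes, double periodMicros,
                    double writeCostMicros, double baud) {
    assert(bytes > 0 && baud > 0);
    Budget b;
    b.cpuMicros = bytes * writeCostMicros;
    b.wireMicros = bytes * 10.0 * 1e6 / baud;  // 8N1: 10 bits per byte
    b.spareCpu = periodMicros - b.cpuMicros;
    return b;
}
```

For 8 bytes this gives 160µs of CPU time, 173µs spare and about 174µs of wire time; for 9 bytes, 180µs of CPU time, 153µs spare and about 195µs of wire time, still under 333µs.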

Since I need 20*8=160µs to send the data

Have you confirmed that the overhead of writing 8 bytes to the buffer in one call is really 8 times the overhead of writing one byte per call? Are you actually writing 8 bytes one at a time?

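Arduino micros() has a resolution of 4µs (it always returns a multiple of 4). Not 20 or some changing guessy value but 4.
That comes from the hardware timer: on a 16MHz board, micros() is based on Timer0 running at 16MHz / 64 = 250kHz, i.e. one tick every 4µs. ATMEL's datasheet explains the timer and its prescaler in full.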

173 usecs left for other parts that take 10 to 20, I guess that you have no analog reads then.

You might consider spreading the work over more than 1 AVR or moving up to an ARM.

AVRs are cheap and you can program most of them with an Arduino as ISP. Connect them with an SPI bus for fast comms.
Or maybe, if you are up to hacking your PC, find out how to use its SPI port for something other than flashing firmware updates.

The Due is an ARM, as are some other Arduino-compatibles. They generally run at 48-96MHz, for starters.

PaulS:

Since I need 20*8=160µs to send the data

Have you confirmed that the overhead of writing 8 bytes to the buffer in one call is really 8 times the overhead of writing one byte per call? Are you actually writing 8 bytes one at a time?

I tried both and I have the same results.

Arduino micros() rounds up to 4. Not 20 or some changing guessy value but 4.

Yes, I know that; the 20µs is not about rounding. It is the time taken by the Serial.write(...) function (to copy the byte to the transmit buffer, etc.) before the byte even starts being transmitted.

173 usecs left for other parts that take 10 to 20, I guess that you have no analog reads then.

Actually I have one, but it is already optimized (fast analogRead); that's one of the optimizations I have already made (it takes 20µs).

The Due is an ARM as are some other Arduino-compatibles.

You're piquing my interest. At one point I had to choose between the Mega and the Due, and I chose the Mega because of its 5V pins. If I buy logic level converters and everything else I need to get around the 3.3V problem, will there be other issues with the Due (software or hardware)? If I keep the same code, I think all calculations should run more than 5 times faster, so all code execution will be about 5 times faster (except the transmit time, which depends on the baud rate). Is that true?

Having not run much in the way of tests I can't say that I know.
My ARM board is a Teensy 3.1 I've run at 96MHz. It can handle 5V inputs but is 3.3V. It's also got 64K RAM.
OTOH it's very small and needs pins soldered in plus a bunch of the contacts are underside pads.
If you're not happy with close-in, small soldering then a Due or other ARM-duino may suit you better.

Consider that the Due not only has a higher clock rate but is also a 32-bit processor:
about 5x the clock (84MHz vs 16MHz) plus 4x the word length. It can do at least some 32-bit ops in 1 cycle, which AVRs can't.
You might want to check the ARM processor for pipeline(s) and better serial hardware.
I call them Frankenduinos for a reason. Compared to AVRs they are "Monster".

The code is the same though the pins and some other hardware details may differ. Pin 13 is still the led pin on mine.
I use the same IDE with a different target board but keep in mind the hardware differences. BWD is no problem. :grin:

You can build logic level converters from resistors, but I've seen a diode trick that avoids wasting current. Solder the components right into the wires you connect with; just be sure to cover the bare spots with heat-shrink.

PS:
There is also the more-AVRs route once you know how to program them. A 328P cost about $2-$3 when I last bought some, over a year ago, and I don't expect the price has gone up. Down, if anything.
If you need a link, sing out.

It's been so long that I'd forgotten, but I just checked: the Teensys communicate at USB speed.

Unlike a standard Arduino, the Teensy Serial object always communicates at 12 Mbit/sec USB speed. Many computers, especially older Macs, can not update the serial monitor window if there is no delay to limit the speed!

http://www.pjrc.com/teensy/td_serial.html

Teensys are compatibles, and that does require some steps different from a standard Arduino.
http://www.pjrc.com/teensy/index.html
http://www.pjrc.com/teensy/teensyduino.html

Since the Teensy 2.0 came out, the Arduino Leonardo, with the same chip and capabilities in the standard form factor, has come out too, but I don't know about its serial speed, except that there's no reason it shouldn't be high.

PPS:
Here's a gem from the Teensy site on serial....

On a standard Arduino, when you transmit with Serial.print(), the bytes are transmitted slowly by the on-chip UART to a FTDI USB-serial converter chip. The UART buffers 2 bytes, so Serial.print() will return when all but the last 2 bytes have been sent to the FTDI converter chip, which in turn stores the bytes into its own USB buffer.

On a Teensy, Serial.print() writes directly into the USB buffer. If your entire message fits within the buffer, Serial.print() returns to your sketch very quickly.

Perhaps if you only send 2 bytes at a time from your own buffer (char array, not String Object) to serial on a timed basis then you could get everything to run on the board you've already got.
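A minimal sketch of that idea, modelled on a PC so it can be run: the sketch's own buffer is drained at most two bytes per call, and only while the serial TX buffer has room, so no call ever blocks. `pumpSerial`, `pending`, `txFree` and `txBuffer` are hypothetical. On a real board the free-space count would have to come from the serial library (newer Arduino cores offer `Serial.availableForWrite()`; check that your core version has it):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// `pending` is the sketch's own outgoing buffer; `txFree` stands in for the
// number of free slots in the serial TX buffer; `txBuffer` stands in for
// that TX buffer. Each call moves at most `maxPerCall` bytes and never waits.
std::size_t pumpSerial(std::deque<std::uint8_t>& pending,
                       std::size_t& txFree,
                       std::deque<std::uint8_t>& txBuffer,
                       std::size_t maxPerCall = 2) {
    std::size_t moved = 0;
    while (moved < maxPerCall && !pending.empty() && txFree > 0) {
        txBuffer.push_back(pending.front());  // on the board: Serial2.write(...)
        pending.pop_front();
        --txFree;
        ++moved;
    }
    return moved;
}
```

Calling this once per pass of loop() spreads the ~20µs-per-byte copying cost across passes instead of spending it all at once; the total time to get a packet out stays the same.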

How long it takes to execute serial.write() is only relevant if it is getting in the way of other stuff the Arduino should be doing (in which case serial.flush() just adds to the problem).

If the time it takes to send data to the PC is what matters, a better measure is to send a bunch of stuff with multiple repeats (perhaps 1000), measure the start and end times and compute the average.

In my experience it can communicate pretty close to the expected throughput for the chosen baud rate.

...R

How long it takes to execute serial.write() is only relevant if it is getting in the way of other stuff the Arduino should be doing

That's exactly what's going on here :stuck_out_tongue:

Thank you, GoForSmoke, for all this information. The more-AVRs route seems to need a little more work and unfortunately I can't afford that right now, but I will keep it in mind for other projects; it seems very interesting. Between the Teensy and the Due, I think I will go with the Due. My only concern is building something that works and is as simple as possible, whatever the price, so I think it is better for me to stick with Arduino products. Even if the Teensy is fully compatible, the Due will save me some time when I have to convert my Mega sketch. But again, I'll keep it in mind for other projects!
About the Serial speed, I don't think I will get USB speed, because I'm actually using the Serial port to send data through a Bluetooth module. However, the byte copying in the Serial.write function (and all other functions) will be faster because of the higher clock, so this solution should work for me!

yukikami:

How long it takes to execute serial.write() is only relevant if it is getting in the way of other stuff the Arduino should be doing

That's exactly what's going on here

You haven't given any indication of what your project does or how the speed of serial.write() interferes with it.

I suspect there may be another solution if we know what you are trying to achieve.

...R

I'm not convinced that you really need more processor but I'm not you doing the project. XD
Popping 2 chars at a time to Serial would solve your timing problems. It would take just as long to get the message out, but your processor could spend much more of its time on other tasks instead of sending 8 bytes at once.

Setting up an SPI bus is not hard thanks to the libraries (SPI and either SD or SDFat).
There's 6 wires plus chip select which you can get creative with to give up to 1MB/sec to many devices.
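For a sense of scale: an AVR SPI master can clock at up to half the CPU clock, i.e. 8MHz on a 16MHz board, which is where a figure of about 1MB/s comes from. A sketch of the raw wire-time arithmetic (helper names hypothetical; per-byte software overhead is ignored, and as this thread shows, that overhead can dominate):

```cpp
#include <cassert>

// Raw shift time for a burst of bytes. SPI sends 8 clocks per byte with
// no start/stop bits; an 8N1 UART sends 10 bit times per byte.
double spiMicros(int bytes, double sckHz) {
    assert(bytes >= 0 && sckHz > 0);
    return bytes * 8.0 * 1e6 / sckHz;
}

double uartMicros(int bytes, double baud) {
    assert(bytes >= 0 && baud > 0);
    return bytes * 10.0 * 1e6 / baud;
}
```

An 8-byte packet takes 8µs to shift out over SPI at 8MHz, versus about 174µs over a UART at 460800 baud.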

It's kind of a shame that the Bluetooth doesn't have SPI. But then everything should be Easy! :grin:

This is how my sketch works:

  • I have a buffer holding all the 8-byte blocks of data that I need to send
  • in loop(), I empty this buffer, and when it is empty, I just keep updating the values measured by 2 sensors
  • on the other hand, I have 3 rotary encoders, each with its own interrupt; they all just increment an int, but in one of them I also add an 8-byte piece of data (representing the current states of all the sensors + encoders) to the buffer
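The design in the list above (an encoder ISR produces packets, loop() consumes them) is a single-producer/single-consumer queue. A minimal host-side sketch of such a queue, assuming 8-byte packets and an illustrative 16-slot capacity; `PacketQueue` and its members are hypothetical names, not the poster's code. The model ignores concurrency entirely: on an AVR the indices would have to be `volatile`, and you would need to make sure the compiler cannot move the data copy past the index update:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint8_t kPacketSize = 8;
constexpr std::uint8_t kSlots = 16;  // illustrative capacity (holds 15 packets)

struct PacketQueue {
    std::uint8_t data[kSlots][kPacketSize];
    std::uint8_t head = 0;  // advanced only by the producer (ISR)
    std::uint8_t tail = 0;  // advanced only by the consumer (loop)

    // Producer side: drops the packet and returns false if the queue is full.
    bool push(const std::uint8_t* pkt) {
        std::uint8_t next = (head + 1) % kSlots;
        if (next == tail) return false;
        std::memcpy(data[head], pkt, kPacketSize);
        head = next;
        return true;
    }

    // Consumer side: copies the oldest packet out if there is one.
    bool pop(std::uint8_t* out) {
        if (head == tail) return false;
        std::memcpy(out, data[tail], kPacketSize);
        tail = (tail + 1) % kSlots;
        return true;
    }
};
```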

Maybe you're right that my system isn't designed the best way, but that's the best way I found... :zipper_mouth_face:

Setting up an SPI bus is not hard thanks to the libraries

I agree, but I guess having another AVR would be somewhat complicated, like some kind of multi-threading?