An alternative Serial Library for Arduino 1.0

I posted a new beta, SerialPortBeta20120103.zip, here Google Code Archive - Long-term storage for Google Code Project Hosting.

Here is the changes.txt file:

3 Jan 2012

Almost total rewrite so expect bugs! Be sure to tell me about bugs.

See html for more details about the following functions.

Changed begin() to have a optional second argument to set parity,
character size, and number of stop bits.

void begin (long baud, uint8_t options = SP_8_BIT_CHAR)

The following can be ORed together for options:

Choose one stop bit option.
static const uint8_t SP_1_STOP_BIT = 0 (default)
static const uint8_t SP_2_STOP_BIT = M_USBS

Choose one character size.
static const uint8_t SP_5_BIT_CHAR = 0
static const uint8_t SP_6_BIT_CHAR = M_UCSZ0
static const uint8_t SP_7_BIT_CHAR = M_UCSZ1
static const uint8_t SP_8_BIT_CHAR = M_UCSZ0 | M_UCSZ1

Choose one parity option.
static const uint8_t SP_EVEN_PARITY = M_UPM1
static const uint8_t SP_NO_PARITY = 0
static const uint8_t SP_ODD_PARITY = M_UPM0 | M_UPM1

Added RX error checking.

The following error bits are returned:
static const uint8_t SP_FRAMING_ERROR = M_FE
static const uint8_t SP_PARITY_ERROR = M_UPE
static const uint8_t SP_RX_BUF_OVERRUN = 1
static const uint8_t SP_RX_DATA_OVERRUN = M_DOR

Added the following functions:

void clearRxError()
uint8_t getRxError()
size_t read (uint8_t *b, size_t n)
size_t write (const char *s) - overrides version in Stream
size_t write (uint8_t *b, size_t n) - overrides version in Stream
size_t writeln (const char *s)
size_t writeln ()

fat16lib:
I posted a new beta, SerialPortBeta20120103.zip, here Google Code Archive - Long-term storage for Google Code Project Hosting.

I've tried it out and it is faster when using RAM strings, but to save space most of my debug messages are coming from flash memory.
Could I request 1 additional function that takes a __FlashStringHelper as an argument can copies it directly into the ring buffer?
At present I'm copying from __FlashStringHelper to a RAM buffer and then passing on to SerialPort. Still faster than before but could be faster still.

Not seen any bugs yet.

Iain

(the "divide" I was complaining about exists in the 1.0 HardwareSerial implementation.)

sixeyes,

I was already playing with designs for flash strings.

I plan to add:

/** Write a flash string */
size_t SerialPort::write(const __FlashStringHelper *);

/** Write a flash string and CF/LF */
size_t SerialPort::writeln(const __FlashStringHelper *);

I already found a bug in my new read(uint8_t*, size_t) function.

Line 423 of SerialPort.h should be:

      p += rxRingBuf[PortNumber].get(p, limit - p);

fat16lib:
sixeyes,

I was already playing with designs for flash strings.

I plan to add:

/** Write a flash string */

size_t SerialPort::write(const __FlashStringHelper *);

/** Write a flash string and CF/LF */
size_t SerialPort::writeln(const __FlashStringHelper *);

Excellent. Just what I need.

I already found a bug in my new read(uint8_t*, size_t) function.

Line 423 of SerialPort.h should be:

      p += rxRingBuf[PortNumber].get(p, limit - p);

Ah. Not used the read() stuff yet. Patched my code. Thanks

Iain

(the "divide" I was complaining about exists in the 1.0 HardwareSerial implementation.)

replacing it with a conditional construct cuts execution time of the example by about 2/3 (from 9xx us to 3xx us)

Getting rid of the divide helped a lot but to get to the 60 microseconds that my SerialPort library uses was not easy.

About half of the 60 microseconds is consumed by two interrupts and the overhead of a micros() call.

When you enable the TX interrupt in the write function two interrupts happen immediately to fill the USART shift register and data register.

That leaves less than one microsecond per character to copy the string into the circular buffer.

When you enable the TX interrupt in the write function two interrupts happen immediately to fill the USART shift register and data register.

Interesting. I don't know whether it's fair to count that. You could probably eliminate it; but I'm not sure it would be worth the trouble...

not easy.

No. Especially given the additional layering of print/stream and your requirement for a 16bit buffer length.

Have you confirmed that using memcpy() is faster than the obvious local loop? There's not a lot that an AVR can do to make memcpy() fancy...

You don't want to eliminate the interrupts. You want them to happen as soon as possible so sending starts before returning from the send routine. It's just an artifact in the measured time.

If you send a two character string just before the timed statement, the operation will appear to happen 20-25 microseconds faster since no interrupts happen during the measured time.

It's not worth trying to eliminate the start-up interrupt with special code.

The 16-bit buffer indices have nothing to do with the copy. memcpy and the best local loop are about equal.

Somewhere you get to 16-bit addresses to access memory so using 8-bit indices doesn't help that much.

Hey, so why use template parameters? Why not just take regular constructors and then malloc the space you need?

It looks to me as though templates are a C++ construct better than C preprocessor Macros, that allow much optimization to be done based on compile-time constants. For instance, the (simpler) Fast Digital IO templates discussed here: http://arduino.cc/forum/index.php/topic,84044.0.html look a lot cleaner than many of the other (macro based) alternatives for fast digital IO, although being a C programmer I don't know that I trust them to behave all the time...

malloc is to be avoided, if possible...

Somewhere you get to 16-bit addresses to access memory so using 8-bit indices doesn't help that much.

Long ago discussions indicated otherwise; although that was when it was just HardwareSerial.c, and the "significant" improvement is probably tiny compared to the size/speed of HardwareSerial now.
http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1235799875

(hmm. Interesting that one of the main points was that using an 8bit index allowed the modulo operation to be done with an ANDI for appropriate constants, rather than calling the division routine.)

Who cares about discussions long ago. I learned this long ago from my mentor, a great Nobel prize wining physicist.

One experimentally verified fact is worth more than a thousand opinions.

Here are three ways to copy memory and the loop takes 8 cycles in all three. This proves that the advantage of using an 8-bit index to access memory is mostly a myth.

My favorite, the first one, is the smallest.

void mem(uint8_t* dst, uint8_t* src, uint16_t n) {
   0:   fc 01           movw    r30, r24
  uint8_t* limit = src + n;
   2:   46 0f           add     r20, r22
   4:   57 1f           adc     r21, r23
   6:   db 01           movw    r26, r22
  while(src < limit) *dst++ = *src++;
   8:   00 c0           rjmp    .+0             ; 0xa <_Z3memPhS_j+0xa>
   a:   8d 91           ld      r24, X+
   c:   81 93           st      Z+, r24
   e:   a4 17           cp      r26, r20
  10:   b5 07           cpc     r27, r21
  12:   00 f0           brcs    .+0             ; 0x14 <_Z3memPhS_j+0x14>
}
  14:   08 95           ret

Using an 8-bit count doesn't help:

void mem(uint8_t* dst, uint8_t* src, uint8_t n) {
   0:   fc 01           movw    r30, r24
  uint8_t* limit = src + n;
   2:   cb 01           movw    r24, r22
   4:   84 0f           add     r24, r20
   6:   91 1d           adc     r25, r1
   8:   db 01           movw    r26, r22
  while(src < limit) *dst++ = *src++;
   a:   00 c0           rjmp    .+0             ; 0xc <_Z3memPhS_h+0xc>
   c:   2d 91           ld      r18, X+
   e:   21 93           st      Z+, r18
  10:   a8 17           cp      r26, r24
  12:   b9 07           cpc     r27, r25
  14:   00 f0           brcs    .+0             ; 0x16 <_Z3memPhS_h+0x16>
  16:   08 95           ret
}

Trying hard to take advantage of the 8-bit index makes matters worse:

void mem1(uint8_t* dst, uint8_t* src, uint8_t n) {
  for (uint8_t i = 0; i < n; i++) dst[i] = src[i];
   0:   26 2f           mov     r18, r22
   2:   37 2f           mov     r19, r23
   4:   f9 01           movw    r30, r18
   6:   28 2f           mov     r18, r24
   8:   39 2f           mov     r19, r25
   a:   d9 01           movw    r26, r18
   c:   80 e0           ldi     r24, 0x00       ; 0
   e:   00 c0           rjmp    .+0             ; 0x10 <_Z4mem1PhS_h+0x10>
  10:   91 91           ld      r25, Z+
  12:   9d 93           st      X+, r25
  14:   8f 5f           subi    r24, 0xFF       ; 255
  16:   84 17           cp      r24, r20
  18:   00 f0           brcs    .+0             ; 0x1a <_Z4mem1PhS_h+0x1a>
  1a:   08 95           ret
}

Edit: Suppose after seeing this you try to be really clever and do two bytes per loop like this:

void mem(uint8_t* dst, uint8_t* src, uint16_t n) {
  uint8_t* limit = src + n;
  if (n & 1) *dst++ = *src++;
  while(src < limit) {
    *dst++ = *src++;
    *dst++ = *src++;
  }
}

Not a good idea after all:

void mem(uint8_t* dst, uint8_t* src, uint16_t n) {
   0:   fc 01           movw    r30, r24
   2:   db 01           movw    r26, r22
  uint8_t* limit = src + n;
   4:   cb 01           movw    r24, r22
   6:   84 0f           add     r24, r20
   8:   95 1f           adc     r25, r21
  if (n & 1) *dst++ = *src++;
   a:   40 ff           sbrs    r20, 0
   c:   00 c0           rjmp    .+0             ; 0xe <_Z3memPhS_j+0xe>
   e:   2d 91           ld      r18, X+
  10:   21 93           st      Z+, r18
  12:   00 c0           rjmp    .+0             ; 0x14 <_Z3memPhS_j+0x14>
  while(src < limit) {
    *dst++ = *src++;
  14:   2c 91           ld      r18, X
  16:   20 83           st      Z, r18
    *dst++ = *src++;
  18:   11 96           adiw    r26, 0x01       ; 1
  1a:   2c 91           ld      r18, X
  1c:   11 97           sbiw    r26, 0x01       ; 1
  1e:   21 83           std     Z+1, r18        ; 0x01
  20:   32 96           adiw    r30, 0x02       ; 2
  22:   12 96           adiw    r26, 0x02       ; 2
  24:   a8 17           cp      r26, r24
  26:   b9 07           cpc     r27, r25
  28:   00 f0           brcs    .+0             ; 0x2a <_Z3memPhS_j+0x2a>
  }
}
  2a:   08 95           ret    ret

sixeyes,

How about this?

#include <SerialPort.h>

SerialPort<0, 0, 32> port;

void setup(void)
{
  port.begin(115200);
  for (int route = 0; route < 11; route++)
  {
    uint32_t start;

    start = micros();
    port.writeln(F("Selecting passenger route x"));
    ShowMessageTime(start, micros());
  }
}

void loop(void)
{
}

void ShowMessageTime(uint32_t start, uint32_t stop)
{
  port.print("Message time:");
  port.print(stop - start, DEC);
  port.println("us");
  delay(400);
}

Selecting passenger route x
Message time:64us
Selecting passenger route x
Message time:76us
Selecting passenger route x
Message time:76us
Selecting passenger route x
Message time:76us
Selecting passenger route x
Message time:76us
Selecting passenger route x
Message time:72us
Selecting passenger route x
Message time:76us
Selecting passenger route x
Message time:68us
Selecting passenger route x
Message time:72us
Selecting passenger route x
Message time:64us
Selecting passenger route x
Message time:76us

Excellent news.

BTW any chance of fixing the signed / unsigned warning in SerialPort.cpp in the next release? I hate seeing the odd red message disappear by, not knowing if it's mine or not.

Iain

I am using the Arduino IDE so I don't get warnings.

I have an Eclipse Arduino environment so I will build it there to find it.

I'm using the Arduino v1.0 IDE.

arduino-1.0\libraries\SerialPort\SerialPort.cpp: In member function 'uint16_t SerialRingBuffer::put(uint8_t*, size_t)':
arduino-1.0\libraries\SerialPort\SerialPort.cpp:131: warning: comparison between signed and unsigned integer expressions

Iain

I didn't have verbose set for compile. Set verbose and now the warnings show.

Ah. They must have changed the default. Did a fresh install when I downloaded v1.0.

Iain

The version with flash strings is posted as SerialPortBeta20120104.zip Google Code Archive - Long-term storage for Google Code Project Hosting..

Thanks great. Working fine here.

Just left to sort out my own warning message now :slight_smile:

Iain