Go Down

Topic: An alternative Serial Library for Arduino 1.0 (Read 24473 times) previous topic - next topic

westfw

(the  "divide" I was complaining about exists in the 1.0 HardwareSerial implementation.)

fat16lib

sixeyes,

I was already playing with designs for flash strings. 

I plan to add:
Code: [Select]

/** Write a flash string */
size_t SerialPort::write(const __FlashStringHelper *);

/** Write a flash string and CF/LF */
size_t SerialPort::writeln(const __FlashStringHelper *);


I already found a bug in my new read(uint8_t*, size_t) function.

Line 423 of SerialPort.h should be:
Code: [Select]
      p += rxRingBuf[PortNumber].get(p, limit - p);

sixeyes


sixeyes,

I was already playing with designs for flash strings. 

I plan to add:
Code: [Select]

/** Write a flash string */
size_t SerialPort::write(const __FlashStringHelper *);

/** Write a flash string and CF/LF */
size_t SerialPort::writeln(const __FlashStringHelper *);



Excellent. Just what I need.

Quote

I already found a bug in my new read(uint8_t*, size_t) function.

Line 423 of SerialPort.h should be:
Code: [Select]
      p += rxRingBuf[PortNumber].get(p, limit - p);


Ah. Not used the read() stuff yet. Patched my code. Thanks

Iain

westfw

Quote
(the  "divide" I was complaining about exists in the 1.0 HardwareSerial implementation.)

replacing it with a conditional construct cuts execution time of the example by about 2/3 (from 9xx us to 3xx us)

fat16lib

Getting rid of the divide helped a lot but to get to the 60 microseconds that my SerialPort library uses was not easy.

About half of the 60 microseconds is consumed by two interrupts and the overhead of a micros() call.

When you enable the TX interrupt in the write function two interrupts happen immediately to fill the USART shift register and data register.

That leaves less than one microsecond per character to copy the string into the circular buffer.

westfw

Quote
When you enable the TX interrupt in the write function two interrupts happen immediately to fill the USART shift register and data register.

Interesting.  I don't know whether it's fair to count that.  You could probably eliminate it; but I'm not sure it would be worth the trouble...

Quote
not easy.

No.  Especially given the additional layering of print/stream and your requirement for a 16bit buffer length.

Have you confirmed that using memcpy() is faster than the obvious local loop?  There's not a lot that an AVR can do to make memcpy() fancy...

fat16lib

You don't want to eliminate the interrupts.  You  want them to happen as soon as possible so sending starts before returning from the send routine.  It's just an artifact in the measured time.

If you send a two character string just before the timed statement, the operation will appear to happen 20-25 microseconds faster since no interrupts happen during the measured time.

It's not worth trying to eliminate the start-up interrupt with special code.

The 16-bit buffer indices have nothing to do with the copy.  memcpy and the best local loop are about equal.

Somewhere you get to 16-bit addresses to access memory so using 8-bit indices doesn't help that much. 

Hey, so why use template parameters?  Why not just take regular constructors and then malloc the space you need?

westfw

It looks to me as though templates are a C++ construct better than C preprocessor Macros, that allow much optimization to be done based on compile-time constants.  For instance, the (simpler) Fast Digital IO templates discussed here: http://arduino.cc/forum/index.php/topic,84044.0.html look a lot cleaner than many of the other (macro based) alternatives for fast digital IO, although being a C programmer I don't know that I trust them to behave all the time...

malloc is to be avoided, if possible...

Quote
Somewhere you get to 16-bit addresses to access memory so using 8-bit indices doesn't help that much.

Long ago discussions indicated otherwise; although that was when it was just HardwareSerial.c, and the "significant" improvement is probably tiny compared to the size/speed of HardwareSerial now.
http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1235799875

(hmm.  Interesting that one of the main points was that using an 8bit index allowed the modulo operation to be done with an ANDI for appropriate constants, rather than calling the division routine.)

fat16lib

#54
Jan 04, 2012, 02:03 pm Last Edit: Jan 04, 2012, 03:03 pm by fat16lib Reason: 1
Who cares about discussions long ago.  I learned this long ago from my mentor, a great Nobel prize wining physicist.

Quote
One experimentally verified fact is worth more than a thousand opinions.


Here are three ways to copy memory and the loop takes 8 cycles in all three.  This proves that the advantage of using an 8-bit index to access memory is mostly a myth.

My favorite, the first one, is the smallest.
Code: [Select]

void mem(uint8_t* dst, uint8_t* src, uint16_t n) {
  0:   fc 01           movw    r30, r24
 uint8_t* limit = src + n;
  2:   46 0f           add     r20, r22
  4:   57 1f           adc     r21, r23
  6:   db 01           movw    r26, r22
 while(src < limit) *dst++ = *src++;
  8:   00 c0           rjmp    .+0             ; 0xa <_Z3memPhS_j+0xa>
  a:   8d 91           ld      r24, X+
  c:   81 93           st      Z+, r24
  e:   a4 17           cp      r26, r20
 10:   b5 07           cpc     r27, r21
 12:   00 f0           brcs    .+0             ; 0x14 <_Z3memPhS_j+0x14>
}
 14:   08 95           ret

Using an 8-bit count doesn't help:
Code: [Select]

void mem(uint8_t* dst, uint8_t* src, uint8_t n) {
  0:   fc 01           movw    r30, r24
 uint8_t* limit = src + n;
  2:   cb 01           movw    r24, r22
  4:   84 0f           add     r24, r20
  6:   91 1d           adc     r25, r1
  8:   db 01           movw    r26, r22
 while(src < limit) *dst++ = *src++;
  a:   00 c0           rjmp    .+0             ; 0xc <_Z3memPhS_h+0xc>
  c:   2d 91           ld      r18, X+
  e:   21 93           st      Z+, r18
 10:   a8 17           cp      r26, r24
 12:   b9 07           cpc     r27, r25
 14:   00 f0           brcs    .+0             ; 0x16 <_Z3memPhS_h+0x16>
 16:   08 95           ret
}

Trying hard to take advantage of the 8-bit index makes matters worse:
Code: [Select]

void mem1(uint8_t* dst, uint8_t* src, uint8_t n) {
 for (uint8_t i = 0; i < n; i++) dst[i] = src[i];
  0:   26 2f           mov     r18, r22
  2:   37 2f           mov     r19, r23
  4:   f9 01           movw    r30, r18
  6:   28 2f           mov     r18, r24
  8:   39 2f           mov     r19, r25
  a:   d9 01           movw    r26, r18
  c:   80 e0           ldi     r24, 0x00       ; 0
  e:   00 c0           rjmp    .+0             ; 0x10 <_Z4mem1PhS_h+0x10>
 10:   91 91           ld      r25, Z+
 12:   9d 93           st      X+, r25
 14:   8f 5f           subi    r24, 0xFF       ; 255
 16:   84 17           cp      r24, r20
 18:   00 f0           brcs    .+0             ; 0x1a <_Z4mem1PhS_h+0x1a>
 1a:   08 95           ret
}


Edit:  Suppose after seeing this you try to be really clever and do two bytes per loop like this:
Code: [Select]

void mem(uint8_t* dst, uint8_t* src, uint16_t n) {
  uint8_t* limit = src + n;
  if (n & 1) *dst++ = *src++;
  while(src < limit) {
    *dst++ = *src++;
    *dst++ = *src++;
  }
}

Not a good idea after all:
Code: [Select]

void mem(uint8_t* dst, uint8_t* src, uint16_t n) {
   0:   fc 01           movw    r30, r24
   2:   db 01           movw    r26, r22
  uint8_t* limit = src + n;
   4:   cb 01           movw    r24, r22
   6:   84 0f           add     r24, r20
   8:   95 1f           adc     r25, r21
  if (n & 1) *dst++ = *src++;
   a:   40 ff           sbrs    r20, 0
   c:   00 c0           rjmp    .+0             ; 0xe <_Z3memPhS_j+0xe>
   e:   2d 91           ld      r18, X+
  10:   21 93           st      Z+, r18
  12:   00 c0           rjmp    .+0             ; 0x14 <_Z3memPhS_j+0x14>
  while(src < limit) {
    *dst++ = *src++;
  14:   2c 91           ld      r18, X
  16:   20 83           st      Z, r18
    *dst++ = *src++;
  18:   11 96           adiw    r26, 0x01       ; 1
  1a:   2c 91           ld      r18, X
  1c:   11 97           sbiw    r26, 0x01       ; 1
  1e:   21 83           std     Z+1, r18        ; 0x01
  20:   32 96           adiw    r30, 0x02       ; 2
  22:   12 96           adiw    r26, 0x02       ; 2
  24:   a8 17           cp      r26, r24
  26:   b9 07           cpc     r27, r25
  28:   00 f0           brcs    .+0             ; 0x2a <_Z3memPhS_j+0x2a>
  }
}
  2a:   08 95           ret    ret

fat16lib

sixeyes,

How about this?
Code: [Select]

#include <SerialPort.h>

SerialPort<0, 0, 32> port;

void setup(void)
{
  port.begin(115200);
  for (int route = 0; route < 11; route++)
  {
    uint32_t start;

    start = micros();
    port.writeln(F("Selecting passenger route x"));
    ShowMessageTime(start, micros());
  }
}

void loop(void)
{
}

void ShowMessageTime(uint32_t start, uint32_t stop)
{
  port.print("Message time:");
  port.print(stop - start, DEC);
  port.println("us");
  delay(400);
}

Quote

Selecting passenger route x
Message time:64us
Selecting passenger route x
Message time:76us
Selecting passenger route x
Message time:76us
Selecting passenger route x
Message time:76us
Selecting passenger route x
Message time:76us
Selecting passenger route x
Message time:72us
Selecting passenger route x
Message time:76us
Selecting passenger route x
Message time:68us
Selecting passenger route x
Message time:72us
Selecting passenger route x
Message time:64us
Selecting passenger route x
Message time:76us


sixeyes

Excellent news.

BTW any chance of fixing the signed / unsigned warning in SerialPort.cpp in the next release? I hate seeing the odd red message disappear by, not knowing if it's mine or not.

Iain

fat16lib

I am using the Arduino IDE so I don't get warnings.

I have an Eclipse Arduino environment so I will build it there to find it.

sixeyes

I'm using the Arduino v1.0 IDE.

arduino-1.0\libraries\SerialPort\SerialPort.cpp: In member function 'uint16_t SerialRingBuffer::put(uint8_t*, size_t)':
arduino-1.0\libraries\SerialPort\SerialPort.cpp:131: warning: comparison between signed and unsigned integer expressions

Iain

fat16lib

I didn't have verbose set for compile.  Set verbose and now the warnings show.

Go Up