Go Down

Topic: An alternative Serial Library for Arduino 1.0 (Read 24378 times) previous topic - next topic

sixeyes

Ah. They must have changed the default. Did a fresh install when I downloaded v1.0.

Iain

fat16lib

The version with flash strings is posted as SerialPortBeta20120104.zip http://code.google.com/p/beta-lib/downloads/list.

sixeyes

Thanks great. Working fine here.

Just left to sort out my own warning message now :)

Iain

westfw

Most of the "savings" of using an 8bit index is in the preliminary math BEFORE the copy (after all, the copy itself used to only be one byte.)
Code: [Select]
00000036 <memx_o>:
uint8_t buffer[BUFSIZE];
uint16_t bufd16;
uint8_t bufd8;

void memx_o(uint8_t *src, uint16_t n)
{
  36: dc 01        movw r26, r24
    uint8_t *limit = src + n;
  38: 68 0f        add r22, r24
  3a: 79 1f        adc r23, r25
    uint8_t *dst = &buffer[++bufd16 % BUFSIZE];
  3c: 80 91 00 00 lds r24, 0x0000
  40: 90 91 00 00 lds r25, 0x0000
  44: 01 96        adiw r24, 0x01 ; 1
  46: 90 93 00 00 sts 0x0000, r25
  4a: 80 93 00 00 sts 0x0000, r24
  4e: fc 01        movw r30, r24
  50: ef 73        andi r30, 0x3F ; 63
  52: f0 70        andi r31, 0x00 ; 0
  54: e0 50        subi r30, 0x00 ; 0
  56: f0 40        sbci r31, 0x00 ; 0
  58: 00 c0        rjmp .+0      ; 0x5a <memx_o+0x24>
    while (src < limit) {
*dst++ = *src++;
  5a: 8d 91        ld r24, X+
  5c: 81 93        st Z+, r24
  5e: a6 17        cp r26, r22
  60: b7 07        cpc r27, r23
  62: 00 f0        brcs .+0      ; 0x64 <memx_o+0x2e>
*dst++ = *src++;
    }
}
  64: 08 95        ret

vs
Code: [Select]

00000066 <memx>:

uint8_t bufs;
void memx(uint8_t *src, uint8_t n)
{
  66: dc 01        movw r26, r24
    uint8_t i = ++bufd8 % BUFSIZE;
  68: 80 91 00 00 lds r24, 0x0000
  6c: 8f 5f        subi r24, 0xFF ; 255
  6e: 80 93 00 00 sts 0x0000, r24
    uint8_t *dst = &buffer[i];
  72: 8f 73        andi r24, 0x3F ; 63
  74: e8 2f        mov r30, r24
  76: f0 e0        ldi r31, 0x00 ; 0
  78: e0 50        subi r30, 0x00 ; 0
  7a: f0 40        sbci r31, 0x00 ; 0
    do {
*dst++ = *src++;
  7c: 8d 91        ld r24, X+
  7e: 81 93        st Z+, r24
    } while (--n);
  80: 61 50        subi r22, 0x01 ; 1
  82: 01 f4        brne .+0      ; 0x84 <memx+0x1e>
}
  84: 08 95        ret

fat16lib

#64
Jan 05, 2012, 03:29 pm Last Edit: Jan 05, 2012, 03:52 pm by fat16lib Reason: 1
westfw,

I rest my case.  There is very little to be saved by limiting buffers to power-of-two sizes and a max size of 256 bytes, which is a capacity of 255 bytes for ring buffers.

I am not sure what your new examples have to do with ring buffers.

How do I use a function that copies n bytes into a location that starts at address buffer[(bufd8 + 1) % BUFFERSIZE] and increments bufd8 by one?

It also has the feature of shredding memory if n == 0. Copy functions should copy no bytes when called with n == 0.

The 16-bit version is strange also.  Most of the "setup overhead" is fetching bufd16, incrementing it and storing it.

Try this:
Code: [Select]

#define BUFSIZE 20
uint8_t buffer[BUFSIZE] = "##";
uint8_t bufd8 = 0;  // not really needed

void memx(uint8_t *src, uint8_t n)
{
 uint8_t i = ++bufd8 % BUFSIZE;
 uint8_t *dst = &buffer[i];
 do {
   *dst++ = *src++;
 } while (--n);
}

void setup() {
 char msg[] = "123456789";  // ten bytes to copy
 Serial.begin(9600);
 
 memx((uint8_t*)msg, 10);
 Serial.println((char*)buffer);
 
 memx((uint8_t*)msg, 10);  
 Serial.println((char*)buffer);
 
 // Wait a second before dying
 delay(1000);
 
 // shred memory
 memx((uint8_t*)msg, 0);

 Serial.println("You May not see this");
}
void loop() {}


Edit:  Is BUFSIZE limited to powers-of-two?

fat16lib

#65
Jan 06, 2012, 01:00 am Last Edit: Jan 06, 2012, 01:17 am by fat16lib Reason: 1
I decided to offer 8-bit buffer indices as an option, the customer is always right.

I defined a typedef for buffer indices so only one line needs to be change to select 8-bit or 16-bit indices.
Code: [Select]

typedef uint8_t buf_size_t;

or
Code: [Select]

typedef uint16_t buf_size_t;

The result is the BufferedTest example takes 1838 bytes of flash with 16-bit indices and 1642 bytes with 8-bit indices.

The WriteFlash test takes about 70 us to copy its string with 16-bit indices and 64 us with 8-bit indices.

So 8-bit indices result in about 200 byte less flash and are 10% faster.

I will make buf_size_t the fourth argument of the template with default value uint8_t.  If RX or TX buffer size is too large I will cause a compile time error with a buffer size too large message.  The user can then specify uint16_t as the fourth argument of the tremplate.

westfw

I suppose that the most controversial issue at this point is whether the users should be exposed to template syntax.

How about:
Code: [Select]
#define USE_SERIAL SerialPort<0, 64, 64> NewSerial
#define USE_SERIAL1 SerialPort<1, 64, 64> NewSerial1
#define USE_SERIAL2 SerialPort<2, 64, 64> NewSerial2
#define USE_SERIAL3 SerialPort<3, 64, 64> NewSerial3

A hypothetical wholescale replacement of HardwareSerial.h could have:
Code: [Select]
#if !defined(__SerialPortAlreadyLoaded__)
#include <SerialPort.h>
USE_SERIAL;
#if defined(UBRR1H)
USE_SERIAL1;
#endif
#if defined(UBRR2H)
USE_SERIAL2;
#endif
#if defined(UBRR3H)
USE_SERIAL3;
#endif
#endif

and wind up approximately where the defaults are today.
if the user had already set up a different SerialPort structure, that would be used instead.

Is there something that guarantees that the static arrays used for the usartRegister class don't end up in your actual image?  I noticed a while ago that uart registers had a fixed layout and common constants, and set them up as an overlaid structure, with similar savings in execution time.  (and special cases need for ATmega8)  This looks more general; I'm still looking for a downside...

Do you have a tutorial on templates that you like?  When do they usually show up, in the course of learning C++?

fat16lib

#67
Jan 06, 2012, 02:28 pm Last Edit: Jan 06, 2012, 03:38 pm by fat16lib Reason: 1
westfw,

I like your idea of a simple way for users to avoid templates and will include it.

The only way to avoid having the static arrays appear in the code is to use constant indices.  You can use the GCC compile time check.
Code: [Select]
__builtin_constant_p(index)
I didn't do this.  I just went back to using a struct and a const array with the addresses.
Code: [Select]
struct UsartRegister {
 volatile uint8_t* ucsra;  /**< USART Control and Status Register A */
 volatile uint8_t* ucsrb;  /**< USART Control and Status Register B */
 volatile uint8_t* ucsrc;  /**< USART Control and Status Register C */
 volatile uint8_t* ubrrl;  /**< USART Baud Rate Register Low */
 volatile uint8_t* ubrrh;  /**< USART Baud Rate Register High */
 volatile uint8_t* udr;    /**< USART I/O Data Register */
};
static const UsartRegister usart[] = {// initialization follows

If the array appears in the image, they are in RAM so it is not too subtle.

There are lots of pitfalls with templates but they are the heart of the C++ standard libraries.  Standard C++ IOStreams are implemented entirely in templates.

Edit: The three books below are hard going.  You should master this first:
Programming: Principles and Practice Using C++ by Bjarne Stroustrup Paperback $47.59.

For use of templates and C++ design in general I recommend three books but you need to wait for a new edition of one of these.

First for general C++ design:

Modern C++ Design: Generic Programming and Design Patterns Applied by Andrei Alexandrescu Paperback $41.36.  This book was published in 2001 so I expect a newer version. Templates appear in chapter 2.

For advice on how to use C++ features:

Effective C++: 55 Specific Ways to Improve Your Programs and Designs (3rd Edition) by Scott Meyers Paperback $37.30.  This edition was published in 2005.

Finally, the book that that will be published soon:
C++ Templates [Hardcover] David Vandevoorde, Nicolai Josuttis http://www.amazon.com/C-Templates-David-Vandevoorde/dp/0321714121/ref=dp_ob_title_bk.

There is an old edition:
C++ Templates: The Complete Guide by David Vandevoorde, Nicolai Josuttis Hardcover $52.73.  It was published in 2002.





fat16lib

I have posted a version with the option for 8-bit or 16-bit buffer indices as SerialPortBeta20120106.zip here http://code.google.com/p/beta-lib/downloads/list.

The default is to use 16-bit indices.  To use 8-bit indices, edit SerialPort.h and change
Code: [Select]
#define ALLOW_LARGE_BUFFERS 1
to
Code: [Select]
#define ALLOW_LARGE_BUFFERS 0

I also enable faster overrides for write(const char*) and write(const uint8_t*, size_t).  With these options SerialPort is still smaller than HardwareSerial for most applications.

My test example had the following Sketch Sizes.

Arduino 1.0: 6286 bytes

SerialPort as released: 6062 bytes

SerialPort with 8-bit indices and disabled faster overrides: 5776 bytes

I have also included a suggestion by westfw to hide templates.

Here is the new HelloWorld example with westfw's suggestion:
Code: [Select]

// Simple usage with buffering like Arduino 1.0
#include <SerialPort.h>

// use NewSerial for port 0
USE_NEW_SERIAL;

void setup() {
  NewSerial.begin(9600);
  NewSerial.println("Hello World!");
}
void loop() {}


westfw

It looks pretty good; it's been a while since I've looked at generated assembly and haven't groaned :-)

It looks as though there is some code boating when multiple serial ports are used on a MEGA; presumably the templates generate duplicate code with constants rather than more generic (and slower?) functions.  That's probably OK given that devices with extra serial ports have extra codespace as well, and it's not TOO awful.  A trivial example with all four ports is 4148 bytes in a (slightly fixed) HardwareSerial, and 4598 bytes using the NewSerial templates.

sixeyes


It looks pretty good; it's been a while since I've looked at generated assembly and haven't groaned :-)

It looks as though there is some code boating when multiple serial ports are used on a MEGA; presumably the templates generate duplicate code with constants rather than more generic (and slower?) functions.  That's probably OK given that devices with extra serial ports have extra codespace as well, and it's not TOO awful.  A trivial example with all four ports is 4148 bytes in a (slightly fixed) HardwareSerial, and 4598 bytes using the NewSerial templates.



But the bonus of using the new library is you don't automatically lose RAM to all four serial ports on the Mega. If you're only using two ports you get a lot of RAM back. AND it's a LOT faster than HardwareSerial.

Iain

westfw

This is a bit weird, though it seems a gcc thing:
Code: [Select]
  buf_size_t t = tail_;
2da: 14 96        adiw X, 0x04 ; 4
2dc: 2d 91        ld r18, X+
2de: 3c 91        ld r19, X
2e0: 15 97        sbiw X, 0x05 ; 5
   if (head_ == t) return false;
2e2: 12 96        adiw X, 0x02 ; 2
2e4: 8d 91        ld r24, X+
2e6: 9c 91        ld r25, X
2e8: 13 97        sbiw X, 0x03 ;

this sort of construct is used all over SerialRingBuffer;  It looks like X is used as an index to the current object ("this"?), and then it does math to get the offsets of the member variables.  That would be fine, but if had used Y or Z for the pointer instead, it could have used the "ldd r, Y+4" syntax instead, cutting the instruction count in half.  (Also, it's not doing the obvious peephole optimization to notice that subtracting 5 from X and adding 2 is the same as subtracting 3...  grr.)

I wonder if there's a way to coerce it into better choices?



fat16lib

#72
Jan 07, 2012, 01:29 pm Last Edit: Jan 07, 2012, 03:32 pm by fat16lib Reason: 1
I know about the code bloat when you use all port on the Mega.  For the initial implementation I didn't worry about it since if you use one or two ports it is not a problem.

There is a lot of code that doesn't depend on template parameters and could be moved to another class or plain functions.  That will reduce template code bloat.

I noticed problems with code generation involving head_ and tail_.  They are declared volatile and that seems to make GCC give up on optimization.  I try to fetch them once and use a local variable like h or t.

Edit:
I tried removing volatile from head_ and tail_.  The examples still seem to work and code generation was slightly better.  I don't trust this though since head_ and tail_ are accessed by both the ISRs and non-interrupt code.

My first attempt at removing code bloat didn't work very well.  The case with one port grew in size while the four port case was a bit smaller.  I think I will stick with the current compromise.  I don't think use of four ports is very common and the Mega has more flash.

Funny, I woke last night bothered by the larger Mega examples and then read westfw's post today so I had to try it.

westfw

Quote
I try to fetch [the volatile head and tail pointers] once and use a local variable like h or t.

Yes, I noticed.  That usually works pretty well, as long as there are enough registers to keep the locals in registers (which there are; it just doesn't seem to do a good job of picking which registers in this case.  This is complicated by the fact that the AVR only has two full-featured index registers, despite the claims to "32 general purpose registers."

westfw

Apparently addressed (somewhat) in later releases: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46278

Go Up