An alternative Serial Library for Arduino 1.0

Thanks great. Working fine here.

Just left to sort out my own warning message now :slight_smile:

Iain

Most of the "savings" of using an 8bit index is in the preliminary math BEFORE the copy (after all, the copy itself used to only be one byte.)

00000036 <memx_o>:
uint8_t buffer[BUFSIZE];
uint16_t bufd16;
uint8_t bufd8;

void memx_o(uint8_t *src, uint16_t n)
{
  36:	dc 01       	movw	r26, r24
    uint8_t *limit = src + n;
  38:	68 0f       	add	r22, r24
  3a:	79 1f       	adc	r23, r25
    uint8_t *dst = &buffer[++bufd16 % BUFSIZE];
  3c:	80 91 00 00 	lds	r24, 0x0000
  40:	90 91 00 00 	lds	r25, 0x0000
  44:	01 96       	adiw	r24, 0x01	; 1
  46:	90 93 00 00 	sts	0x0000, r25
  4a:	80 93 00 00 	sts	0x0000, r24
  4e:	fc 01       	movw	r30, r24
  50:	ef 73       	andi	r30, 0x3F	; 63
  52:	f0 70       	andi	r31, 0x00	; 0
  54:	e0 50       	subi	r30, 0x00	; 0
  56:	f0 40       	sbci	r31, 0x00	; 0
  58:	00 c0       	rjmp	.+0      	; 0x5a <memx_o+0x24>
    while (src < limit) {
	*dst++ = *src++;
  5a:	8d 91       	ld	r24, X+
  5c:	81 93       	st	Z+, r24
  5e:	a6 17       	cp	r26, r22
  60:	b7 07       	cpc	r27, r23
  62:	00 f0       	brcs	.+0      	; 0x64 <memx_o+0x2e>
	*dst++ = *src++;
    }
}
  64:	08 95       	ret

vs

00000066 <memx>:

uint8_t bufs;
void memx(uint8_t *src, uint8_t n)
{
  66:	dc 01       	movw	r26, r24
    uint8_t i = ++bufd8 % BUFSIZE;
  68:	80 91 00 00 	lds	r24, 0x0000
  6c:	8f 5f       	subi	r24, 0xFF	; 255
  6e:	80 93 00 00 	sts	0x0000, r24
    uint8_t *dst = &buffer[i];
  72:	8f 73       	andi	r24, 0x3F	; 63
  74:	e8 2f       	mov	r30, r24
  76:	f0 e0       	ldi	r31, 0x00	; 0
  78:	e0 50       	subi	r30, 0x00	; 0
  7a:	f0 40       	sbci	r31, 0x00	; 0
    do {
	*dst++ = *src++;
  7c:	8d 91       	ld	r24, X+
  7e:	81 93       	st	Z+, r24
    } while (--n);
  80:	61 50       	subi	r22, 0x01	; 1
  82:	01 f4       	brne	.+0      	; 0x84 <memx+0x1e>
}
  84:	08 95       	ret

westfw,

I rest my case. There is very little to be saved by limiting buffers to power-of-two sizes and a max size of 256 bytes, which is a capacity of 255 bytes for ring buffers.

I am not sure what your new examples have to do with ring buffers.

How do I use a function that copies n bytes into a location that starts at address buffer[(bufd8 + 1) % BUFFERSIZE] and increments bufd8 by one?

It also has the feature of shredding memory if n == 0. Copy functions should copy no bytes when called with n == 0.

The 16-bit version is strange also. Most of the "setup overhead" is fetching bufd16, incrementing it and storing it.

Try this:

#define BUFSIZE 20
uint8_t buffer[BUFSIZE] = "##";
uint8_t bufd8 = 0;  // not really needed

void memx(uint8_t *src, uint8_t n)
{
  uint8_t i = ++bufd8 % BUFSIZE;
  uint8_t *dst = &buffer[i];
  do {
    *dst++ = *src++;
  } while (--n);
}

void setup() {
  char msg[] = "123456789";  // ten bytes to copy
  Serial.begin(9600);
  
  memx((uint8_t*)msg, 10);
  Serial.println((char*)buffer);
  
  memx((uint8_t*)msg, 10);  
  Serial.println((char*)buffer);
  
  // Wait a second before dying
  delay(1000);
  
  // shred memory
  memx((uint8_t*)msg, 0);

  Serial.println("You May not see this");
}
void loop() {}

Edit: Is BUFSIZE limited to powers-of-two?

I decided to offer 8-bit buffer indices as an option, the customer is always right.

I defined a typedef for buffer indices so only one line needs to be change to select 8-bit or 16-bit indices.

typedef uint8_t buf_size_t;

or

typedef uint16_t buf_size_t;

The result is the BufferedTest example takes 1838 bytes of flash with 16-bit indices and 1642 bytes with 8-bit indices.

The WriteFlash test takes about 70 us to copy its string with 16-bit indices and 64 us with 8-bit indices.

So 8-bit indices result in about 200 byte less flash and are 10% faster.

I will make buf_size_t the fourth argument of the template with default value uint8_t. If RX or TX buffer size is too large I will cause a compile time error with a buffer size too large message. The user can then specify uint16_t as the fourth argument of the tremplate.

I suppose that the most controversial issue at this point is whether the users should be exposed to template syntax.

How about:

#define USE_SERIAL SerialPort<0, 64, 64> NewSerial
#define USE_SERIAL1 SerialPort<1, 64, 64> NewSerial1
#define USE_SERIAL2 SerialPort<2, 64, 64> NewSerial2
#define USE_SERIAL3 SerialPort<3, 64, 64> NewSerial3

A hypothetical wholescale replacement of HardwareSerial.h could have:

#if !defined(__SerialPortAlreadyLoaded__)
#include <SerialPort.h>
USE_SERIAL;
#if defined(UBRR1H)
USE_SERIAL1;
#endif
#if defined(UBRR2H)
USE_SERIAL2;
#endif
#if defined(UBRR3H)
USE_SERIAL3;
#endif
#endif

and wind up approximately where the defaults are today.
if the user had already set up a different SerialPort structure, that would be used instead.

Is there something that guarantees that the static arrays used for the usartRegister class don't end up in your actual image? I noticed a while ago that uart registers had a fixed layout and common constants, and set them up as an overlaid structure, with similar savings in execution time. (and special cases need for ATmega8) This looks more general; I'm still looking for a downside...

Do you have a tutorial on templates that you like? When do they usually show up, in the course of learning C++?

westfw,

I like your idea of a simple way for users to avoid templates and will include it.

The only way to avoid having the static arrays appear in the code is to use constant indices. You can use the GCC compile time check.

__builtin_constant_p(index)

I didn't do this. I just went back to using a struct and a const array with the addresses.

struct UsartRegister {
  volatile uint8_t* ucsra;  /**< USART Control and Status Register A */
  volatile uint8_t* ucsrb;  /**< USART Control and Status Register B */
  volatile uint8_t* ucsrc;  /**< USART Control and Status Register C */
  volatile uint8_t* ubrrl;  /**< USART Baud Rate Register Low */
  volatile uint8_t* ubrrh;  /**< USART Baud Rate Register High */
  volatile uint8_t* udr;    /**< USART I/O Data Register */
};
static const UsartRegister usart[] = {// initialization follows

If the array appears in the image, they are in RAM so it is not too subtle.

There are lots of pitfalls with templates but they are the heart of the C++ standard libraries. Standard C++ IOStreams are implemented entirely in templates.

Edit: The three books below are hard going. You should master this first:
Programming: Principles and Practice Using C++ by Bjarne Stroustrup Paperback $47.59.

For use of templates and C++ design in general I recommend three books but you need to wait for a new edition of one of these.

First for general C++ design:

Modern C++ Design: Generic Programming and Design Patterns Applied by Andrei Alexandrescu Paperback $41.36. This book was published in 2001 so I expect a newer version. Templates appear in chapter 2.

For advice on how to use C++ features:

Effective C++: 55 Specific Ways to Improve Your Programs and Designs (3rd Edition) by Scott Meyers Paperback $37.30. This edition was published in 2005.

Finally, the book that that will be published soon:
C++ Templates [Hardcover] David Vandevoorde, Nicolai Josuttis http://www.amazon.com/C-Templates-David-Vandevoorde/dp/0321714121/ref=dp_ob_title_bk.

There is an old edition:
C++ Templates: The Complete Guide by David Vandevoorde, Nicolai Josuttis Hardcover $52.73. It was published in 2002.

I have posted a version with the option for 8-bit or 16-bit buffer indices as SerialPortBeta20120106.zip here Google Code Archive - Long-term storage for Google Code Project Hosting..

The default is to use 16-bit indices. To use 8-bit indices, edit SerialPort.h and change

#define ALLOW_LARGE_BUFFERS 1

to

#define ALLOW_LARGE_BUFFERS 0

I also enable faster overrides for write(const char*) and write(const uint8_t*, size_t). With these options SerialPort is still smaller than HardwareSerial for most applications.

My test example had the following Sketch Sizes.

Arduino 1.0: 6286 bytes

SerialPort as released: 6062 bytes

SerialPort with 8-bit indices and disabled faster overrides: 5776 bytes

I have also included a suggestion by westfw to hide templates.

Here is the new HelloWorld example with westfw's suggestion:

// Simple usage with buffering like Arduino 1.0
#include <SerialPort.h>

// use NewSerial for port 0
USE_NEW_SERIAL;

void setup() {
  NewSerial.begin(9600);
  NewSerial.println("Hello World!");
}
void loop() {}

It looks pretty good; it's been a while since I've looked at generated assembly and haven't groaned :slight_smile:

It looks as though there is some code boating when multiple serial ports are used on a MEGA; presumably the templates generate duplicate code with constants rather than more generic (and slower?) functions. That's probably OK given that devices with extra serial ports have extra codespace as well, and it's not TOO awful. A trivial example with all four ports is 4148 bytes in a (slightly fixed) HardwareSerial, and 4598 bytes using the NewSerial templates.

westfw:
It looks pretty good; it's been a while since I've looked at generated assembly and haven't groaned :slight_smile:

It looks as though there is some code boating when multiple serial ports are used on a MEGA; presumably the templates generate duplicate code with constants rather than more generic (and slower?) functions. That's probably OK given that devices with extra serial ports have extra codespace as well, and it's not TOO awful. A trivial example with all four ports is 4148 bytes in a (slightly fixed) HardwareSerial, and 4598 bytes using the NewSerial templates.

But the bonus of using the new library is you don't automatically lose RAM to all four serial ports on the Mega. If you're only using two ports you get a lot of RAM back. AND it's a LOT faster than HardwareSerial.

Iain

This is a bit weird, though it seems a gcc thing:

  buf_size_t t = tail_;
 2da:	14 96       	adiw	X, 0x04	; 4
 2dc:	2d 91       	ld	r18, X+
 2de:	3c 91       	ld	r19, X
 2e0:	15 97       	sbiw	X, 0x05	; 5
   if (head_ == t) return false;
 2e2:	12 96       	adiw	X, 0x02	; 2
 2e4:	8d 91       	ld	r24, X+
 2e6:	9c 91       	ld	r25, X
 2e8:	13 97       	sbiw	X, 0x03	;

this sort of construct is used all over SerialRingBuffer; It looks like X is used as an index to the current object ("this"?), and then it does math to get the offsets of the member variables. That would be fine, but if had used Y or Z for the pointer instead, it could have used the "ldd r, Y+4" syntax instead, cutting the instruction count in half. (Also, it's not doing the obvious peephole optimization to notice that subtracting 5 from X and adding 2 is the same as subtracting 3... grr.)

I wonder if there's a way to coerce it into better choices?

I know about the code bloat when you use all port on the Mega. For the initial implementation I didn't worry about it since if you use one or two ports it is not a problem.

There is a lot of code that doesn't depend on template parameters and could be moved to another class or plain functions. That will reduce template code bloat.

I noticed problems with code generation involving head_ and tail_. They are declared volatile and that seems to make GCC give up on optimization. I try to fetch them once and use a local variable like h or t.

Edit:
I tried removing volatile from head_ and tail_. The examples still seem to work and code generation was slightly better. I don't trust this though since head_ and tail_ are accessed by both the ISRs and non-interrupt code.

My first attempt at removing code bloat didn't work very well. The case with one port grew in size while the four port case was a bit smaller. I think I will stick with the current compromise. I don't think use of four ports is very common and the Mega has more flash.

Funny, I woke last night bothered by the larger Mega examples and then read westfw's post today so I had to try it.

I try to fetch [the volatile head and tail pointers] once and use a local variable like h or t.

Yes, I noticed. That usually works pretty well, as long as there are enough registers to keep the locals in registers (which there are; it just doesn't seem to do a good job of picking which registers in this case. This is complicated by the fact that the AVR only has two full-featured index registers, despite the claims to "32 general purpose registers."

Apparently addressed (somewhat) in later releases: 46278 – avr-gcc 4.5.1 doing suboptimal reloads using X

I have posted a fast serial data logger that can log sustained serial data to an SD card at 57,600 baud on a 328 Arduino and 115,200 baud on a Mega.

For details see this post http://arduino.cc/forum/index.php/topic,86300.0.html

Hi All...

I am trying to use this library in a large application, and when I link it I get the errors pasted below. I have removed all places where the HardwareSerial.h library exists, and removed all Serial1 and Serial calls. I can build the samples. I realize the cause is because the compiler must be finding the original Serial library somewhere, but how to find it? Any hints?

Thanks...

-LC:\Users\jim\AppData\Local\Temp\build2981441506981796786.tmp -lm
core.a(HardwareSerial.cpp.o): In function __vector_20': C:\Users\jim\Desktop\arduino-1.0\hardware\maniacbug-mighty-1284p-8f335f8\cores\standard/HardwareSerial.cpp:106: multiple definition of __vector_20'
SerialPort\SerialPort.cpp.o:C:\Users\jim\Desktop\arduino-1.0\libraries\SerialPort/SerialPort.cpp:198: first defined here
core.a(HardwareSerial.cpp.o): In function __vector_28': C:\Users\jim\Desktop\arduino-1.0\hardware\maniacbug-mighty-1284p-8f335f8\cores\standard/HardwareSerial.cpp:127: multiple definition of __vector_28'
SerialPort\SerialPort.cpp.o:C:\Users\jim\Desktop\arduino-1.0\libraries\SerialPort/SerialPort.cpp:207: first defined here
core.a(HardwareSerial.cpp.o): In function __vector_21': C:\Users\jim\Desktop\arduino-1.0\hardware\maniacbug-mighty-1284p-8f335f8\cores\standard/HardwareSerial.cpp:190: multiple definition of __vector_21'
SerialPort\SerialPort.cpp.o:C:\Users\jim\Desktop\arduino-1.0\libraries\SerialPort/SerialPort.cpp:243: first defined here
core.a(HardwareSerial.cpp.o): In function __vector_29': C:\Users\jim\Desktop\arduino-1.0\hardware\maniacbug-mighty-1284p-8f335f8\cores\standard/HardwareSerial.cpp:221: multiple definition of __vector_29'
SerialPort\SerialPort.cpp.o:C:\Users\jim\Desktop\arduino-1.0\libraries\SerialPort/SerialPort.cpp:252: first defined here

Interesting, I see it's not working out of the box with maniacbug-mighty-1284p :grin:

__vector_20 is USART0_RX_vect. The 1284P defines the USART0 vectors, HardwareSerial picks them up. Shouldn't this also have the same problem on a Mega?

Looks like this library is demanding access to the USART0_RX_vect?

maniacbug:
Interesting, I see it's not working out of the box with maniacbug-mighty-1284p :grin:

__vector_20 is USART0_RX_vect. The 1284P defines the USART0 vectors, HardwareSerial picks them up. Shouldn't this also have the same problem on a Mega?

Looks like this library is demanding access to the USART0_RX_vect?

Oh? I was able to build a few of the examples that came with this library using your cores. I had suspected it was a core issue, but after I was able to build the examples I assumed it must be my code.

If I remove the HardwareSerial.cpp file, then the compilation fails. Its getting dragged in somewhere.

Hmm, I mis-spoke. As you say, examples compile fine on mighty-1284p.

What's happening is that the LINKER wants to include these symbols from HardwareSerial. There's no avoiding the compiler picking them up.

It would be interesting to see the errors you get when removing HardwareSerial.cpp.

A number of libraries access Serial. This causes HardwareSerial to be loaded even if you don't call the functions that access Serial. These libraries can't be used with SerialPort.

All released SD.h and SdFat libraries cause HardwareSerial to be loaded and can't be used with SerialPort. I will soon release a version of SdFat that does not access HardwareSerial.

A beta version of this SdFat is in SerialLoggerBeta20120108.zip here Google Code Archive - Long-term storage for Google Code Project Hosting..

fat16lib:
A beta version of this SdFat is in SerialLoggerBeta20120108.zip here Google Code Archive - Long-term storage for Google Code Project Hosting..

Okay, when I use that beta I can link if I remove another library from my system. I was taking a look to see how I might be able to modify that library, and I looked into your headers and saw that in sdFatUtil.h:

#include <avr/pgmspace.h>
#if ARDUINO < 100
#include <WProgram.h>
#else  // ARDUINO
#include <Arduino.h>
#endif  // ARDUINO

WProgram.h brings in:

#include "HardwareSerial.h"

But my app includes that header.

How did you resolve this conflict?