Arduino 1.0 Serial - major RAM hog

Here are some results for RAM use by Serial in Arduino 1.0 beta 3.

First, output is now buffered but the buffer size has been reduced to 64 bytes. If you want the same size Serial receive buffer, 128 bytes, as in 0022, you will also get a 128 byte send buffer.

Second, Serial is always loaded, even if it is not used by a sketch. This happens because of this line in main.cpp:

    serialEventRun();

I used the MinSerial library from here http://arduino.cc/forum/index.php/topic,72087.0.html to do some tests. I commented out the above line in main.cpp to prevent loading of Serial when I used MinSerial.

Even if you don't use Serial, 172 bytes of RAM will be allocated on a 328 and 676 bytes of RAM will be allocated on a Mega.

If you increase the buffer size to 128 to match the 0022 receive buffer size, the result is as follows: 300 bytes of RAM on a 328 and 1188 bytes on a Mega.

This is probably obvious to you fat16lib, but it looks to me that the allocated RAM is consistent with the number of serial ports. So for the Mega it is not quite 172 x 4 = 688.

It would be nice if RAM allocation was not automatic, but it is easier for a novice to have it allocated by the program.

Even if you don't use Serial, 172 bytes of RAM will be allocated on a 328 and 676 bytes of RAM will be allocated on a Mega.

fat16lib:
Here are some results for RAM use by Serial in Arduino 1.0 beta 3.

First, output is now buffered but the buffer size has been reduced to 64 bytes. If you want the same size Serial receive buffer, 128 bytes, as in 0022, you will also get a 128 byte send buffer.

Second, Serial is always loaded, even if it is not used by a sketch. This happens because of this line in main.cpp:

    serialEventRun();

That's not good; you shouldn't pay for things you're not using, especially on a microcontroller with (very) limited resources.

fat16lib:
Even if you don't use Serial, 172 bytes of RAM will be allocated on a 328 and 676 bytes of RAM will be allocated on a Mega.

That definitely needs fixing IMHO...

serialEventRun();

There was a fix for this discussed on the Developer list. It looks like it was implemented in beta4...
(weak symbols used for serialEventRun(), so that if the serial library is not otherwise used, serialEventRun() stays
undefined (0) and the new code reads:

 if (serialEventRun) serialEventRun();

the sort of statement you probably never want to see in your own programs!)
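
A minimal sketch of that weak-symbol check, assuming a GCC-style ELF toolchain such as avr-gcc. runSerialEventIfLinked() is a made-up helper name for illustration, not something in the Arduino core:

```cpp
// Weak declaration: if no object file defines serialEventRun(), the linker
// resolves its address to 0 instead of reporting an undefined symbol.
void serialEventRun(void) __attribute__((weak));

// Hypothetical helper standing in for one pass of the loop in main.cpp.
// Returns true only if a definition was linked in and has been called.
bool runSerialEventIfLinked() {
    if (serialEventRun) {   // address is non-null only when a definition exists
        serialEventRun();
        return true;
    }
    return false;
}
```

In a program where nothing defines serialEventRun(), the helper returns false and the reference by itself does not pull any serial code into the link.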

Beta 4 fixes problem that Serial is loaded even if you don't use it.

It still has the problem that buffers for all Serial ports are allocated and you can't independently set their size.

This means that if you need a large receive buffer on one port of a Mega, you must accept seven other large buffers.

For example, to log serial data to an SD card reliably requires about 200 ms of Serial receive buffering so data won't be lost during the maximum SD write latency time.

At 115200 this requires about 2500 bytes of buffer. This is impossible on the Mega since eight equal size buffers are required for a total of 20,000 bytes.
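
The arithmetic behind those numbers, as a rough sketch. It assumes 8N1 framing (10 bits on the wire per data byte); the function and constant names are made up for illustration. The 2500 and 20,000 figures above are these results rounded up:

```cpp
#include <cstdint>

// Bytes that arrive while the logger is stalled for latencyMs.
// Assumes 8N1 framing, i.e. 10 bits on the wire per data byte.
constexpr uint32_t rxBytesDuringStall(uint32_t baud, uint32_t latencyMs) {
    return (baud / 10) * latencyMs / 1000;
}

constexpr uint32_t kPerPort  = rxBytesDuringStall(115200, 200);  // 2304 bytes
constexpr uint32_t kAllPorts = 8 * kPerPort;   // 4 RX + 4 TX buffers of equal size
constexpr uint32_t kMegaSram = 8192;           // SRAM on the ATmega1280/2560

static_assert(kAllPorts > kMegaSram,
              "eight equal-size buffers do not fit in the Mega's SRAM");
```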

But that's not new. Isn't there a non-blocking API for SD cards? I was (vaguely) under the impression that they had their own RAM buffers that you filled up, fired off a "program" command, and could go off and do other stuff...
2500 bytes of buffering is well off the bell curve anyway; you might as well write your own serial code and worry about how to prevent it from interfering with the core serial code...

There is no non-blocking API for SD cards. The SD spec allows an SD card to go busy for up to 250 ms after you send the data. You can't determine if the write was successful until after the busy period. You must wait if busy happens when you are writing a file structure because you have not written data from the caller's buffer.

Even if you could do something else you need the buffering for incoming serial data.

I know that all serial buffers are the same size for a Mega in 0022. 0022 was bad but 1.0 is worse: there are twice as many buffers.

The SD is only one example. Large serial buffers are useful for other apps. What is this bell curve that defines proper use of serial? When is a serial buffer too large?

There is no technical excuse for not having run-time buffer sizes for serial ports. Commercial embedded kernels have supported run-time serial buffer sizes for thirty years.

Forty years ago I wrote a serial handler for a PDP-8e with run-time buffer sizes. This machine had 8K words where a word was 12-bits.

@fat16lib

Keep in mind that these betas are designed to figure out these kinds of issues. The forum is not the right place to discuss technical implementations. Join the Arduino Developers mailing list and let's discuss it there where we can coordinate what needs to be done to fix it.

One general note is that we have a target for Maker Faire (17/18 September) where the API will be frozen and we send out a Release Candidate of Arduino 1.0, then we'll engage the community in a testing effort to verify if all the features are working correctly.
Some work will need to be done to adapt the libraries to 1.0.
About a month later we should have the official release.

m

fat16lib:
There is no technical excuse for not having run-time buffer sizes for serial ports. Commercial embedded kernels have supported run-time serial buffer sizes for thirty years.

When buffers are this large you might as well pay the price for including a copy of malloc() and do it properly.

I agree, you should be able to use malloc().

I have been working on a personal replacement for HardwareSerial. If you only call begin(), it uses no buffers and no interrupts. Even sketches with some input work since they tend to read in a tight loop and the serial hardware has a two level input buffer register and an input shift register.

I have two calls to connect interrupts with buffers, one call that uses malloc and sets a status bit to remember to call free and one call to use static arrays in the sketch.

  bool connectInterrupt(size_t rxSize, size_t txSize); // uses malloc
  bool connectInterrupt(uint8_t* rxBuf, size_t rxSize, uint8_t* txBuf, size_t txSize);

If a buffer is zero length, the corresponding RX or TX will not use interrupts.

malloc is only about 500 bytes of code on AVR, so I guess it wouldn't be awful. Of course, the MEGA is already a relatively uncommon arduino. On the 28pin AVR-based Arduino, the issue is cloudier. 500 bytes of code to be able to modify a 128byte RAM buffer is not a clear win, especially if you can't allocate that 250ms of buffer anyway because that much ram doesn't exist.

What I'd like to see is easier and better documented ways to override core functions in general...

westfw:
malloc is only about 500 bytes of code on AVR, so I guess it wouldn't be awful.

Yep. I just did a quick test on my Arduino Uno and sketch size went up by 534 bytes when I used malloc()/free().

To me that's far preferable to using up RAM for no good reason (especially since I know I can use malloc() too...the price has already been paid!)

realloc() used about as much again - not so good...but it's easy to avoid using that.

I agree that it should be easier to override core functions like Serial.

No you can't allocate 2,500 bytes on a 328. But you can make the input buffer twice as large in 0022 as you will be able to in 1.0.

Here is a user who is logging serial data to an SD at 3088 bytes per second on a 328 by setting the receive buffer size to 600. This won't be possible with 1.0.

http://arduino.cc/forum/index.php/topic,69263.0.html

Sometimes you want no output buffering and very small input buffers, just enough for debug of a sketch that won't use Serial but uses a lot of RAM. Other times you want large buffers. You don't want to edit core files for each app.

In 1.0 you lose more control.

I am close to finishing a reimplementation of HardwareSerial with run-time buffer sizes. It will be interesting to see how much larger it will be.

In 1.0 you lose more control.

I dunno. In 1.0 you have exactly the same lack of control, but it's "different."

It will be interesting to see how much larger it will be.

Do the SD libraries already use malloc()? If so, the hypothetical datalogger should barely notice.
And it's not like anyone has been trying to keep the size of the serial code down, anyway. It's gotten bigger with pretty much each rev for quite a while now, and as you said elsewhere, the core team resists alternative implementations. (also wondering if the "print" class has gotten bigger than "printf" yet...)

Independent of SD libraries, I am interested in a replacement for HardwareSerial with the same API.

I have done enough code to know that I can produce a smaller library with all the functionality of HardwareSerial in 1.0 beta 4 and independently specify the size of every buffer at compile time. This is a transparent replacement.

I think I can make a version that is smaller than 1.0 beta 4 with run time sizes if the buffers are user defined static arrays.

I am working on a version that uses malloc/free and it looks like it will be 200-500 flash bytes larger than 1.0 beta 4 if you are not using malloc/free for other purposes in your sketch.

I have made this library start in unbuffered mode. It starts with 0022 style output. It has just the three characters of hardware input buffering, the two level receive data register and the receive shift register. Overflow happens when the start bit for the fourth character is detected.

In this mode it is very small and works with most simple sketches that do limited input.

You can call a buffer allocation function to allocate rx and tx buffers and use interrupts. Calling this function causes malloc/free to be loaded. It looks like the total flash size will then be 200-500 bytes larger than 1.0 beta 4.

My SdFat library does not use malloc/free.

I really don't like using the heap in embedded systems except at system start-up.

I worked with critical systems where programming standards forbid using the heap after system start-up.

The Joint Strike Fighter standard is typical http://www2.research.att.com/~bs/JSF-AV-rules.pdf.

Fragmentation of the heap can cause a stack overflow crash with lots of free memory in small chunks in the heap.

I really don't like using the heap in embedded systems except at system start-up.

Well, we certainly agree there. I wonder if you can use "weak symbols" for serial buffers. I'm gonna have to read up on those...
(a weak symbol is one that the linker will leave unresolved unless it's explicitly included, or something like that.)

fat16lib:
I really don't like using the heap in embedded systems except at system start-up.

That's common sense, really.

For the Arduino serial ports it would have to be done on "first use" though. Doing it at system startup is the same as allocating them statically.

Maybe we need a "malloc()" with no "free()" or "realloc()" to spoil things...
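
Something like this allocate-only pool is what I have in mind. It is a sketch only: BumpPool is a made-up name, not part of avr-libc or the Arduino core, and it hands out plain byte buffers so alignment is not considered:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical allocate-only pool: memory is handed out once and never
// returned, so the pool cannot fragment.
class BumpPool {
public:
    BumpPool(uint8_t* storage, size_t size)
        : base_(storage), size_(size), used_(0) {}

    // Returns the next n bytes, or nullptr if the request does not fit.
    uint8_t* alloc(size_t n) {
        if (n == 0 || n > size_ - used_) return nullptr;
        uint8_t* p = base_ + used_;
        used_ += n;
        return p;
    }

    size_t remaining() const { return size_ - used_; }

private:
    uint8_t* base_;
    size_t size_;
    size_t used_;
};
```

A serial port could ask the pool for its buffers on first use and treat a nullptr result as "stay unbuffered".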

fat16lib:
The Joint Strike Fighter standard is typical http://www2.research.att.com/~bs/JSF-AV-rules.pdf.

Some light bedtime reading...

No, I glanced through it and the bits I saw were mostly sensible.

This is a much larger issue than just ram.
The new HardwareSerial has not only increased ram usage, but increased code usage as well
as changed xmit behavior/timing by forcing all transmits to use an interrupt driven buffer.
IMHO, that is too much to be changing this late in the game.
i.e this has the potential to break existing working code that has worked for years
with lots of subtle hard to find errors from RAM overflows and will cause some projects to no longer fit
inside the AVR, especially on older mega8 and mega168 parts.

Why not provide a backward compatible mechanism that can not only provide
the existing xmit behavior (non buffered xmits) but also provide a way to eliminate TX or RX buffers
when not needed as well as provide the new xmit buffering functionality?

One way would be to overload the begin() with a new begin() that allows the user to determine
which directions are enabled and allow the existing begin() to enable the existing
behavior (no xmit buffering). This provides the new functionality without breaking the old behavior
or increasing RAM usage and provides a method to not only eliminate the xmit ram
when not needed but also adds the capability to eliminate the RX ram when it is not needed.

For example, there are some serial devices that only need to receive data. In that case
the RX buffer in the arduino is simply not needed and is wasting AVR resources.

An alternate method to detect the buffer needs would be to change nothing in the API
but then allocate the needed buffer(s) on the first read()/write() using malloc().
Yes there are some other small mods that need to be done in the code to handle things
when there are no buffers allocated. But this would allow a buffer to be allocated only
when it was really needed. So if there were no transmits, no xmit buffer would be allocated, and the
same for RX. It might also allow a larger RX buffer if there were no Xmit buffer needed.
This does not solve the transmit without TX buffer issue (non buffered xmits) and
there might be some lost RX data initially as the RX receiver would not be enabled until
there was an actual read() request.

As far as the code space use by malloc(), nearly half of that can be recovered
by merely changing the head and tail indexes in the ring_buffer structure to be uint8_t rather than ints.
Not only does this save nearly 200 bytes of code, but the code
"as is" really doesn't work with ints anyway
since there are cases where compares and polling are done on them and interrupts are not blocked.
It only works today, because the buffers are smaller than 256 bytes and the upper 8 bits of the int are never used.
In order to really support larger than 256 byte buffers, there needs to be some atomic access to the head/tail
indexes in a few places.
If there is a desire to support larger than 256 bytes buffers, you can still do that with some #if statements
that change the head/tail pointers back to ints and turn on the needed atomic blocking where needed
if the buffer size is larger than 256.
Otherwise, the head/tail indexes might as well be reduced to 8 bit values to save the code space - and a few bytes of RAM.
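
A rough sketch of a ring buffer with 8 bit indexes, to show the idea. RingBuffer8 is a made-up name and this is not the ring_buffer code in HardwareSerial; it assumes one producer (the ISR or the sketch) and one consumer, and a power-of-two size of at most 256:

```cpp
#include <cstdint>

// Hypothetical ring buffer with 8 bit head/tail indexes. On an 8 bit AVR a
// single-byte index is read or written in one instruction, so no interrupt
// blocking is needed around the compares.
template <uint16_t SIZE>
struct RingBuffer8 {
    static_assert(SIZE >= 2 && SIZE <= 256 && (SIZE & (SIZE - 1)) == 0,
                  "SIZE must be a power of two between 2 and 256");

    uint8_t data[SIZE];
    volatile uint8_t head = 0;  // written only by the producer
    volatile uint8_t tail = 0;  // written only by the consumer

    // Stores one byte; returns false if the buffer is full (holds SIZE - 1).
    bool push(uint8_t c) {
        uint8_t next = static_cast<uint8_t>((head + 1) & (SIZE - 1));
        if (next == tail) return false;
        data[head] = c;
        head = next;
        return true;
    }

    // Fetches the oldest byte; returns false if the buffer is empty.
    bool pop(uint8_t& c) {
        if (head == tail) return false;
        c = data[tail];
        tail = static_cast<uint8_t>((tail + 1) & (SIZE - 1));
        return true;
    }

    // Number of bytes currently stored.
    uint8_t available() const {
        return static_cast<uint8_t>((head - tail) & (SIZE - 1));
    }
};
```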

There is also the blocking issue in write()
When the buffer does fill, the write() blocks.
It seems to me that if users don't want blocking, they may not ever want blocking
and there is no way to avoid this potential blocking.
To me, it seems that if write() is meant to be non blocking and returns how many characters it wrote,
then it should not block at all and should return 0 if it can't transmit the character.
But that has the potential to create even more subtle bugs as existing code does
not check the return code and users are not accustomed to having to do this or risk
lost xmit data.
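
To make the trade-off concrete, here is a sketch of a write() that never blocks and reports how many bytes it accepted. TxQueue and drainOne() are made-up names, not the HardwareSerial API; drainOne() stands in for the transmit interrupt taking one byte:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical transmit queue whose write() never blocks.
class TxQueue {
public:
    static const size_t kCapacity = 4;  // tiny on purpose, to show the full case

    // Returns 1 if the byte was queued, 0 if the queue is full.
    size_t write(uint8_t c) {
        if (count_ == kCapacity) return 0;
        buf_[(head_ + count_) % kCapacity] = c;
        ++count_;
        return 1;
    }

    // Queues as many bytes as fit and returns how many were accepted.
    size_t write(const uint8_t* data, size_t len) {
        size_t accepted = 0;
        while (accepted < len && write(data[accepted]) == 1) ++accepted;
        return accepted;
    }

    // Stand-in for the transmit interrupt removing the oldest byte.
    bool drainOne(uint8_t& c) {
        if (count_ == 0) return false;
        c = buf_[head_];
        head_ = (head_ + 1) % kCapacity;
        --count_;
        return true;
    }

private:
    uint8_t buf_[kCapacity];
    size_t head_ = 0;
    size_t count_ = 0;
};
```

The caller has to compare the return value with the length it asked for and retry the remainder later, which is exactly the check existing sketches do not make.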

These are some of the issues as to why I say that changing the hardwareSerial behavior
to use this new xmit buffering by default is too much too late,
especially given that the full new behavior is forced on everyone with no
way to tweak things to the sketch's particular needs or even return to the old behavior and RAM usage
without going in and modifying the HardwareSerial source code.

Worst case, there could be a simple define that users could turn on in their HardwareSerial code/header file if they
really need to get back to the non buffered behavior (to restore timing, reduce RAM and reduce code size).

--- bill

bperrybap:
There is also the blocking issue in write()
When the buffer does fill, the write() blocks.

So a program can lock up just because I don't open the serial monitor to see the output?

Blocking should really be optional

Buffering of writes is mostly pointless anyway - if a program is sending data faster than the serial port can transmit then very few programs will benefit from buffering, only those which do some massive calculation then send small blocks of data which exactly fit in the output buffer. Do such programs even exist on Arduino?

It's not worth wrecking most programs just for the benefit of the few. A function to check if the serial output is free/busy would let those programs run at full speed (by doing their own polled buffering) without hurting anybody else.

This doesn't sound like it's well thought out, especially if lots of programs are going to break because of all the extra RAM usage, code space usage and interrupts.

Hardware serial already blocks on the UDRE bit, if the new version blocks on the buffer being full I don't see any practical difference, except it will probably happen less often (depending on the frequency that bytes are sent).

I agree however that forcing fixed buffer sizes for both directions is dumb. As you say just overload .begin() with a version that takes two extra parms for the buffer sizes. No parms you get the old routine, with parms you get buffers and interrupt-driven funcs.

That doesn't break any existing code and those that want the new feature can have it.


Rob