Arduino 1.0 Serial - major RAM hog

I agree that it should be easier to override core functions like Serial.

No, you can't allocate 2,500 bytes on a 328. But you can make the input buffer twice as large in 0022 as you will be able to in 1.0.

Here is a user who is logging serial data to an SD at 3088 bytes per second on a 328 by setting the receive buffer size to 600. This won't be possible with 1.0.

http://arduino.cc/forum/index.php/topic,69263.0.html

Sometimes you want no output buffering and very small input buffers, just enough to debug a sketch that won't use Serial but uses a lot of RAM. Other times you want large buffers. You don't want to edit core files for each app.

In 1.0 you lose more control.

I am close to finishing a reimplementation of HardwareSerial with run-time buffer sizes. It will be interesting to see how much larger it will be.

In 1.0 you lose more control.

I dunno. In 1.0 you have exactly the same lack of control, but it's "different."

It will be interesting to see how much larger it will be.

Do the SD libraries already use malloc()? If so, the hypothetical datalogger should barely notice.
And it's not like anyone has been trying to keep the size of the serial code down, anyway. It's gotten bigger with pretty much each rev for quite a while now, and as you said elsewhere, the core team resists alternative implementations. (also wondering if the "print" class has gotten bigger than "printf" yet...)

Independent of SD libraries, I am interested in a replacement for HardwareSerial with the same API.

I have done enough code to know that I can produce a smaller library with all the functionality of HardwareSerial in 1.0 beta 4 and independently specify the size of every buffer at compile time. This is a transparent replacement.

I think I can make a version that is smaller than 1.0 beta 4 with run time sizes if the buffers are user defined static arrays.

I am working on a version that uses malloc/free and it looks like it will be 200-500 flash bytes larger than 1.0 beta 4 if you are not using malloc/free for other purposes in your sketch.

I have made this library start in unbuffered mode. It starts with 0022 style output. It has just the three characters of hardware input buffering, the two level receive data register and the receive shift register. Overflow happens when the start bit for the fourth character is detected.

In this mode it is very small and works with most simple sketches that do limited input.

You can call a buffer allocation function to allocate rx and tx buffers and use interrupts. Calling this function causes malloc/free to be loaded. It looks like the total flash size will then be 200-500 bytes larger than 1.0 beta 4.
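A minimal sketch of the run-time sizing idea, written as portable C++ so it can be tried on a PC. The names RingBuffer, ringAllocate, ringPut and ringGet are made up for illustration and are not taken from the library described above; the interrupt handling a real driver needs is left out.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical ring buffer whose storage is sized at run time.
struct RingBuffer {
  uint8_t* data = nullptr;
  size_t size = 0;  // 0 means "unbuffered mode"
  size_t head = 0;  // next slot to write
  size_t tail = 0;  // next slot to read
};

// Allocate (or re-allocate) the storage. Passing 0 returns to unbuffered mode.
// Returns false if malloc fails, leaving the buffer unbuffered.
bool ringAllocate(RingBuffer& rb, size_t bytes) {
  free(rb.data);
  rb.data = nullptr;
  rb.size = 0;
  rb.head = 0;
  rb.tail = 0;
  if (bytes == 0) return true;
  rb.data = static_cast<uint8_t*>(malloc(bytes));
  if (rb.data == nullptr) return false;
  rb.size = bytes;
  return true;
}

// Store one byte; returns false when unbuffered or full.
// One slot is kept empty, so the usable capacity is size - 1.
bool ringPut(RingBuffer& rb, uint8_t c) {
  if (rb.size == 0) return false;
  size_t next = (rb.head + 1) % rb.size;
  if (next == rb.tail) return false;
  rb.data[rb.head] = c;
  rb.head = next;
  return true;
}

// Fetch one byte; returns false when unbuffered or empty.
bool ringGet(RingBuffer& rb, uint8_t& c) {
  if (rb.size == 0 || rb.head == rb.tail) return false;
  c = rb.data[rb.tail];
  rb.tail = (rb.tail + 1) % rb.size;
  return true;
}
```

Because malloc is only referenced from ringAllocate, a sketch that never calls it would not pull the allocator into flash, which matches the behavior described above.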

My SdFat library does not use malloc/free.

I really don't like using the heap in embedded systems except at system start-up.

I worked with critical systems where programming standards forbid using the heap after system start-up.

The Joint Strike Fighter standard is typical http://www2.research.att.com/~bs/JSF-AV-rules.pdf.

Fragmentation of the heap can cause a stack overflow crash with lots of free memory in small chunks in the heap.

I really don't like using the heap in embedded systems except at system start-up.

Well, we certainly agree there. I wonder if you can use "weak symbols" for serial buffers. I'm gonna have to read up on those...
(a weak symbol is one that the linker will leave unresolved unless it's explicitly included, or something like that.)
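For reference, with GCC (which avr-gcc is based on) a weak definition is one the linker uses only if no other object file supplies a normal definition of the same symbol. A rough sketch of how that could let a sketch pick a buffer size without editing the core; serialRxBufferSize and rxBufferBytes are hypothetical names, and 64 is just a placeholder default:

```cpp
#include <stddef.h>

// Core side: weak default. If the sketch defines its own (non-weak)
// serialRxBufferSize() with the same signature, the linker uses the
// sketch's version and silently discards this one.
__attribute__((weak)) size_t serialRxBufferSize() { return 64; }

// Core code would ask for the size once, e.g. when it sets up its buffer.
size_t rxBufferBytes() { return serialRxBufferSize(); }
```

Since the value is only known at link time, the core could not use it as a static array dimension; it would have to size the buffer at start-up instead.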

fat16lib:
I really don't like using the heap in embedded systems except at system start-up.

That's common sense, really.

For the Arduino serial ports it would have to be done on "first use" though. Doing it at system startup is the same as allocating them statically.

Maybe we need a "malloc()" with no "free()" or "realloc()" to spoil things...
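Something along these lines, as a rough sketch: an allocate-only pool that hands out successive slices and never takes them back, so it cannot fragment. The names pool, poolUsed and allocOnce and the 256-byte pool size are arbitrary choices for the example.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical fixed pool; the size is arbitrary for this sketch.
static uint8_t pool[256];
static size_t poolUsed = 0;

// Allocate-only: returns the next unused slice of the pool, or a null
// pointer if the request does not fit. There is deliberately no free().
// No alignment padding is done, which is fine for byte buffers.
void* allocOnce(size_t bytes) {
  if (bytes > sizeof(pool) - poolUsed) return nullptr;
  void* p = &pool[poolUsed];
  poolUsed += bytes;
  return p;
}
```

A serial port could call allocOnce() on first use; ports that are never opened would leave the pool space available to others.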

fat16lib:
The Joint Strike Fighter standard is typical http://www2.research.att.com/~bs/JSF-AV-rules.pdf.

Some light bedtime reading...

No, I glanced through it and the bits I saw were mostly sensible.

This is a much larger issue than just ram.
The new HardwareSerial has not only increased ram usage, but increased code usage as well
as changed xmit behavior/timing by forcing all transmits to use an interrupt driven buffer.
IMHO, that is too much to be changing this late in the game.
i.e this has the potential to break existing working code that has worked for years
with lots of subtle hard to find errors from RAM overflows and will cause some projects to no longer fit
inside the AVR, especially on older mega8 and mega168 parts.

Why not provide a backward compatible mechanism that can not only provide
the existing xmit behavior (non buffered xmits) but also provide a way to eliminate TX or RX buffers
when not needed as well as provide the new xmit buffering functionality?

One way would be to overload the begin() with a new begin() that allows the user to determine
which directions are enabled and allow the existing begin() to enable the existing
behavior (no xmit buffering). This provides the new functionality without breaking the old behavior
or increasing RAM usage and provides a method to not only eliminate the xmit ram
when not needed but also adds the capability to eliminate the RX ram when it is not needed.

For example, there are some serial devices that only need to receive data. In that case
the RX buffer in the arduino is simply not needed and is wasting AVR resources.
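A rough illustration of the overloaded begin() idea. Everything here (SketchSerial, the DIR_* flags, the accessor names) is invented for the example and is not part of the Arduino core; the body only records the choice, where a real driver would enable the receiver/transmitter and set up buffers accordingly.

```cpp
#include <stdint.h>

// Hypothetical option flags for the new overload.
enum SerialDir : uint8_t {
  DIR_RX = 1,          // enable the receiver and its buffer
  DIR_TX = 2,          // enable the transmitter
  DIR_TX_BUFFERED = 4  // use an interrupt driven TX buffer
};

class SketchSerial {
 public:
  // Existing signature: keeps the old behavior (RX on, TX on, TX unbuffered).
  void begin(unsigned long baud) { begin(baud, DIR_RX | DIR_TX); }

  // New overload: the caller decides which directions and buffers exist.
  void begin(unsigned long baud, uint8_t dirs) {
    baud_ = baud;
    dirs_ = dirs;
  }

  bool rxEnabled() const { return (dirs_ & DIR_RX) != 0; }
  bool txEnabled() const { return (dirs_ & DIR_TX) != 0; }
  bool txBuffered() const { return (dirs_ & DIR_TX_BUFFERED) != 0; }
  unsigned long baud() const { return baud_; }

 private:
  unsigned long baud_ = 0;
  uint8_t dirs_ = 0;
};
```

A sketch driving a receive-only device could then call begin(9600, DIR_TX) and pay for no RX buffer at all, while existing sketches calling begin(9600) would see no change.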

An alternate method to detect the buffer needs would be to change nothing in the API
but then allocate the needed buffer(s) on the first read()/write() using malloc().
Yes there are some other small mods that need to be done in the code to handle things
when there are no buffers allocated. But this would allow a buffer to be allocated only
when it was really needed. So if there were no transmits, no xmit buffer would be allocated and
same for RX. It might also allow a larger RX buffer if there were no Xmit buffer needed.
This does not solve the transmit without TX buffer issue (non buffered xmits) and
there might be some lost RX data initially as the RX receiver would not be enabled until
there was an actual read() request.

As far as the code space use by malloc(), nearly half of that can be recovered
by merely changing the head and tail indexes in the ring_buffer structure to be uint8_t rather than ints.
Not only does this save nearly 200 bytes of code, but the code
"as is" doesn't really work with ints anyway,
since there are cases where compares and polling are done on them while interrupts are not blocked
(on an 8-bit AVR a 16-bit index is read one byte at a time, so an ISR can change it between the two reads).
It only works today, because the buffers are smaller than 256 bytes and the upper 8 bits of the int are never used.
In order to really support larger than 256 byte buffers, there needs to be some atomic access to the head/tail
indexes in a few places.
If there is a desire to support larger than 256 bytes buffers, you can still do that with some #if statements
that change the head/tail pointers back to ints and turn on the needed atomic blocking where needed
if the buffer size is larger than 256.
Otherwise, the head/tail indexes might as well be reduced 8 bit values to save the code space - and a few bytes of RAM.
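A sketch of what that could look like. It is loosely modeled on the ring_buffer/store_char naming in the core, but the RX_BUFFER_SIZE default and the ring_index_t typedef are assumptions made for this example, and the atomic blocking needed for the 16-bit case is only noted in a comment, not implemented.

```cpp
#include <stdint.h>

#ifndef RX_BUFFER_SIZE
#define RX_BUFFER_SIZE 64  // illustrative default, overridable at compile time
#endif

// One-byte indexes are read and written in a single instruction on an
// 8-bit AVR, so sharing them with an ISR needs no interrupt blocking.
#if RX_BUFFER_SIZE <= 256
typedef uint8_t ring_index_t;
#else
typedef uint16_t ring_index_t;  // reads outside the ISR would need atomic access
#endif

struct ring_buffer {
  unsigned char buffer[RX_BUFFER_SIZE];
  volatile ring_index_t head;
  volatile ring_index_t tail;
};

// Store one character; returns false (dropping it) when the buffer is full.
inline bool store_char(unsigned char c, ring_buffer* rb) {
  ring_index_t next = static_cast<ring_index_t>((rb->head + 1u) % RX_BUFFER_SIZE);
  if (next == rb->tail) return false;
  rb->buffer[rb->head] = c;
  rb->head = next;
  return true;
}
```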

There is also the blocking issue in write()
When the buffer does fill, the write() blocks.
It seems to me that if users don't want blocking, they may not ever want blocking,
and there is no way to avoid this potential blocking.
To me, it seems that if write() is meant to be non-blocking and returns how many characters it wrote,
then it should never block and should return 0 if it can't accept the character.
But that has the potential to create even more subtle bugs as existing code does
not check the return code and users are not accustomed to having to do this or risk
lost xmit data.

These are some of the issues as to why I say that changing the hardwareSerial behavior
to use this new xmit buffering by default is too much too late,
especially given that the full new behavior is forced on everyone with no
way to tweak things to the sketches particular needs or even return to the old behavior and RAM usage
without going in and modifying the HardwareSerial source code.

Worst case, there should be a simple define that users could turn on in their HardwareSerial code/header file if they
really need to get back to the non buffered behavior (to restore timing, reduce RAM and reduce code size).

--- bill

bperrybap:
There is also the blocking issue in write()
When the buffer does fill, the write() blocks.

So a program can lock up just because I don't open the serial monitor to see the output?

Blocking should really be optional

Buffering of writes is mostly pointless anyway - if a program is sending data faster than the serial port can transmit then very few programs will benefit from buffering, only those which do some massive calculation then send small blocks of data which exactly fit in the output buffer. Do such programs even exist on Arduino?

It's not worth wrecking most programs just for the benefit of the few. A function to check if the serial output is free/busy would let those programs run at full speed (by doing their own polled buffering) without hurting anybody else.

This doesn't sound like it's well thought out, especially if lots of programs are going to break because of all the extra RAM usage, code space usage and interrupts.

Hardware serial already blocks on the UDRE bit, if the new version blocks on the buffer being full I don't see any practical difference, except it will probably happen less often (depending on the frequency that bytes are sent).

I agree however that forcing fixed buffer sizes for both directions is dumb. As you say just overload .begin() with a version that takes two extra parms for the buffer sizes. No parms you get the old routine, with parms you get buffers and interrupt-driven funcs.

That doesn't break any existing code and those that want the new feature can have it.


Rob

So a program can lock up just because I don't open the serial monitor to see the output?

Blocking should really be optional

No, there is no handshaking between the PC/FTDI/8u2 and the AVR, it just sends data out regardless if there is anything out there ready to read it. Blocking only lasts as long as it takes the serial characters to leave the hardware USART on the chip, so it's baud rate dependent as well as how many characters need to be sent.
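To put a number on that, assuming the usual 8N1 framing (1 start bit + 8 data bits + 1 stop bit = 10 bit times per character), one character takes about 1.04 ms at 9600 baud. A small helper does the arithmetic; the name txTimeMicros is made up for this example.

```cpp
// Microseconds needed to shift out nChars characters at the given baud rate,
// assuming 8N1 framing (10 bit times per character). The result is truncated
// to whole microseconds.
constexpr unsigned long long txTimeMicros(unsigned long baud, unsigned long nChars) {
  return nChars * 10000000ULL / baud;
}
```

For example, txTimeMicros(9600, 1) gives 1041 and txTimeMicros(115200, 64) gives 5555, so draining a full 64-byte buffer at 115200 baud takes roughly 5.6 ms.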

Lefty

retrolefty:
No, there is no handshaking between the PC/FTDI/8u2 and the AVR, it just sends data out regardless if there is anything out there ready to read it.

Yes...obvious really - there's no handshaking wires in the serial connector. :blush:

retrolefty:

So a program can lock up just because I don't open the serial monitor to see the output?

Blocking should really be optional

No, there is no handshaking between the PC/FTDI/8u2 and the AVR, it just sends data out regardless if there is anything out there ready to read it. Blocking only lasts as long as it takes the serial characters to leave the hardware USART on the chip, so it's baud rate dependent as well as how many characters need to be sent.

Lefty

As far as handshaking goes that is assuming a simple UART implementation.
Some "Arduino" boards provide serial communications over native USB
(teensy for example) which does have flow control handshaking.
And while those implementations are not using the Arduino core code,
in order to have a consistent API, the serial API should clearly define its behaviors.
Currently, it doesn't. The serial.write() function currently (pre 1.0) does not define if it blocks or not.

For those programs that do not want blocking to ever occur, blocking for
"as long as it takes the serial characters to leave the hardware USART on the chip" may be too long.
Which can be quite a long time since things like print() can block on each output character once
the output buffer fills.

I can easily envision applications that would need the API to never block even for a
single character (say you want to output characters from an ISR), or simply can't tolerate any
blocking because of other critical timing.
And then there are other cases where you want no buffering due to RAM needs or other synchronization needs
and others where you want buffering but it is ok to block like it does today in 1.0

Applications can have needs and uses for buffered output as well as non buffered output
but the newest HardwareSerial code is forcing a change from a non buffered output implementation that
was smaller and used no RAM that had been in place for many years
to one that is larger and uses quite a bit of RAM with no ability to alter the buffering
or behaviors.

The current 1.0 implementation is now a hybrid. It sometimes will block and sometimes not block
depending on character output patterns and baud rates,
but yet you always pay for it in terms of code size and RAM usage because there is no way
to configure it without editing the actual HardwareSerial source code.

serial 1.0 defines a return value of how many characters were written.
Once the API defines a return value of the number of characters written, users must now be prepared
to deal with that being zero - while today that is not the case, the new 1.0 serial API does allow that.
So for example, if the serial write() function were to return zero when the buffer filled instead of blocking,
things like the Print class would break (or at least not work as expected) because while
they do use the return value from write(), they do not use it to advance the
character pointer, i.e. any characters passed to write() when write() returns zero are dropped on the floor.
There are pros/cons to having a non blocking Print class but this is currently not configurable.
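To illustrate what using the return value to advance the pointer would mean, here is a sketch of a caller that copes with a non-blocking write(). FakePort is a stand-in invented for the example (it is not HardwareSerial), and writeSome is a hypothetical helper, not an existing Print function.

```cpp
#include <stddef.h>
#include <stdint.h>

// Stand-in for a non-blocking byte sink: it accepts a byte only while it
// has room and otherwise returns 0 instead of waiting.
struct FakePort {
  size_t room;      // bytes it will still accept before reporting "full"
  size_t accepted;  // bytes taken so far
  size_t write(uint8_t) {
    if (room == 0) return 0;
    --room;
    ++accepted;
    return 1;
  }
};

// Sends as much of buf as the port accepts and returns the count sent.
// The caller resumes later from buf + count, so nothing is silently dropped.
template <typename Port>
size_t writeSome(Port& port, const uint8_t* buf, size_t len) {
  size_t sent = 0;
  while (sent < len && port.write(buf[sent]) == 1) ++sent;
  return sent;
}
```

The cost is that every caller has to keep track of where it stopped and come back later, which is exactly the burden existing sketches are not written to carry.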

Seems like this implementation needs more tweaking before a final release because like I said
before, these API and under the hood changes are changing several things all at once
and offer no way to tune the behaviors or additional resources being used nor
do they offer a way back to the previous behavior and minimal resources used.

I agree with fungus, in that I question the value of forcing buffered output on everyone.
I think that the new buffering functionality should have to be requested rather than forced so that by default
those less sophisticated users (which is what arduino is all about) can continue to have the
simpler, non overlapping, more code and ram efficient interface they've had
for years and those that want/need the additional capabilities can turn them on
and tune them to their needs. (rx & tx buffer sizes, and blocking/non-blocking on full tx buffers)

--- bill

The serial.write() function currently (pre 1.0) does not define if it blocks or not.

Arduino is (currently) a single-threaded microcontroller with no operating system. "blocking" is not even defined in those circumstances!

westfw:

The serial.write() function currently (pre 1.0) does not define if it blocks or not.

Arduino is (currently) a single-threaded microcontroller with no operating system. "blocking" is not even defined in those circumstances!

Huh? I think you have misunderstood the discussion.
We were in no way referring to any sort of OS scheduling. What we are talking about is
whether the serial output function spin waits ("blocks" all foreground execution flow) until there is room in the buffer
or whether it returns immediately with a status indicating that 0 bytes were output when the buffer is full.

The pre 1.0 HardwareSerial did "block" on output waiting to put the character in the UART before it returned.
The current HardwareSerial 1.0 code has a buffer and does not "block" when there is room in it to hold the character
but does block when there is no room. The question is what should it do when the buffer is full?
The API does not describe the write() function's behavior with respect to any sort of waiting on output.
And if it now has the functionality of not waiting, should it ever block/wait or should it immediately return
with a status of zero characters output?
There is even a comment in the current 1.0 serial code questioning whether
the output routine should wait for buffer room or return 0 immediately.
(currently it spin waits forever on room - using 16 bit indexes without proper atomic accesses - but that is another issue)

My comment quoted above was stating that the current serial API documentation does not describe the existing
behavior and the new behavior is different from the old behavior which has the potential to break things
due to it changing behavior/timing and its use of additional resources.
Also, the current 1.0 implementation does not guarantee a non blocking output.
i.e. sometimes it blocks and sometimes it returns immediately.

And so the discussion was that different applications want/need different things
and the new HardwareSerial 1.0 code forces its new behaviors and higher resource usage on everyone.
There is also some pondering about how many applications really need the new capabilities or could even
benefit from them vs how many will be broken by the new behaviors and reduced free resources
since HardwareSerial 1.0 has no ability to be configured.

--- bill

bperrybap:
The pre 1.0 HardwareSerial did "block" on output waiting to put the character in the UART before it returned.

This is a good thing, no data could be sent otherwise... :slight_smile:

bperrybap:
The current HardwareSerial 1.0 code has a buffer and does not "block" when there is room in it to hold the character but does block when there is no room. The question is what should it do when the buffer is full?

It should block.

ie. Just like the pre 1.0 code, there's no advantage to output buffering.

What's really needed is a function "Serial.wouldBlock()" instead of all this bloat.
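A sketch of what such a check could look like for unbuffered output. On an ATmega328 the real registers UCSR0A and UDR0 and the UDRE0 bit (bit 5 of UCSR0A) come from <avr/io.h>; here they are replaced by mock variables so the example can be compiled on a PC. PolledSerial, wouldBlock() and tryWrite() are hypothetical names, not existing Arduino functions.

```cpp
#include <stdint.h>

// Mock registers standing in for UCSR0A and UDR0 on an ATmega328.
static volatile uint8_t UCSR0A_mock = 0;
static volatile uint8_t UDR0_mock = 0;
static const uint8_t UDRE0_bit = 5;  // "data register empty" flag position

struct PolledSerial {
  // True if a write right now would have to wait for the UART.
  bool wouldBlock() const {
    return (UCSR0A_mock & (1u << UDRE0_bit)) == 0;
  }

  // Writes one byte only if the UART can take it immediately.
  bool tryWrite(uint8_t c) {
    if (wouldBlock()) return false;
    UDR0_mock = c;
    // Real hardware clears UDRE0 by itself when UDR0 is written;
    // the mock has to do it by hand.
    UCSR0A_mock = static_cast<uint8_t>(UCSR0A_mock & ~(1u << UDRE0_bit));
    return true;
  }
};
```

A sketch that cares about timing could poll wouldBlock() from loop() and feed bytes from its own buffer, while everyone else keeps calling the plain blocking write().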

fungus,
we are in agreement.

It is just that in the absence of any full description of the
behavior of the write() function it isn't clear what the applications using it can expect.
Yes you can look at the current code and see that it will block until there is room
(which is probably how you want it to work), however, the API now defines
the function to return the number of characters written rather than being a void.
Since it returns the number of characters, theoretically it could be less than
what you sent, including zero.
And that is where the complication comes in.
I don't believe that it is appropriate to make assumptions about an API's behavior
that extend beyond its guaranteed behavior in its documentation.
In this case it is silent on how blocking works, so from only looking at the API,
one might assume that it might possibly return immediately with a count of zero.
Otherwise, if it always sent all the characters, what is the point of returning the number
of characters written?

But overall I agree that buffered output is not worth the added resources
in the vast majority of the cases.

How to handle it or configure it, is not quite so clear to me other than through
added parameters in begin() or add a new status function like you mentioned
above.

--- bill

the API now defines the function to return the number of characters written

That's very strange, if it's anything except 0 or all the bytes what are you supposed to do, carve the string up and try again?


Rob

Graynomad:

the API now defines the function to return the number of characters written

That's very strange, if it's anything except 0 or all the bytes what are you supposed to do, carve the string up and try again?

Yep. And to make it more interesting, when using the Print class, you don't know which characters were dropped.
It could be some from the middle.

Now while the API would allow that behavior,
currently, that won't ever happen because the HardwareSerial code blocks spin waiting on room in
the output buffer so the count from HardwareSerial write() is always 1 so the return value from
the Print class functions will always be the number of characters output with no dropped characters.

=============================================

Now in terms of how to actually let the user modify the define values in a library.
Consider this idea, you obviously can't alter the defines used in the core library
by redefining them in your sketch because the library code is compiled separately.
But you could play games with header files and include paths.
You could have hardware serial (or any other library) #include a "dummy" header file.
This header file would be empty and be in the same "cores" directory with the HardwareSerial code.
Then to override parameters, the user would create a header file by the same name in his sketch area.
He could include it but it wouldn't really matter.
The key is that the IDE would need to put the user's sketch area first in the include path so that the
HardwareSerial code would include the user's header file instead of the dummy one in the "cores" area.
So, for example,
let's say you had a dummy file called core_overrides.h.
All the core code would include it, but the one in the "cores" area would ship as blank.
Users that wanted to modify their library permanently could put their defines in that file,
but the real value is if users would create a core_overrides.h in their sketch area.
Then when each library module was built it would include the sketch's core_overrides.h
vs the one down in the "cores" area.
As long as the core modules did things like:
#ifndef XXX
#define XXX VAL
#endif
Then the core_overrides.h could override any parameter.
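Spelled out in a single file for illustration (in practice the first part would live in the sketch's core_overrides.h and the rest in the core source). The macro names and default values below are placeholders chosen for the example, not necessarily what the core uses.

```cpp
// --- contents of the sketch's core_overrides.h (hypothetical) ---
#define RX_BUFFER_SIZE 16

// --- what the core source would do after #include "core_overrides.h" ---
#ifndef RX_BUFFER_SIZE
#define RX_BUFFER_SIZE 128  // core default, used only when not overridden
#endif
#ifndef TX_BUFFER_SIZE
#define TX_BUFFER_SIZE 0    // not overridden above, so this default applies
#endif

static unsigned char rx_buffer[RX_BUFFER_SIZE];

// The sketch's value wins over the core default.
static_assert(sizeof(rx_buffer) == 16, "override from core_overrides.h not applied");
static_assert(TX_BUFFER_SIZE == 0, "default should apply when not overridden");
```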

Not particularly pretty but it would work and allow users the ability to tune
the core code on a sketch by sketch basis.

--- bill

Not particularly pretty...

I disagree. In my opinion, that would be a simple effective way to optionally alter the core at compile-time.

the API now defines the function to return the number of characters written ... That's very strange, if it's anything except 0 or all the bytes what are you supposed to do, carve the string up and try again?

HardwareSerial inherits from Stream, which inherits from Print. The return value was added to Print.write so that the network interfaces had a way to indicate a failure. The change has nothing to do with HardwareSerial; it's just along for the ride.

I don't think you even need that. You could have a few different "Serial" classes in your library. The default one would use the least resources and some other ones could have extra buffering, called (eg) "SerialWithBufferedSendAndReceive". Now in your sketch you can do:

#define Serial SerialWithBufferedSendAndReceive

and you'll be using the other one.

nb. This assumes you want any send buffering at all. I really don't see send buffering as an advantage on a limited microcontroller like the Arduino.

If it was a big PC then I'd be like, "yeah, whatever, dude", but it isn't. It's a gadget with 2k of RAM and 32k of program. The time spent programming/discussing this could be better spent on other things.