Having memory issues with complex recursive sketch that uses regex

I'm trying to make a remote control for my camera with Arduino. It works by using the wired remote control jack and connecting the focus or the shooter pin to ground. I'm doing it with a few NPN transistors. The bases of these transistors are connected to digital pins 3 and 4 on an Arduino Mega2560 (well it's actually a funduino, it was cheaper :wink: ).
On pin 2 I connected the data pin of a simple IR sensor which I'm using with the IRremote library.
I also have a speaker connected to toneAC's hardcoded pins and a led connected to pin 5 that I light whenever I get a signal from the remote (I'm using a regular TV remote).
A picture can be found here: http://s11.postimg.org/ynpxjny9f/camera_remote.jpg (there are some notes I added in red)

This was just to give you a clue on how my project is set up. The circuit works fine, my issue is a software issue.
This is the sketch: https://gist.github.com/Davideddu/832421eca4fe12a49198

It works this way: I set a status, which can be idle, focus/shooting (for manual focus/shooting), record and play. Here's what happens in each state:

  • idle: The main loop runs over again waiting for a user input from the remote, after which some actions are taken, like activating one of the transistors, etc. One of the actions is starting a recording
  • Record: the loop still runs waiting for input, except no action is actually taken. User input is recorded to a string. When the user presses the play button, it switches to the play state
  • Play: the main loop is temporarily paused and control is given to a function, parseSubToDo. This function is supposed to parse the previously recorded input/command and take those actions.

The issue is in the last state. In order to parse the command I need to use regexp (I'm using this library). However the string parsing needs to chop the string, replicate it, etc, and as I have little experience with C (I come from Python), I don't know how to do that with C. I managed to get it working, everything works as expected, but the SRAM gets full very quickly. On the Arduino Mega it runs for a few cycles with no issues and then the function abruptly returns and the main loop continues, on the Arduino Uno it simply doesn't run (I put a Serial.println right under the function declaration and it was just skipped).

Can I have some specific tips on how to reduce the memory usage of this specific sketch? As you can see I already tried making all serial code optional and it did help (compiled size is half with DEBUG not defined), but the code doesn't run much farther.

Thank you very much for your help, I hope you have a nice day.

Don't use regex and recursion. A lot of string handling will fragment the limited RAM and cause you grief. Under no circumstances should you use recursion, which causes the stack to grow excessively.

Rewrite your parser, for a simple system like this, you should be able to get by with some simple standard C string handling and a switch/case or if-else if.

How complex is the regex?

Uh, yeah - you managed to make the parser into a real monstrosity. I think it needs to be thoroughly recoded without the heavy reliance on Strings (which should always be avoided if possible.

Also, if you have a simple string literal to print out, use F() so it doesn't waste sram. Like here:

Serial.println("First character is *, repetition.");

Hello and thanks for your answers.
However I don't see a way to avoid recursion for repetitions and for the grouped commands. See this sample command:

*50T(sd1Srd5S)

This means:

  • 50T: repeat () 50 times (50T) the following command
  • (sd1Srd5S): (parenthesis) treat this series of command as one, so that * repeats the whole thing
  • s: hold shooter down
  • d1S: 1 second of delay
  • s: release the shutter
  • d: another 5 sec delay

I don't need recursion for commands like s, d or r - they just call their respective function and done. But for * and () I can't see any other way of doing it.

The regular expressions aren't so complex: they're used to detect the number and the unit (they can be times=T (times for repeat, milliseconds for delay), seconds=S, minutes M, hours H) in delay and repeat, and to find matching parenthesis for the command grouping. I might be able to avoid them.

But what really worries me is the way I handled strings and char arrays. I have tons of them! I would have wanted to put this in a tinier microcontroller like an ATtiny, but I'll never be able to run it there like this :confused:

DrAzzy:
Uh, yeah - you managed to make the parser into a real monstrosity. I think it needs to be thoroughly recoded without the heavy reliance on Strings (which should always be avoided if possible.

Also, if you have a simple string literal to print out, use F() so it doesn't waste sram. Like here:

Serial.println("First character is *, repetition.");

I saw that in the reference, but it's not documented. What does it do exactly? Does this work too? F(String("something")+someothertype)

davideddu:
I saw that in the reference, but it's not documented.

It is

Does this work too? F(String("something")+someothertype)

No.

davideddu:
Hello and thanks for your answers.
However I don't see a way to avoid recursion for repetitions and for the grouped commands. See this sample command:

*50T(sd1Srd5S)

This means:

  • 50T: repeat () 50 times (50T) the following command
  • (sd1Srd5S): (parenthesis) treat this series of command as one, so that * repeats the whole thing
  • s: hold shooter down
  • d1S: 1 second of delay
  • s: release the shutter
  • d: another 5 sec delay

I don't need recursion for commands like s, d or r - they just call their respective function and done. But for * and () I can't see any other way of doing it.

The regular expressions aren't so complex: they're used to detect the number and the unit (they can be times=T (times for repeat, milliseconds for delay), seconds=S, minutes M, hours H) in delay and repeat, and to find matching parenthesis for the command grouping. I might be able to avoid them.

But what really worries me is the way I handled strings and char arrays. I have tons of them! I would have wanted to put this in a tinier microcontroller like an ATtiny, but I'll never be able to run it there like this :confused:

While recursion is often helpful, and sometimes even necessary (and I'm not sure it is in this case....), on a system with very limited memory it must be used with GREAT care, and you have to ensure the depth is limited, and maximum memory use is well-controlled. When using String, that can be a tall order. What you have may well work fine on a PC with virtually unlimited memory, but on an Arduino is probably not the best way to go.

The brute force solution would be to simply get an Arduino with more memory, like a Mega, or even a Due. Still, using String, if you're not careful about releasing ALL String objects as soon as they're no longer needed, you'll eventually run out of RAM, due to fragmentation.

Instead of String and RegEx, consider using strtok, and the other strxxx functions to do your parsing.

Regards,
Ray L.

Regards,
Ray L.

Ok, thank you for your tips, I'll try them later and see what happens.

I would do a multi-level parser.

The top level would handle repeats and pass the command to be repeated (or executed once!) to a function that parses it and executes. When it returns, the top level would decide whether another repeat is necessary, or return to ask for the next command.

That's exactly what I'm doing, except for the fact that the caller and the called function are the same.
I would need to reimplement the parsing in both if I would want to do so.
Recalling the same function ensures no code is repeated.

You can get around recursion with another state machine inside the first and let loop() drive the whole thing.

You want some sequence done 50 times, read the 50T and set up an index and a trigger for each step then by state, index and triggers you run once through for each pass through loop() until done when you change the state to do the next thing without blocking other tasks from running responsively.

You want to program like you have MB's of RAM to use but an UNO has 2K for EVERYTHING.

The 2nd address in my sig space below has a section showing use of a state machine to read and act on serial data as it arrives, not after it's been buffered and processed with string.h commands you probably don't need. See what Nick does, that's a good start.

Perhaps you should simplify the syntax.

Why *50T instead of just *50 ? Why use the "T"?

Also, why so many time units: seconds, minutes, and hours? Why not just pick one unit and stick to it?

Or, if you are going to use different units, use different syntax: for "delay 20 minutes", use dM20, not d20M. This will be far easier to parse.

Numbers themselves are easy to parse. Just go one digit at a time.

Examples:

Parse "8" as ('8'-'0')

Parse "35" as (('3'-'0') * 10) + ('5'-'0')

Parse "179" as ((('1'-'0') * 10) + ('7'-'0')) * 10 + ('9'-'0')

odometer:
Perhaps you should simplify the syntax.

Why *50T instead of just *50 ? Why use the "T"?

Also, why so many time units: seconds, minutes, and hours? Why not just pick one unit and stick to it?

Or, if you are going to use different units, use different syntax: for "delay 20 minutes", use dM20, not d20M. This will be far easier to parse.

Numbers themselves are easy to parse. Just go one digit at a time.

Examples:

Parse "8" as ('8'-'0')

Parse "35" as (('3'-'0') * 10) + ('5'-'0')

Parse "179" as ((('1'-'0') * 10) + ('7'-'0')) * 10 + ('9'-'0')

You want a format with identifiers, delimiters, start and end marks because there are no guarantees with data transmissions. if your data is all binary, one error ruins everything that comes after.

Nick shows a whole identify symbols and evaluate numbers method with code in the State Machine example of the How to process incoming serial data without blocking blog. It's a nice start.

GoForSmoke:
You want a format with identifiers, delimiters, start and end marks because there are no guarantees with data transmissions. if your data is all binary, one error ruins everything that comes after.

That's what checksums are for.

With a check sum you don't know if the data is good until you have buffered and checked your data.

With a decent format you can be reasonably sure it's good or not from point to point as it arrives without buffering except what unique sequences you need to hold for later output.

I'd go so far as to say leave the 'T' and parens but turn the numeric text into binary data.

The regexp library is reasonably efficient with RAM, but your use of the String type will be the killer. Rework without that. Also as suggested above, a state machine is another way of parsing simple incoming strings.

GoForSmoke:
I'd go so far as to say leave the 'T' and parens but turn the numeric text into binary data.

How then to detect the end of a number, or the end of a string?
Leaving numbers as numeric text is much easier for humans (nearly all humans, anyway, I would guess).
Besides, you know you've reached the end of a number when the next character is not a digit. And you only ever need to look ahead 1 character for that.
It might be easier to use a syntax in which numeric arguments precede the commands to which they apply. Or maybe not; I don't know.

I think you've got it! If the number precedes the action, then the action is itself the delimiter.

For example:

A001 B002 C003

You can't process the "A" command until the number ends, whenever that is. But if you have:

001A002B003C

The action code terminates the number, and tells you what to do with the number you just got. Makes a lot of sense.

odometer:
How then to detect the end of a number, or the end of a string?

That's a problem when end of text happens when binary data is expected. In that case the broken format will be detected very soon as long as data does arrive since at the least EOT won't be detected where it should either.