Topic: General advice on de-bugging

Lima7

hi,

Sorry for the length of this post. I'm looking for some general advice on debugging, not for the answer to a particular question.

I'm trying to build an Arduino application based on some off-the-shelf shields, a little bit of custom hardware and 20+ class libraries: some that I've written, some that I've found on the web and heavily modified (for example the GSM shield library from the hardware kitchen) and some that I've taken from the web and used without changes (the SDfat library for example).

I've been working on each library in isolation and have written various test scripts to test all of the functionality in each library. On their own, each library (and its hardware) works OK. The main problems seem to occur when I start building larger applications that require several of these libraries together.

These are the sort of bugs/errors I'm talking about:
1)   Application just won't start running.
2)   Code gets to a certain point and then the whole thing 'reboots' and starts again at the beginning.
3)   Character arrays get corrupted.
4)   Code seems to be running OK, but the overall results don't seem to be correct.
5)   Code execution 'jumps' from where it should be running to some totally different part of the application.

What tends to happen is that the errors only manifest themselves when I've got a large application; they're often hard or impossible to reproduce in a test sketch for an individual library. I often find myself 'fiddling' with the code for a bit, and the error then suddenly disappears (without me actually spotting a definitive error). I carry on testing, only for the error (or a similar one) to re-appear somewhere else later.

I'm making the following assumptions, are they correct?
1)   The ultimate cause of the errors must be because something somewhere in my application is corrupting the data memory (I assume that program memory can't get corrupted?)
2)   The errors are probably still present in my small test sketches, but they don't show up because the corruption that is occurring only affects unallocated memory that isn't being used by anything else. It's only as the overall application grows and more memory gets used that the corruption starts to hit memory that is actually being used.
3)   If I know a particular function hasn't been called before a problem occurs, then I can be sure that function hasn't corrupted anything?
4)   The most likely cause of a problem is setting an array index outside of its bounds, i.e. temp[x]=y where x is 10 and the array only consists of 5 elements.
5)   Number variables (i.e. ints, longs and floats etc.) are unlikely to cause corruption because if they overflow they will 'wrap around' rather than corrupt other memory.
6)   Any code that runs could be corrupting something; just because the error appears in function X doesn't mean that function X is the cause.

My main de-bugging technique is to insert lots of 'Serial.print' calls to see what point the processing has got to and what's in various variables and arrays. Is there a better way than this? One drawback is that simply inserting a 'Serial.print' seems to affect the execution of the code and can cause the error to disappear or move.

In each of my test sketches I try to make the test run repeatedly and introduce some random element to alter the flow on each iteration. (For instance, when testing the GSM module I generate pseudo-random phone numbers and messages on each iteration.) Is there a better way?

All advice gratefully received!

Mike

pocketscience

Quote
I'm making the following assumptions, are they correct?
1)   The ultimate cause of the errors must be because something somewhere in my application is corrupting the data memory (I assume that program memory can't get corrupted?)
2)   The errors are probably still present in my small test sketches, but they don't show up because the corruption that is occurring only affects unallocated memory that isn't being used by anything else. It's only as the overall application grows and more memory gets used that the corruption starts to hit memory that is actually being used.
3)   If I know a particular function hasn't been called before a problem occurs, then I can be sure that function hasn't corrupted anything?
4)   The most likely cause of a problem is setting an array index outside of its bounds, i.e. temp[x]=y where x is 10 and the array only consists of 5 elements.
5)   Number variables (i.e. ints, longs and floats etc.) are unlikely to cause corruption because if they overflow they will 'wrap around' rather than corrupt other memory.
6)   Any code that runs could be corrupting something; just because the error appears in function X doesn't mean that function X is the cause.


1. That's a pretty good starting point. From experience the most prevalent cause of crashes is corruption of memory, whether that be local stack-based data, or global objects. If you're using pointers to structures make sure you've actually allocated memory for the structure!
2. Most likely yes - but it might be prompted by code from another library trashing memory.
3. Almost certainly if a function hasn't been called *at all* then you should be pretty comfortable it's not causing a crash *preceding* it being called. It may still have errors in it though!
4. Yup, "out by 1" errors with respect to array indexes are very very common. You need to carefully check the parameters passed to functions that modify memory as well - it's real easy to screw up a call to memset for example and blast past the end of allocated memory.
5. Correct, however like with 4 it's easy to inadvertently pass incorrect parameters to a function that modifies memory and have a basic type like char written to as if it were a long.
6. Unfortunately yes.
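
To illustrate points 4 and 5, here is a minimal sketch of two defensive habits: checking an index before writing, and letting the compiler supply the byte count for memset. The helper names checkedStore and clearArray are made up for this example and are not part of any Arduino library. They only work on real arrays, not on pointers.

```cpp
#include <stddef.h>
#include <string.h>

// Hypothetical helper: store 'value' at 'index' only if the index is
// inside the array. Returns false and writes nothing when it is not.
template <typename T, size_t N>
bool checkedStore(T (&arr)[N], size_t index, T value) {
  if (index >= N) {
    return false;  // out of bounds: refuse instead of corrupting memory
  }
  arr[index] = value;
  return true;
}

// Hypothetical helper: zero a whole array. The byte count comes from
// sizeof on the array itself, so it cannot drift out of sync with the
// declaration the way a hand-typed length can.
template <typename T, size_t N>
void clearArray(T (&arr)[N]) {
  memset(arr, 0, sizeof(arr));
}
```

With int temp[5], a call like checkedStore(temp, 10, 7) returns false and leaves the array alone, where a bare temp[10] = 7 would silently overwrite whatever happens to live after the array.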

If you're able to bring the libraries in one at a time it might help pin-point the problem. Start with dummy functions then add the real code in slowly.
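
While adding libraries one at a time, one option (not something from this thread, just a common trick) is to put guard bytes on either side of a suspect buffer and check them between steps. The sketch below is only an illustration: GuardedBuffer, guardInit, guardPayload and guardIntact are hypothetical names, and the sizes and the 0xA5 pattern are arbitrary choices.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

const uint8_t GUARD_BYTE = 0xA5;  // arbitrary, easy-to-spot pattern
const size_t GUARD_LEN = 4;       // guard bytes on each side
const size_t PAYLOAD_LEN = 16;    // usable bytes in the middle

// One block of memory laid out as [guard][payload][guard].
struct GuardedBuffer {
  uint8_t raw[GUARD_LEN + PAYLOAD_LEN + GUARD_LEN];
};

// Fill both guards with the pattern and zero the payload.
void guardInit(GuardedBuffer &b) {
  memset(b.raw, GUARD_BYTE, sizeof(b.raw));
  memset(b.raw + GUARD_LEN, 0, PAYLOAD_LEN);
}

// Pointer to the usable area; hand this to the code under test.
uint8_t *guardPayload(GuardedBuffer &b) {
  return b.raw + GUARD_LEN;
}

// True while both guards still hold the pattern. A false result means
// something wrote before the start or past the end of the payload.
bool guardIntact(const GuardedBuffer &b) {
  for (size_t i = 0; i < GUARD_LEN; ++i) {
    if (b.raw[i] != GUARD_BYTE) return false;
    if (b.raw[GUARD_LEN + PAYLOAD_LEN + i] != GUARD_BYTE) return false;
  }
  return true;
}
```

Calling guardIntact() after each library call and printing a message the first time it returns false narrows the search to the step that ran just before. It only catches writes that land on the guard bytes, so a clean result proves nothing.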

Hope that helps a bit, unfortunately debugging on Arduino is very very rudimentary. A nice source-level debugger would be great (maybe one exists? I've not explored that at all).


G.
Is life really that serious...??!

olikraus

A stack overflow might be one cause of your problems.

Mini-Tutorial: How to check memory consumption of a sketch

1. Locate the .elf file of your sketch
Example: On my machine (Name is "one", Ubuntu Linux) the elf file of my Chess sketch is located here:
one:/tmp/build8228077309561956444.tmp$ ls *.elf
Chess.cpp.elf

2. Use avr-size
Example:
one:/tmp/build8228077309561956444.tmp$ avr-size Chess.cpp.elf
   text      data       bss       dec       hex   filename
  11668       162       386     12216      2fb8   Chess.cpp.elf

3. Check Flash-ROM size
Flash ROM usage is "text" + "data" + bootloader, which must be lower than 32K for the Uno with ATMEGA328. For the Chess example, "text" + "data" is 11668 + 162 = 11830 bytes.

4. Check RAM size
The required RAM is "data" + "bss" + stack (plus any heap allocations). This sum must be lower than 2K for the Uno with ATMEGA328.
The stack is the difficult part, because the number of bytes it needs is hard to calculate, especially for unknown libraries. But if "data" + "bss" is already near or beyond 2K, then your program will crash.
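
The arithmetic in steps 3 and 4 can be sketched as a few small functions fed with the avr-size columns. This is only an illustration: the struct and function names are made up, and the 512-byte bootloader figure is an assumption for an Uno (boards with a different bootloader will differ).

```cpp
#include <stdint.h>

// The three columns printed by avr-size, in bytes.
struct AvrSize {
  uint32_t text;
  uint32_t data;
  uint32_t bss;
};

const uint32_t FLASH_TOTAL = 32768;    // ATMEGA328: 32K flash
const uint32_t BOOTLOADER_BYTES = 512; // assumption: Uno bootloader size
const uint32_t RAM_TOTAL = 2048;       // ATMEGA328: 2K RAM

// Flash taken by the sketch: code plus the initial values of "data".
uint32_t flashUsed(const AvrSize &s) {
  return s.text + s.data;
}

// RAM reserved before the program starts running.
uint32_t staticRam(const AvrSize &s) {
  return s.data + s.bss;
}

// Flash still free after sketch and bootloader; negative means too big.
int32_t flashFree(const AvrSize &s) {
  return (int32_t)FLASH_TOTAL - (int32_t)BOOTLOADER_BYTES -
         (int32_t)flashUsed(s);
}

// RAM left over for stack and heap; negative means it cannot fit.
int32_t ramLeftForStack(const AvrSize &s) {
  return (int32_t)RAM_TOTAL - (int32_t)staticRam(s);
}
```

For the Chess example above (11668, 162, 386), "data" + "bss" is 548 bytes, which leaves 1500 bytes of the 2K for stack and heap.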

Oliver

PaulS

I'm printing this out for future reference. Thanks.

pocketscience

Ahh yes, all those avr tools do come in handy. For Mac users they are buried deep inside the Arduino app package:

/Applications/Arduino.app/Contents/Resources/Java/hardware/tools/avr


G.