Hi,
Sorry for the length of this post. I'm looking for some general advice on debugging, not for the answer to a particular question.
I'm trying to build an Arduino application based on some off-the-shelf shields, a little custom hardware and 20+ class libraries: some that I've written, some that I've found on the web and heavily modified (for example, the GSM shield library from the hardware kitchen), and some that I've taken from the web and used without changes (for example, the SDfat library).
I've been working on each library in isolation and have written various test sketches to exercise all of the functionality in each one. On their own, each library (and its hardware) works OK. The main problems seem to occur when I start building larger applications that use several of these libraries together.
These are the sort of bugs/errors I'm talking about:
1) The application just won't start running.
2) The code gets to a certain point, then the whole thing 'reboots' and starts again from the beginning.
3) Character arrays get corrupted.
4) The code seems to be running OK, but the overall results don't seem to be correct.
5) Code execution 'jumps' from where it should be running to some totally different part of the application.
What tends to happen is that the errors only manifest themselves once I've got a large application; they're often hard or impossible to reproduce in a test sketch for an individual library. I often find myself 'fiddling' with the code for a bit, and then the error suddenly disappears (without my actually spotting a definitive bug). I carry on testing, only for the error (or a similar one) to reappear somewhere else later.
I'm making the following assumptions. Are they correct?
1) The ultimate cause of the errors must be that something somewhere in my application is corrupting data memory (I assume program memory can't get corrupted?).
2) The errors are probably still present in my small test sketches, but they don't show up because the corruption only affects unallocated memory that isn't being used by anything else. It's only as the overall application grows and uses more memory that the corruption starts to hit memory that is actually in use.
3) If I know a particular function hasn't been called before a problem occurs, can I be sure that function hasn't corrupted anything?
4) The most likely cause of a problem is writing to an array index outside its bounds, e.g. temp[x]=y where x is 10 and the array only has 5 elements.
5) Numeric variables (ints, longs, floats etc.) are unlikely to cause corruption, because if they overflow they 'wrap around' rather than corrupt other memory.
6) Any code that runs could be corrupting something; just because the error appears in function X doesn't mean that function X is the cause.
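A minimal, host-compilable C++ sketch of assumptions 4 and 5, for illustration only. The names checkedWrite, wrappedIndex and g_boundsViolations are hypothetical and not from any Arduino library. It shows a write that is refused when the index is out of range, and how an unsigned counter that wraps around can itself become an out-of-range index. Note that wrap-around is only guaranteed for unsigned types; signed integer overflow is undefined behaviour in standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical debugging helper (not from any Arduino library):
// counts how many out-of-range writes were attempted.
static unsigned int g_boundsViolations = 0;

// Writes value into arr[index] only if index is inside the array.
// N is deduced from the array's declared type, so the check stays in
// sync with the declaration (it does not work on plain pointers).
template <typename T, std::size_t N>
bool checkedWrite(T (&arr)[N], std::size_t index, const T& value) {
    if (index >= N) {
        ++g_boundsViolations;  // on the board you could Serial.print the index here
        return false;          // refuse the write instead of writing past the array
    }
    arr[index] = value;
    return true;
}

// An unsigned counter decremented past zero wraps to its maximum
// value; used as an index, that value is far out of bounds.
std::uint8_t wrappedIndex() {
    std::uint8_t i = 0;
    --i;       // well-defined for unsigned types: 0 - 1 wraps to 255
    return i;
}
```

For example, with char temp[5], checkedWrite(temp, 10, 'y') returns false and leaves memory untouched, whereas a plain temp[10] = 'y' would write past the end of the array.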
My main debugging technique is to insert lots of Serial.print calls to see what point the processing has reached and what's in various variables and arrays. Is there a better way than this? One drawback is that simply inserting a Serial.print seems to affect the execution of the code and can make the error disappear or move.
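One lower-impact alternative to printing at every step, sketched here under assumptions (tracePoint, recentPoint and TraceBuffer are hypothetical names, not an existing library): each checkpoint stores a one-byte ID in a small RAM ring buffer, which is much cheaper than a Serial.print call, and the last few IDs are printed once when the fault is noticed. The buffer is cleared on reset unless it is placed in avr-libc's .noinit section, which this host-side version does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Keep the buffer small: RAM is scarce on an AVR.
const std::size_t kTraceSize = 16;

struct TraceBuffer {
    std::uint8_t ids[kTraceSize];  // checkpoint IDs, oldest ones overwritten first
    std::size_t next;              // slot the next checkpoint will be written to
    std::size_t count;             // number of valid entries (<= kTraceSize)
};

static TraceBuffer g_trace = {{0}, 0, 0};

// Record that execution reached checkpoint `id`.
inline void tracePoint(std::uint8_t id) {
    g_trace.ids[g_trace.next] = id;
    g_trace.next = (g_trace.next + 1) % kTraceSize;  // wrap around the ring
    if (g_trace.count < kTraceSize) ++g_trace.count;
}

// Return the k-th most recent checkpoint (k = 0 is the latest).
// The caller must ensure k < g_trace.count.
std::uint8_t recentPoint(std::size_t k) {
    return g_trace.ids[(g_trace.next + kTraceSize - 1 - k) % kTraceSize];
}
```

In use, calls such as tracePoint(1), tracePoint(2), ... would go where the Serial.print calls are now, and a single loop over recentPoint(0) to recentPoint(g_trace.count - 1) would print the recent path once the fault is detected.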
In each of my test sketches I try to make the test repeat in a loop and introduce some random element to alter the flow on each iteration (for instance, when testing the GSM module I generate pseudo-random phone numbers and messages on each iteration). Is there a better way?
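A sketch of one way to keep randomised tests reproducible, assuming the test data is generated from a fixed, printed seed: if a run fails, rerunning with the same seed replays exactly the same sequence. The generator below is a standard xorshift32; makePhoneNumber is a hypothetical helper, and on the board Arduino's randomSeed()/random() with a fixed seed could play the same role.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Minimal xorshift32 generator (Marsaglia). The same seed always
// produces the same sequence, so a failing iteration can be replayed.
struct XorShift32 {
    std::uint32_t state;  // must never be 0, or the generator sticks at 0
    explicit XorShift32(std::uint32_t seed) : state(seed ? seed : 1u) {}
    std::uint32_t next() {
        std::uint32_t x = state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        state = x;
        return x;
    }
};

// Hypothetical test-data generator: fills `out` with `digits` random
// digits and a terminating '\0'. `outSize` is the full buffer size, so
// the function never writes past the end of `out`.
void makePhoneNumber(XorShift32& rng, char* out, std::size_t outSize,
                     std::size_t digits) {
    if (outSize == 0) return;
    if (digits > outSize - 1) digits = outSize - 1;  // leave room for '\0'
    for (std::size_t i = 0; i < digits; ++i) {
        out[i] = static_cast<char>('0' + rng.next() % 10);
    }
    out[digits] = '\0';
}
```

Printing the seed at the start of each test run costs one line of output, but it turns an intermittent failure into one that can be replayed on demand.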
All advice gratefully received!
Mike