Hi guys,
Like several before me, I’ve come up against the problem of unexpected resets when using Arduino. I have read extensively on the forum and elsewhere to try and get to the bottom of this without success, and there is a history of threads that tail off without any obvious satisfactory conclusion. But there is a pattern emerging that points towards a library problem, and IMHO it needs to be investigated thoroughly. I’m an experienced IT person, and I’d be happy to play a part, but my C skills are not up to doing it alone. Perhaps a God member can suggest the right course of action.
For completeness I give you the following background details, but as you will see later, I think these are largely irrelevant, since the same symptoms appear in a wide variety of situations (and no, it doesn’t seem to be a lack-of-memory problem).
I have a recent bog standard Uno with 2k of SRAM. I am writing software to control multiple heating elements on (eventually) more than one kiln, using a shield by Ocean Controls to multiplex readings from several thermocouples. The Arduino will read the temperatures and control the power to the elements by cycling a set of solid state relays – a simple project, but critical since failure could involve the fire department.
On the software side, I am writing a task scheduler to handle the various kiln control tasks, other tasks for communicating with the user (me) linked via the USB to my PC, and potentially other pin-related tasks. I’m using Excel with VBA on the PC side, but I am deliberately keeping Excel at arm’s length from the comms, and have found Gobetwino to be a suitable intermediary.
My code is currently 12.5kB and includes:
#include <SPI.h>
#include <avr/pgmspace.h>
#include <EEPROM.h>
Recently I started getting odd printouts on the serial monitor. (I always assume it is me. In 40 years I have often sworn at the makers of this or that language or utility only to find out later it was my own fault, and of course that may still be the case. But this time I have considered it from every angle and maybe it’s a genuine problem with the infrastructure – hardware or libraries.)
The first thing of course was to pepper it with diagnostic statements. But the weird thing was that the diagnostic statements themselves were getting corrupted. Also the injection of a diagnostic statement (or any other statement) in one place would sometimes fix the problem where it occurred, only for it to manifest itself in some other form elsewhere. In short the problem did not seem to be related either to the bit of code where the Serial.prints started going bizarre, or where the system eventually froze.
The forum threads were full of references to running out of SRAM. I transferred my biggest array to EEPROM. I then incorporated code from the forum (many thanks) to test for free memory, and all is well: at the point where the bizarre stuff started, there were 986 bytes free (Data segment was 374, Bss 425, Heap 96, and Stack 166). But of course, true to form, when I put in this memory usage code, the problem shifted!
I don’t use malloc. I’m not doing anything sneaky at all.
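The memory figures quoted above can be cross-checked with simple arithmetic. This is only a minimal sketch, assuming the Uno's 2048 bytes of SRAM and the four segment sizes reported; the names `SramUsage` and `freeBytes` are made up for illustration and are not from the forum code.

```cpp
#include <stdint.h>

// Segment sizes in bytes, as reported at the point where output went wrong.
struct SramUsage {
    uint16_t data;   // initialised globals and string literals
    uint16_t bss;    // zero-initialised globals
    uint16_t heap;   // dynamic allocations
    uint16_t stack;  // stack depth at the moment of measurement
};

// Bytes left in the gap between the heap and the stack.
constexpr uint16_t freeBytes(uint16_t total, const SramUsage& u) {
    return total - (u.data + u.bss + u.heap + u.stack);
}

constexpr uint16_t kUnoSram = 2048;                // assumption: standard Uno
constexpr SramUsage kReported{374, 425, 96, 166};  // figures from the post
```

`freeBytes(kUnoSram, kReported)` comes to 987, within a byte of the 986 reported, so the segment figures are at least self-consistent. Note that a snapshot like this only reflects the stack depth at the point of measurement; deeper call chains or interrupts elsewhere in the program could narrow the gap further.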
Next was a thorough look at the use of arrays and pointers within the program. Any statements that looked even remotely risky were surrounded by diagnostics to reveal out-of-range subscripts, but nothing turned up. (As a C newbie I remain a little nervous here!)
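For anyone wanting to repeat the subscript checks described above, here is one possible shape for them. It is a sketch only: `inRange`, `safeStore` and `g_rangeErrors` are hypothetical names, and on the Arduino the counter increment is where a Serial.print of the offending index would go.

```cpp
#include <stddef.h>

// Hypothetical diagnostic counter: number of bad subscripts seen so far.
static unsigned int g_rangeErrors = 0;

// Returns true when index is a valid subscript for an array of N elements.
// The array size is deduced by the compiler, so it cannot get out of step.
template <typename T, size_t N>
bool inRange(const T (&)[N], long index) {
    if (index >= 0 && static_cast<unsigned long>(index) < N) {
        return true;
    }
    ++g_rangeErrors;
    return false;
}

// Writes only if the subscript is valid; otherwise memory is left untouched.
template <typename T, size_t N>
bool safeStore(T (&arr)[N], long index, T value) {
    if (!inRange(arr, index)) {
        return false;
    }
    arr[index] = value;
    return true;
}
```

This only works where the real array is in scope; once an array has been passed to a function as a plain pointer the size is lost and has to be passed alongside it.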
Next was a more detailed examination of the evidence on the screen. I noticed that fixed strings were being misrepresented on the screen – a Serial.print of “KILN” that was correctly printed early in the session would eventually become “**LN” where ** could be anything, but the ** would usually be the same on the next iteration through the loop, or become ***. A look at the C technical documentation, (and confirmed using the free memory functions), shows that fixed strings are stored in the Data area of memory, well away from the Stack, so if there is genuine overwriting of fixed strings going on, it is pretty catastrophic. Perhaps a copy is put on the stack prior to printing, and it is the copy that is corrupted? Whatever.
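One way to narrow down when a string such as “KILN” first gets damaged is to record a checksum of it at start-up and re-check it at several points in the loop. The following is a sketch of that idea, not something taken from the forum; `Watch`, `watchRegion` and `stillIntact` are invented names, and the simple additive checksum is an assumption chosen for brevity (it will miss some changes, such as two bytes being swapped).

```cpp
#include <stddef.h>
#include <stdint.h>

// 8-bit additive checksum over a region of memory.
uint8_t checksum(const void* p, size_t len) {
    const uint8_t* b = static_cast<const uint8_t*>(p);
    uint8_t sum = 0;
    for (size_t i = 0; i < len; ++i) {
        sum = static_cast<uint8_t>(sum + b[i]);
    }
    return sum;
}

// A region being watched, plus the checksum it had when first recorded.
struct Watch {
    const void* addr;
    size_t      len;
    uint8_t     expected;
};

// Record the current contents of a region as the known-good state.
Watch watchRegion(const void* addr, size_t len) {
    Watch w = {addr, len, checksum(addr, len)};
    return w;
}

// True while the region still matches the checksum taken earlier.
bool stillIntact(const Watch& w) {
    return checksum(w.addr, w.len) == w.expected;
}
```

Calling `stillIntact` before and after each suspect block of code should show which block was running when the contents changed, without relying on Serial output that may itself be corrupted. The result could, for example, be latched onto a spare pin or LED.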
Shortly after the first of the screen corruptions appears, the system will either hang or perform a reset.
I suspected Gobetwino, but it fails using the Serial Monitor alone (sorry Gobetwino).
I suspected the shield, but it persists when the shield is removed (sorry Ocean).
I then read every thread I could find talking about restarts, and that’s when I became convinced of a deeper issue. The configurations reported involved several varieties of Arduino, different types of shield, and different libraries included. There were, however, several things common to two or more tales:
- Timing of the restart - several tales involved a delay of about 20 seconds or so before the system resets
- the use in general of the serial port
- size of the code – largish in some cases like mine
- the movability of the problem when lines of code are added/subtracted
a. “I can resolve this by altering a line that is in itself correct and has worked before enabling the Ethernet functionality. After rewriting a bit the unit restarts at a later point in time.”
b. “Altering the code makes the error disappear or at least change” - crashes even when there is little load on the comms
and on the human front, a whole heap of frustration and wasted time going down blind alleys. Several users have found workarounds and given up trying to solve their problems, others I’m sure have simply quit the arena, which is a shame for Arduino.
The clincher for me that it might be library related was the report by “cshotton” who described a set of symptoms very similar indeed to mine (on a very different configuration),
“Whenever I call lcd.printAt(0,1,"text") the display shows some random digits in stead of 'text'.”
and who eventually discovered that the problems went away when he stopped using Serial commands completely. Unfortunately I can’t do this since I rely on the Serial monitor to see what’s going on.
It is difficult to point the finger at Serial, since most people will be using it, and the obvious question will be: if this is where the problem lies, why aren’t more people finding it and reporting it? It’s a bit like finally having to suspect your mother of stealing cookies from the cookie jar. It doesn’t seem right somehow; it is too awful to contemplate a problem so close to the heart of the Arduino project.
Perhaps it is a combination thing: using Serial in a largish program. Certainly it came about for me and one other person as we “expanded the code”. Perhaps the linker gets it wrong. Perhaps none of these, but it needs running to ground, or there will be a succession of people giving up in frustration, most of whom we’ll never hear about.
Can I interest some God member in taking this on and sorting it once and for all? If it turns out to be my silly indexing bug, I’ll buy you dinner.
Happy to send all the code to anyone who wants to pursue this.
PS absolutely LOVE the Arduino!
Kenny