Why can't I regress to an older version of my project?

Hi, this is related to another post on this forum: "Is there a 128k limit on Mega 2560?" (Programming Questions).

Here are the edited highlights...
My project has been growing over the months (both in functionality and size). Over the last couple of weeks it has slowly expanded to the point where it now compiles to just over 128 kBytes on a Mega 2560 (not sure if this is relevant).

During this period I noticed what started as an occasional bug: the code would compile and upload as normal without error, but when run I would just get a seemingly random string of about 100 numerals and then nothing further (exactly the same string every time; see the above post for details).

Initially I took this as an occasional glitch, and discovered that I could get around it simply by compiling and uploading an unrelated sketch onto the board, and then re-compiling and uploading my project (sometimes I would have to repeat this process several times to get it to work). At first the error was very occasional, but as time progressed (and the project got bigger) it occurred more frequently, and I had to run the work-around more times to get the project to work.

I've made a few more changes to the code, and seem to have reached the point where the error always occurs, and the work-around never works!

OK, now for the really annoying bit. Every time I've built the project I've backed up all my libraries (all my .cpp and .h files), so getting it working again ought to be just a matter of restoring the last working set of files and re-compiling. However, for reasons that baffle me, I've had to go back 14 versions (nearly 3 weeks) to find a version that will work.

So, ignoring the actual problem with the project, my question is - why won't code that worked yesterday work today????

So far my research has suggested that the problem may be down to where in memory the compiler chooses to put objects (in particular PROGMEM strings). Does this sound plausible? Please comment. I've also learnt that a compiler won't always put the same object in the same location. Does anyone know why? I thought a computer working on exactly the same data should do exactly the same thing every time.

Bearing this in mind, my current conclusion is that for the last few weeks the compiler has (through chance only) been putting objects in the right place, and has recently started putting objects in the wrong place (again through luck only). Does this sound plausible? I'll admit I'm just guessing.

The following facts may also be relevant...
I'm compiling to a Mega 2560 using version 1.0
There are NO compilation errors - this is just about run-time problems.
All the individual libraries compile and run without problems. I only get a problem when I build the whole project and the sketch compiles to around 128k
There's not been one big change (I haven't even added any new libraries) I've just been adding a few lines every day over the last few weeks.
The project includes about 800 progmem constants totaling about 28k.
I get the same error on three different Mega 2560s
I've tried loading the PROGMEM constants into higher memory using "__attribute__((section(".fini7")))" instead of "PROGMEM", but this made no difference.

all comments appreciated!

Rebuilding the same set of source files with the same IDE should produce the same executable image. If the behaviour of that is changing, I wonder whether this is due to some external influence. Without any idea of the hardware or software involved I have no idea what possible influences there might be, but it could be things like PSUs running hotter in warm weather, RTC or GPS generating different date-related data, the fact you're working at a different time of day, that sort of thing. I remember many years ago trying to track down a stability problem in a large application that we eventually - after an immense amount of guesswork and head scratching and frustration - realized could only be reproduced when a certain scenario happened on a specific day of the year. It was a simple problem but for the longest time we didn't consider the one critical factor needed to reproduce it. If the behaviour you're seeing defies reason then it suggests that there's some factor influencing it that hasn't been considered yet.

This is unlikely to be the source of your problem, but I recommend you thoroughly test the memory on your host machine. (Look into memtest86.) That is one possible variable that needs to be eliminated.

Is it possible to do a binary compare on .hex files? That could help sort out which side of the USB this error lives on.
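For what it's worth, a byte-for-byte compare of two build outputs doesn't need any special tool. Here is a minimal sketch in plain C++ for the host PC (not for the Arduino); the function names firstDifference and readAll are made up for illustration, and it assumes you have already located the two .hex files produced by the IDE.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into memory as raw bytes.
// Returns an empty vector if the file cannot be opened.
static std::vector<char> readAll(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  return std::vector<char>(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
}

// Returns the offset of the first byte that differs, or -1 if the two
// files have identical contents. If one file is a prefix of the other,
// the offset returned is the length of the shorter file.
long firstDifference(const std::string& pathA, const std::string& pathB) {
  const std::vector<char> a = readAll(pathA);
  const std::vector<char> b = readAll(pathB);
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (a[i] != b[i]) return static_cast<long>(i);
  }
  return a.size() == b.size() ? -1 : static_cast<long>(n);
}
```

If a build from the restored "working" sources and a build from the same sources made today come back as -1, the two images are the same and the difference has to be somewhere after compilation. Note that two missing files also compare as identical here, so check the paths first.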

Under normal operation you don't get "intermittent" or "random" glitches in microcontroller programs. By "normal" I mean:

  1. The microcontroller isn't damaged
  2. The circuitry around the microcontroller is correct (power supply, decoupling, etc).

The symptoms you describe sound very much like memory corruption / overflows caused by bad programming.

Excessive use of the String object or bad use of char[] strings are the most common causes.

Thanks majenko,

The symptoms you describe sound very much like memory corruption / overflows caused by bad programming.

Under normal circumstances I'd say you're right, but I should still be able to regress to a previous version. That's the baffling thing for me!

Excessive use of the String object

Everyone seems to advise against this, so I've always used char[] instead.

thanks for the response

I suspect that the bug has been there since at least 14 revisions ago.

If it's a char[] overflow fault it could be hanging around there for ages without you noticing it, because it just so happens that the heap or stack gets arranged in such a way that the fault doesn't manifest itself. Yes, the code may have compiled in exactly the same way, but your code may not have executed in exactly the same way if it's being influenced by external sensors / inputs. That could cause the stack or the heap to get arranged differently, and thus the fault appears.

I suggest going over your code with a fine-toothed comb looking for faults with your char arrays.

Some common things to watch out for:

  1. Arrays are zero based, so an array of [10] has indices 0-9 not 1-10
  2. You must allocate one more character than you have characters in your string to accommodate the terminating null character ('\0')
  3. When constructing strings ensure that you always put that null character in place

Some things to help you:

  1. Use the "n" version of string functions to limit the number of characters you operate on so they never go out of bounds. snprintf(), strncpy(), etc.
  2. Erase the contents of your strings before putting anything in them. This ensures that there will always be a null character as the string is all null characters already. See bzero() or memset().
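The two tips above can be sketched together like this. It is only an illustration: the function name formatReading and the "label=value" format are invented for the example, not taken from anyone's project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Formats "<label>=<value>" into dst without ever writing past dstSize bytes.
// Returns true if the whole message fitted, false if it had to be truncated.
bool formatReading(char* dst, std::size_t dstSize, const char* label, int value) {
  assert(dst != nullptr && dstSize > 0);
  std::memset(dst, 0, dstSize);  // the buffer starts out as all null characters
  // snprintf writes at most dstSize - 1 characters plus a terminating '\0',
  // and returns the length the full message would have needed.
  const int needed = std::snprintf(dst, dstSize, "%s=%d", label, value);
  return needed >= 0 && static_cast<std::size_t>(needed) < dstSize;
}
```

Passing sizeof on the actual array (for example formatReading(buf, sizeof buf, "temp", 21)) keeps the limit in step with the buffer if its size is changed later. Checking the return value tells you when a message was cut short instead of letting it spill into whatever sits next to the buffer.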

OhMyCod:
Everyone seems to advise against this, so I've always used char[] instead.

Yes - but are you properly null-terminating them?

Thanks for the tips guys. I've been going through my character strings checking for this sort of thing. I've been coding in C++ for long enough to be aware of the problems, but not long enough to avoid them!

Incidentally, I'm also aware that when using the 'n' versions of the str commands I need to remember to manually null terminate them.
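To make that point concrete: strncpy() does not write a terminator when the source is as long as (or longer than) the limit, so the last byte has to be set by hand. A minimal sketch, with copyBounded being a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies src into dst, truncating if necessary, and always null-terminates.
void copyBounded(char* dst, std::size_t dstSize, const char* src) {
  assert(dst != nullptr && src != nullptr && dstSize > 0);
  std::strncpy(dst, src, dstSize - 1);  // may leave dst unterminated if src is too long
  dst[dstSize - 1] = '\0';              // so terminate it explicitly
}
```

Copying at most dstSize - 1 characters and then writing the final byte means the result is a valid string whether or not truncation happened.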

There is, I think, some evidence against this being a char[] overflow problem. The very first thing my project does after loading my libraries is to display the message 'starting....' to the console, but it doesn't even get that far. For this to be an overflow issue with my code, the offending bit of code (if there is one) must actually run. The only thing that actually happens before this is that my constructors get called, but none of these actually do anything.

I'll carry on digging, but if any of you have any suggestions, please don't be shy!

You also have to consider how much statically allocated RAM you are using. All variables that aren't local to a function, or dynamically allocated, will be initialized at startup by the startup code (crt.o). I think this also includes what you would normally consider string literals, which IIRC get cast into a String object for printing to Serial etc. Avoid this by forcing them into flash with the F("...") macro.

If there is too much data being initialized it'll collide with the stack and overwrite important system information killing your program.

OhMyCod:
the only thing that actually happens before this is that my constructors get called

Constructors for objects created in which part of memory? Where do the libraries you mention come from - have they been evolving? Have you ever measured the available memory when your sketch runs to see whether you're in danger of running out?

Constructors for objects created in which part of memory?

Won't all objects be created in dynamic memory (at least as long as they're not const)?

Where do the libraries you mention come from - have they been evolving?

All written by myself, most have been evolving slowly over the last few months.

Have you ever measured the available memory when your sketch runs to see whether you're in danger of running out?

this is the function I've been using to check how much memory is free, it indicates about 1400 bytes free after all my objects are created....

// this function will return the number of bytes currently free in RAM
// written by David A. Mellis
// based on code by Rob Faludi http://www.faludi.com
int freeRam() 
{
  //int size = 1024;  // Use 1024 with ATmega168
  //int size = 2048;  // Use 2048 with ATmega328 (Duemilanove)
  int size = 8192;  // Use 8192 with ATmega2560 (Mega 2560)
  byte *buf;

  while ((buf = (byte *) malloc(--size)) == NULL);

  free(buf);

  return size;
}

I've no idea if it's any good or if the figures it generates are accurate. If you've got some alternative code, please share!

thanks for all your suggestions
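One commonly used alternative measures the gap between the top of the heap and the current stack pointer instead of probing with malloc(). The sketch below assumes an AVR build with avr-libc, which provides the symbols __heap_start and __brkval; the function name freeRamGap is made up here, and on any other platform the function just reports -1 because those symbols don't exist.

```cpp
#if defined(__AVR__)
extern char __heap_start;  // avr-libc linker symbol: where the heap begins
extern char* __brkval;     // avr-libc malloc: current top of the heap, 0 if unused

// Bytes between the top of the heap and the current stack pointer.
int freeRamGap() {
  char stackTop;  // a local variable, so its address is close to the stack pointer
  const char* heapEnd = (__brkval == 0) ? &__heap_start : __brkval;
  // Subtracting addresses of unrelated objects is not strictly portable C++,
  // but this is the usual idiom on AVR.
  return static_cast<int>(&stackTop - heapEnd);
}
#else
// Not an AVR build: the symbols above are unavailable, so report "unknown".
int freeRamGap() { return -1; }
#endif
```

Unlike the malloc() loop, this allocates nothing, so it can be called from several places (including deep inside nested calls) to see how close the stack gets to the heap. It does not count free blocks inside the heap, so treat the figure as an estimate.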

OhMyCod:

Constructors for objects created in which part of memory?

Won't all objects be created in dynamic memory (at least as long as they're not const)?

No, depending on how you define "dynamic memory".

What do your constructors do? Hopefully, not much.

http://www.parashift.com/c++-faq/static-init-order.html

OhMyCod:
Won't all objects be created in dynamic memory (at least as long as they're not const)?

They will all be created in memory but if something is going wrong during construction/initialisation it might matter whether they're being created in the global data section, on the stack or in memory allocated from the heap.

OhMyCod:
All written by myself, most have been evolving slowly over the last few months.

Have you regressed all of the libraries too?

Have you regressed all of the libraries too?

Yes, otherwise the project itself would fail.

What do your constructors do? Hopefully, not much.

Generally, nothing. In one or two cases they initialize basic member variables. I read somewhere you shouldn't put much actual code in them, although I'll admit I'm not sure why.

Initializing low-level variables (like int, long) should be OK. Doing something more (eg. initializing a String variable) could fail because the String library might not be initialized yet. Hence my link to the "static initialization order fiasco".
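A rough illustration of the usual way around that problem, the "construct on first use" idiom from the linked FAQ. This is portable C++ with std::string standing in for the Arduino String class, and bootLog and Sensor are hypothetical names, so treat it as a sketch of the idea, not as code from the project under discussion.

```cpp
#include <cassert>
#include <string>

// The shared object lives inside a function as a local static, so it is
// constructed the first time bootLog() is called, whichever global
// constructor happens to get there first.
std::string& bootLog() {
  static std::string log;
  return log;
}

struct Sensor {
  // Safe to call from a global constructor: bootLog() builds the string
  // on demand instead of relying on it having been initialized already.
  explicit Sensor(const char* name) {
    assert(name != nullptr);
    bootLog() += name;
    bootLog() += ';';
  }
};

// Global objects: their constructors run before main() (or setup()).
Sensor tempSensor("temp");
Sensor lightSensor("light");
```

Had the log been an ordinary global std::string defined in another source file, nothing would guarantee that it was constructed before tempSensor's constructor used it. Within a single file the order does follow the order of definition, which is why bugs like this tend to show up only once a project is split across several libraries.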