EEPROM memory-stomping bug?

I don't have the code with me to post just now, but I'm working on a system with an Arduino Uno that uses EEPROM to store some data. I'm using ArduinoUnit to code it up with Test-Driven Design techniques. While doing this, I've run across what appears to be an EEPROM bug: I create some data structures in RAM, and as soon as an EEPROM.read() or EEPROM.write() call occurs, subsequent tests on those data structures fail. I can move the EEPROM calls around in the test list, and tests reliably start failing as soon as EEPROM is accessed.

I can post code and exact version numbers of everything when I get home again, but in the meantime, does anyone know whether the EEPROM code depends solely on Arduino IDE code, or whether it also requires system libraries? I'm running the IDE on Ubuntu, and I didn't see problems like this the last time I used the IDE, with version 0022. Obviously both Ubuntu and the IDE have changed a lot since then.

I ask because I traced back through the EEPROM code, and it links to the /usr/lib/avr libraries, which I suspect are maintained by Ubuntu rather than by Arduino.

I need to do more testing before I can categorically state that this is an Arduino bug (as opposed to a flaw in my board or in Ubuntu), but I figured I'd ask and see if anyone can help me out in the meantime. Thoughts?

EEPROM has only a (very) limited number of write cycles. I don't know the exact number, but I can imagine a test reaching it within an hour or so, ruining at least that address... and so ruining your test, etc.?

Can you post some code?

According to the ATmega328 datasheet (http://www.atmel.com/dyn/resources/prod_documents/8271S.pdf), the Flash is rated for 10,000 write/erase cycles and the EEPROM for 100,000 (10x more than Flash). I agree one should be careful about excessive writes to memory, but my roughly two dozen test runs, with one read and one write each, aren't a concern. It's a good warning to keep in mind, though.

I can post code in about 8 hours, and hopefully I can narrow it down to a very limited set of tests so the code sample isn't too huge. :wink:

Here's the code that's failing for me. I've reduced it to just the part of the system that's failing, and I'll soon be trying different boards and computers to explore the limits of the bug.

btt_main.ino (163 Bytes)

scheduler.h (1.75 KB)

scheduler_test.ino (3.32 KB)

scheduler.ino (1.15 KB)

Indeed, I tested the exact same code using 1.0.3 on a Mac, and there's no bug. I think there has to be a bug in the Ubuntu AVR libraries. I'd be interested to hear from other people running Ubuntu 12.04 LTS whether they also see the problem. The failing output is reproduced in the comments of scheduler_test.ino.

A quick look at the code doesn't show anything that tickles my "bug-sense".

I found the problem: I was (foolishly) depending on Ubuntu's packages being up to date. On the advice of a friend, I removed the Ubuntu Arduino package and downloaded the 1.0.3 tar file from arduino.cc. I recompiled and tested using 1.0.3, and the problem has disappeared. I was apparently running into a bug in 1.0 that was fixed by the time 1.0.3 was released. I tried creating a more focused test to duplicate the issue in less code, but didn't see the failure there (using 1.0); the code I uploaded must have hit a pretty specific problem condition.

Thanks for taking the time to review my code, Rob, I appreciate it! I normally work in higher-level languages, so I'm always suspicious that my pointer-fu is causing me problems; the only time I end up using pointers is when I work on one of my infrequent Arduino projects. It's good to have other eyes on my code to make sure I'm not making silly mistakes.

You're welcome!

It's back! :~ It cropped up in a different place, but this time I was able to reproduce it on two different computers (one Ubuntu, and one Mac, both running 1.0.3). I'll be testing this weekend with a different Arduino board to confirm it's not the board (it's very unlikely to be the board), but the attached code reliably fails on the get_sched_with_too_small_addr_returns_empty_sched test. First, the error output:

Running test suite...
Equality assertion failed in 'get_sched_with_too_small_addr_returns_empty_sched' on line 28: expected '0' but was '1239'
Tests run: 15 Successful: 14 Failed: 1

I added some commentary to the failing function at the bottom of the eeprom.ino file. Execution never even reaches the code that triggers the problem (that call to EEPROM.read(100)), but when the call is in the file, the problem crops up. When I take it out, or put something else there (keep in mind it isn't executed as part of the failing test), the test succeeds. Something about adding any kind of call to EEPROM causes the problem.

I tried using EEPROMEx, but it had the exact same issue.

Interestingly, the actual value reported (1239 in this case) changes between machines and compiles as I add or remove code, but I presume that's due to the changing set of char arrays ArduinoUnit is installing. More interestingly, this test case fails whether or not the control_hardware* files from the do_not_compile directory are included in the build.

I would be interested to know whether anyone else sees the same problem using this code and 1.0.3. To compile and test, you'll need the current version of ArduinoUnit from its Google Code project page. Compile, upload, and start the serial monitor at 9600 baud: you should see the error message I listed above.

btt.tar.gz (5.81 KB)

I finally found the problem. I was doing this:

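struct sched * new_sched()
{
    struct sched this_sched;  /* local: lives on the stack only until this function returns */
    /* assign zeroes to the structure elements */
    return(&this_sched);      /* BUG: returns the address of stack memory that later calls will reuse */
}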

Sometimes, this worked. Eventually it failed. Moving to this model made things work as expected:

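#include <stdlib.h>
#include <string.h>

struct sched * new_sched()
{
    /* request heap memory so the struct outlives this function call */
    struct sched *this_sched = (struct sched *)malloc(sizeof(struct sched));
    if (this_sched != NULL) {
        /* assign zeroes to the structure elements */
        memset(this_sched, 0, sizeof(struct sched));
    }
    return(this_sched);
}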

Essentially, I was declaring the struct as a local variable, assigning values to it, and returning its address. A local variable lives on the stack only until its function returns, so the pointer I handed back pointed at memory that later function calls were free to reuse. Once I built up enough tests, that memory was overwritten, and the pointer was pointing to garbage. This is probably also why unrelated changes, such as adding an EEPROM call that never executed, changed whether the test failed: they changed how the stack was used. Adding the malloc() call puts the struct on the heap, where that hunk of memory stays reserved until I free() it.

Are you freeing that pointer, later?

Using pointers and malloc() on a system with so little memory is not really a good idea. Using references is generally better: pass a reference to the function, telling it where to store the data. The caller can then create (statically) an instance of the structure for the function to write to.

Yes, I'm religiously free()ing. :wink: Good point about passing static vars by reference; I'll look into refactoring in that direction (it would work just as well as what I'm doing with malloc()).