Topic: EEPROM Wear Leveling

MrAl

#30
Apr 04, 2016, 01:19 am Last Edit: Apr 04, 2016, 01:33 am by MrAl
Possibly the only times I've ever seen CRCs expose bad data in my EEPROM were due to the power going out during a write.  Data changing on its own really isn't something you should see within the 10-year retention period, and I've yet to run a test that long.  If you write to the EEPROM regularly, however, sooner or later you are going to lose power in the middle of a write.

You are mostly protected with redundant copies, however.  If the power goes out while writing the 2nd copy, you still successfully wrote the 1st copy which can be validated with its CRC.  If the power goes out while writing the 1st copy, the 2nd copy still contains the data from the last time it was written.

Getting back the data from the last time you wrote isn't perfect, but at least you don't lose everything and it's the best you can do without adding some kind of capacitor backup.
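
A minimal sketch of the two-copy scheme described above, written as plain C against a RAM array so it can be tried on a PC.  The array `eeprom`, the sizes, the CRC-8 polynomial (0x07) and the names `save` and `load` are all assumptions made for illustration; none of them come from the sketch posted earlier in this thread.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define DATA_LEN 4                 /* payload size (assumed) */
#define REC_LEN  (DATA_LEN + 1)    /* payload plus one CRC byte */

static uint8_t eeprom[64];         /* stand-in for the real EEPROM */

/* Bitwise CRC-8, polynomial 0x07, initial value 0 (an assumed choice). */
static uint8_t crc8(const uint8_t *p, size_t n) {
    uint8_t crc = 0;
    while (n--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
    }
    return crc;
}

/* Write copy 1 completely (data, then CRC), then copy 2. */
static void save(const uint8_t *data) {
    for (int copy = 0; copy < 2; copy++) {
        uint8_t *rec = &eeprom[copy * REC_LEN];
        memcpy(rec, data, DATA_LEN);
        rec[DATA_LEN] = crc8(data, DATA_LEN);
    }
}

/* Return 1 and fill `out` from the first copy whose CRC checks out,
   or return 0 if neither copy is valid. */
static int load(uint8_t *out) {
    for (int copy = 0; copy < 2; copy++) {
        const uint8_t *rec = &eeprom[copy * REC_LEN];
        if (crc8(rec, DATA_LEN) == rec[DATA_LEN]) {
            memcpy(out, rec, DATA_LEN);
            return 1;
        }
    }
    return 0;
}
```

If copy 1 is damaged by an interrupted write, `load` falls through to copy 2, which still holds the previous data.  On an Arduino the array accesses would presumably become `EEPROM.read()` and `EEPROM.update()` calls.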

When I made the code in this thread that burnt out a memory location, I actually did rerun it a few times.  Interestingly, the burnt-out memory location did pass another ~10k write/read cycles before it failed again.  Even after running the sketch several times, the burnt-out location would write/read a few times before failing again.
Hi,

A bad write due to a power outage is something I have to account for in my application.  The data that is read back when power is restored must be the correct data.  I don't have any wiggle room here.

So I have been thinking about this a little and came up with the following algorithm...
1.  First, write the pointer (the location of the data, low-order byte) to locations 0x000 to 0x003.
2.  Write the first data block to locations 0x004 to 0x007.  That's all I need for this, but you might need more.
3.  Write a second set to locations 0x008 to 0x00B.
4.  Write a third set to locations 0x00C to 0x00F.
5.  Test each set after each write to make sure the codes are valid.  If they are not, increment the pointer by whatever is required (in this case it would be 0x00C, the size of the three sets), store the new data, then write the new pointer.

The problem here is that when we write a new pointer it could become corrupted due to power outage, so perhaps store the pointer more than once also:
Write the pointer to locations 0x000 to 0x003 and to 0x004 to 0x007 and to 0x008 to 0x00B.
Write the data to locations 0x00C to 0x00F, 0x010 to 0x013, 0x014 to 0x017.
If any of the pointers are not the same use one of the two sets that are the same.
If any of the data is not the same, use one of the two sets that are the same.
If any of the pointers do not agree, rewrite that pointer.
If any of the data tests fail, increment the pointer then write the new data then write the three new pointers.
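
The "use one of the two that agree, then rewrite the odd one" rules above can be sketched as a 2-of-3 vote.  This is only an illustration in plain C: `BLOCK_LEN`, `majority` and `vote_and_repair` are made-up names, and the three copies are assumed to sit back to back in memory (as with the pointers at 0x000 to 0x00B or the data at 0x00C to 0x017).

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_LEN 4   /* bytes per copy (assumed) */
#define COPIES    3

/* Return the index of a copy that matches at least one other copy,
   or -1 if all three copies differ. */
static int majority(const uint8_t *base) {
    const uint8_t *a = base;
    const uint8_t *b = base + BLOCK_LEN;
    const uint8_t *c = base + 2 * BLOCK_LEN;
    if (memcmp(a, b, BLOCK_LEN) == 0) return 0;
    if (memcmp(a, c, BLOCK_LEN) == 0) return 0;
    if (memcmp(b, c, BLOCK_LEN) == 0) return 1;
    return -1;
}

/* Vote, then overwrite any copy that disagrees with the winner.
   Copies that already agree are not rewritten, to save write cycles.
   Returns the winning index, or -1 if no two copies agree. */
static int vote_and_repair(uint8_t *base) {
    int w = majority(base);
    if (w < 0) return -1;
    for (int i = 0; i < COPIES; i++) {
        uint8_t *copy = base + i * BLOCK_LEN;
        const uint8_t *good = base + w * BLOCK_LEN;
        if (i != w && memcmp(copy, good, BLOCK_LEN) != 0)
            memcpy(copy, good, BLOCK_LEN);
    }
    return w;
}
```

The same two functions would serve for both the pointer triple and the data triple.  The -1 case (all three differ) is left to the caller.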

Still thinking about this, but with an algorithm like this it looks like there is only one location that will be bad if the power goes out, and with three copies that means that two will always be intact.
This would lead to about 12.5 years of operation when storing the data once per minute and running 24 hours a day 7 days a week.  My app will run much less than that so it should outlive the chip.

The primary problem is the data could be corrupted if the power goes out when writing the data, and that leads to either a CRC code or redundant data.  I like the redundant data idea because it does not rely on probability.  The secondary problem is that the pointer could be corrupted so that's the same deal, and a second and third copy seems to solve the problem exactly.

Now the only thing that is not known for sure yet is how the data on the EEPROM is physically addressed.  If it is addressed in groups like 0x000 to 0x003, 0x004 to 0x007, etc., then this should work well.  If it is addressed like 0x000, 0x100, 0x200, 0x300 then it's not the best yet.  We'd have to find that out somehow.

LATER
Thought about this some more, and realized that once the first data group has been written, one set has already changed, so if the write of the next set gets corrupted all three sets might be different.  In that case maybe resort to using the last set in write order (set 3), which has not been touched yet.  If the write of set 1 gets corrupted that's no problem, because sets 2 and 3 will still be the same.  If set 3 gets corrupted that's no problem either, because sets 1 and 2 will be the same.  When set 2 gets corrupted, though, set 1 (new), set 2 (corrupted) and set 3 (old) all differ, so set 3 must be the intact one.  This still deserves more thought, as set 3 will then hold the slightly older set of data.
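
To check that reasoning, here is a small simulation sketch in plain C.  It assumes a simplified model in which a power cut just stops the write after some number of whole bytes (a real EEPROM could also leave a half-written byte), and the names `write_sets` and `pick_set` are invented for this example.

```c
#include <stdint.h>
#include <string.h>

#define SET_LEN 4   /* bytes per set (assumed) */
#define SETS    3

/* Write new_data into set 1, set 2, set 3 in that order, but stop
   after max_bytes bytes in total to imitate a power cut. */
static void write_sets(uint8_t *mem, const uint8_t *new_data, int max_bytes) {
    for (int i = 0; i < SETS * SET_LEN && i < max_bytes; i++)
        mem[i] = new_data[i % SET_LEN];
}

/* Decide which set to trust after power-up; returns its index (0..2). */
static int pick_set(const uint8_t *mem) {
    const uint8_t *s1 = mem;
    const uint8_t *s2 = mem + SET_LEN;
    const uint8_t *s3 = mem + 2 * SET_LEN;
    if (memcmp(s1, s2, SET_LEN) == 0) return 0; /* sets 1 and 2 agree */
    if (memcmp(s2, s3, SET_LEN) == 0) return 1; /* set 1 was interrupted */
    return 2;                                   /* all differ: set 3 is untouched */
}
```

Under this model the recovered data is the old value if the cut lands anywhere before set 2 is complete, and the new value afterwards, so a mixed set is never returned.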



Smajdalf

@MrAl
You are trying to make it too complicated:
Make a "validating" byte, something like a 7-bit CRC with a 0 in the LSB. When updating the data, you erase that byte first, which marks the stored data invalid. After you write and check the new data, you write a new validating byte. This way corrupted data can never be marked as valid.
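
A rough sketch of that sequence in plain C, using a RAM array in place of the EEPROM.  The layout (validating byte at offset 0), the CRC-7 polynomial (x^7 + x^3 + 1) and the `last_step` parameter, which exists only to imitate a power cut between steps, are assumptions for illustration.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define DATA_LEN 4
#define ERASED   0xFF                  /* erased byte: LSB is 1, so invalid */

static uint8_t eeprom[DATA_LEN + 1];   /* [0] validating byte, [1..] data */

enum { STEP_ERASE = 1, STEP_DATA = 2, STEP_VALIDATE = 3 };

static void format(void) { memset(eeprom, ERASED, sizeof eeprom); }

/* Bitwise CRC-7, polynomial x^7 + x^3 + 1, initial value 0. */
static uint8_t crc7(const uint8_t *p, size_t n) {
    uint8_t crc = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t d = p[i];
        for (int b = 0; b < 8; b++) {
            crc <<= 1;
            if ((d ^ crc) & 0x80) crc ^= 0x09;
            d <<= 1;
        }
    }
    return crc & 0x7F;
}

/* Erase the validating byte, write the data, read it back, validate.
   Execution stops after `last_step` to simulate losing power there. */
static void store(const uint8_t *data, int last_step) {
    eeprom[0] = ERASED;                                  /* 1: invalidate */
    if (last_step < STEP_DATA) return;
    memcpy(&eeprom[1], data, DATA_LEN);                  /* 2: write data */
    if (last_step < STEP_VALIDATE) return;
    if (memcmp(&eeprom[1], data, DATA_LEN) != 0) return; /* read-back check */
    eeprom[0] = (uint8_t)(crc7(data, DATA_LEN) << 1);    /* 3: LSB 0 = valid */
}

/* Return 1 and fill `out` only if the record is marked valid and the
   stored CRC matches the data; otherwise return 0. */
static int load(uint8_t *out) {
    uint8_t v = eeprom[0];
    if (v & 1) return 0;
    if ((v >> 1) != crc7(&eeprom[1], DATA_LEN)) return 0;
    memcpy(out, &eeprom[1], DATA_LEN);
    return 1;
}
```

Calling `store(data, STEP_VALIDATE)` is the normal path.  Stopping at `STEP_ERASE` or `STEP_DATA` leaves the record marked invalid, so `load` reports "no data" rather than returning a corrupted value.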
When you want to implement some wear leveling and have just this one piece of data to write, you can make it easy: write the data to the next address after that of the last valid data. When restarting, you read the EEPROM from the beginning, and the last valid data is the freshest.
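
One possible way to sketch this "next address" wear leveling in plain C, again with a RAM array standing in for the EEPROM.  The slot count, the wrap-around handling and the choice to erase the previous slot's validating byte only after the new slot is validated are my own assumptions, not something spelled out above.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define DATA_LEN 4
#define SLOT_LEN (DATA_LEN + 1)   /* validating byte + data */
#define NSLOTS   8                /* assumed; must be at least 2 */

static uint8_t eeprom[NSLOTS * SLOT_LEN];

static void format(void) { memset(eeprom, 0xFF, sizeof eeprom); }

/* Bitwise CRC-7, polynomial x^7 + x^3 + 1, initial value 0. */
static uint8_t crc7(const uint8_t *p, size_t n) {
    uint8_t crc = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t d = p[i];
        for (int b = 0; b < 8; b++) {
            crc <<= 1;
            if ((d ^ crc) & 0x80) crc ^= 0x09;
            d <<= 1;
        }
    }
    return crc & 0x7F;
}

/* A slot is valid if its validating byte has LSB 0 and a matching CRC. */
static int slot_valid(int s) {
    const uint8_t *rec = &eeprom[s * SLOT_LEN];
    return !(rec[0] & 1) && (rec[0] >> 1) == crc7(&rec[1], DATA_LEN);
}

/* Scan from the beginning; the last valid slot is the freshest.
   Returns the slot index, or -1 if nothing valid is stored. */
static int find_latest(void) {
    int latest = -1;
    for (int s = 0; s < NSLOTS; s++)
        if (slot_valid(s)) latest = s;
    /* A write that wrapped to slot 0 but was cut off before the old
       last slot was erased leaves both valid; slot 0 is the newer one. */
    if (latest == NSLOTS - 1 && slot_valid(0)) latest = 0;
    return latest;
}

/* Write to the slot after the current one, then retire the old slot. */
static void append(const uint8_t *data) {
    int cur = find_latest();
    int next = (cur + 1) % NSLOTS;
    uint8_t *rec = &eeprom[next * SLOT_LEN];
    rec[0] = 0xFF;                                  /* keep invalid while writing */
    memcpy(&rec[1], data, DATA_LEN);
    rec[0] = (uint8_t)(crc7(data, DATA_LEN) << 1);  /* mark new slot valid */
    if (cur >= 0) eeprom[cur * SLOT_LEN] = 0xFF;    /* invalidate old slot */
}

/* Return 1 and fill `out` with the freshest record, or 0 if none. */
static int read_latest(uint8_t *out) {
    int s = find_latest();
    if (s < 0) return 0;
    memcpy(out, &eeprom[s * SLOT_LEN + 1], DATA_LEN);
    return 1;
}
```

Each `append` lands in a different slot, so the writes are spread over `NSLOTS` locations instead of hammering one address.  With this ordering the old record stays readable until the new one has been validated.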
