AVR (328P) EEPROM puzzler

See: http://lists.opentrv.org.uk/pipermail/opentrv-dev/2016-March/001974.html

Hi,

With a large batch of OpenTRV (Arduino-like, 1MHz CPU clock, 2xAA NiMH cell supply) hardware are seeing some bizarre behaviour where when we try to write a 16-byte key into EEPROM it fails to ‘stick’ properly the first time. If the board is power-cycled and the key is written again, it appears to stick as expected. (There’s some subtle sub-wierdnesses also.) This uses some ‘careful’ update routines that we use extensively elsewhere and seem to be fine with blocks as large as 8 bytes long (node IDs).

Lots of use of my favourite search engine is failing to show any evidence of this being a common problem.

I have reviewed the code repeatedly and found a minor wrinkle which is merely an inefficiency, not an error.

Anyone seen anything like this before?

The affected routines are:

setPrimaryBuilding16ByteSecretKey() getPrimaryBuilding16ByteSecretKey()

here:

https://github.com/opentrv/OTRadioLink/blob/master/content/OTRadioLink/utility/OTV0P2BASE_Security.cpp

Obviously there's a lot of code behind that, but the point is that EEPROM has been working well up and as expected until this point, and this failure mode is bizarre and worrying. Does the EEPROM logic flake out if you try to write more than about 8 bytes in quick succession? Do I need to inject pauses or sleeps? What did I miss in the data sheet?

Rgds

Damon

DamonHD: See: http://lists.opentrv.org.uk/pipermail/opentrv-dev/2016-March/001974.html

Hi,

With a large batch of OpenTRV (Arduino-like, 1MHz CPU clock, 2xAA NiMH cell supply) hardware are seeing some bizarre behaviour where when we try to write a 16-byte key into EEPROM it fails to ‘stick’ properly the first time. If the board is power-cycled and the key is written again, it appears to stick as expected. (There’s some subtle sub-wierdnesses also.) This uses some ‘careful’ update routines that we use extensively elsewhere and seem to be fine with blocks as large as 8 bytes long (node IDs).

Lots of use of my favourite search engine is failing to show any evidence of this being a common problem.

I have reviewed the code repeatedly and found a minor wrinkle which is merely an inefficiency, not an error.

Anyone seen anything like this before?

The affected routines are:

setPrimaryBuilding16ByteSecretKey() getPrimaryBuilding16ByteSecretKey()

here:

https://github.com/opentrv/OTRadioLink/blob/master/content/OTRadioLink/utility/OTV0P2BASE_Security.cpp

Obviously there's a lot of code behind that, but the point is that EEPROM has been working well up and as expected until this point, and this failure mode is bizarre and worrying. Does the EEPROM logic flake out if you try to write more than about 8 bytes in quick succession? Do I need to inject pauses or sleeps? What did I miss in the data sheet?

Rgds

Damon

Damon, Why are you not using the built in AVR update routines?

void eeprom_update_type(uint8_t *__p, uint8_t __value);

They only erase/write if necessary.

Are you directly access the EEPROM hardware (registers) or are you using the AVR libraries?

If you are directly accessing the control registers, you must check the eeprom ready flag:

bool bit_is_clear(NVM_STATUS, NVM_NVMBUSY_bp); // for AVR_XMEGA
//
//better to use
bool eeprom_is_ready(); // from avr/eeprom.h

edit: added: |500x384

Are you processing interrupts while this Write is in progress?

See the Caution: paragraph.

chuck.

Hi,

Thanks for the ideas!

When I started the code I don't think some of those routines existed!

Also, we have some very careful update routines which do only the erase or only the write part, as required, to minimise wear dramatically.

So, yes, I am accessing the registers, but I believe I am doing so correctly (I will go and double check all the interrupt lockouts against the data sheets again) and the routines seem to work fine in all but this case. Note that EEPROM is never touched in ISRs.

The implementation is here:

https://github.com/opentrv/OTRadioLink/blob/master/content/OTRadioLink/utility/OTV0P2BASE_EEPROM.cpp eg for eeprom_smart_update_byte()

Rgds

Damon

I've now had one report of a shorter (8-byte) write failing to 'stick' entirely, with a failure happening somewhere in the first 4 bytes.

Do others put any sort of pause/sleep/delay between EEPROM byte writes?

Rgds

Damon

When I traced the disassembly of the eeprom write function in avr-libc in this thread, all they were doing was waiting for the EEPE to clear at instruction 0x9dc. This is equivalent to using eeprom_is_ready() from the avr-libc routines.

It seems the only thing your routines could be doing wrong, is not waiting on the EEPR bit which takes 1.7ms or 3.4ms depending upon the value of EEPM when you last modified the EEPROM. As a general rule, you should check the avr-libc library before you write this sort of thing yourself →
http://nongnu.org/avr-libc/user-manual/group__avr__eeprom.html

Here is test code that shows how the eeprom update functions work in avr-libc:

#include "Arduino.h"
#include "LT_printf.h"

#define EEPROM_DATA_EEPROM_SIZE 1024

void setup() {
  // put your setup code here, to run once:
  Serial.begin(115200);
  EEPROM_Data_Test();
} //end setup()

void loop() {

}

void EEPROM_Data_Test(void)
{
  uint16_t address;
  uint8_t dump_row[16];
  uint8_t idx;
  uint32_t calc_time;

  LT_printf(F("Writing over blank EEPROM with Data\r\n"));
  address = 0;
  calc_time = micros();
  while (address < EEPROM_DATA_EEPROM_SIZE)
  {
    memset(dump_row, address & 0xFF, sizeof(dump_row));
    eeprom_update_block(dump_row, (void*)address, sizeof(dump_row));
    address += sizeof(dump_row);
  }
  calc_time = micros() - calc_time; // Calculate the time it took for function.
  LT_printf(F("Wrote %u bytes in %lu us for a time of %l.3u ms/btye\r\n"), EEPROM_DATA_EEPROM_SIZE, calc_time, (calc_time + (EEPROM_DATA_EEPROM_SIZE >> 1)) / EEPROM_DATA_EEPROM_SIZE);
  //=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  LT_printf(F("Writing over non-blank EEPROM with the Same Data\r\n"));
  address = 0;
  calc_time = micros();
  while (address < EEPROM_DATA_EEPROM_SIZE)
  {
    memset(dump_row, address & 0xFF, sizeof(dump_row));
    eeprom_update_block(dump_row, (void*)address, sizeof(dump_row));
    address += sizeof(dump_row);
  }
  calc_time = micros() - calc_time; // Calculate the time it took for function.
  LT_printf(F("Wrote %u bytes in %lu us for a time of %l.3u ms/btye\r\n"), EEPROM_DATA_EEPROM_SIZE, calc_time, (calc_time + (EEPROM_DATA_EEPROM_SIZE >> 1)) / EEPROM_DATA_EEPROM_SIZE);
  //=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  LT_printf(F("Writing over non-blank EEPROM with Different Data\r\n"));
  address = 0;
  calc_time = micros();
  while (address < EEPROM_DATA_EEPROM_SIZE)
  {
    memset(dump_row, (address + 1) & 0xFF, sizeof(dump_row));
    eeprom_update_block(dump_row, (void*)address, sizeof(dump_row));
    address += sizeof(dump_row);
  }
  calc_time = micros() - calc_time; // Calculate the time it took for function.
  LT_printf(F("Wrote %u bytes in %lu us for a time of %l.3u ms/btye\r\n"), EEPROM_DATA_EEPROM_SIZE, calc_time, (calc_time + (EEPROM_DATA_EEPROM_SIZE >> 1)) / EEPROM_DATA_EEPROM_SIZE);
  //=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  LT_printf(F("Starting EEPROM Erase\r\n"));
  address = 0;
  calc_time = micros();
  while (address < EEPROM_DATA_EEPROM_SIZE)
  {
    memset(dump_row, 0xFF, sizeof(dump_row));
    eeprom_update_block(dump_row, (void*)address, sizeof(dump_row));
    address += sizeof(dump_row);
  }
  calc_time = micros() - calc_time; // Calculate the time it took for function.
  LT_printf(F("Erased %u bytes in %lu us for a time of %l.3u ms/btye\r\n"), EEPROM_DATA_EEPROM_SIZE, calc_time, (calc_time + (EEPROM_DATA_EEPROM_SIZE >> 1)) / EEPROM_DATA_EEPROM_SIZE);
  //=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
  LT_printf(F("Starting EEPROM Erase\r\n"));
  address = 0;
  calc_time = micros();
  while (address < EEPROM_DATA_EEPROM_SIZE)
  {
    memset(dump_row, 0xFF, sizeof(dump_row));
    eeprom_update_block(dump_row, (void*)address, sizeof(dump_row));
    address += sizeof(dump_row);
  }
  calc_time = micros() - calc_time; // Calculate the time it took for function.
  LT_printf(F("Erased %u bytes in %lu us for a time of %l.3u ms/btye\r\n"), EEPROM_DATA_EEPROM_SIZE, calc_time, (calc_time + (EEPROM_DATA_EEPROM_SIZE >> 1)) / EEPROM_DATA_EEPROM_SIZE);
}

These are the results:

Writing over blank EEPROM with Data
Wrote 1024 bytes in 3486648 us for a time of 3.405 ms/btye
Writing over non-blank EEPROM with the Same Data
Wrote 1024 bytes in 2776 us for a time of 0.003 ms/btye
Writing over non-blank EEPROM with Different Data
Wrote 1024 bytes in 3486828 us for a time of 3.405 ms/btye
Starting EEPROM Erase
Erased 1024 bytes in 3487136 us for a time of 3.405 ms/btye
Starting EEPROM Erase
Erased 1024 bytes in 4052 us for a time of 0.004 ms/btye

As you can see by the write times, the update functions will not erase/write when they are overwriting the same values. The only real improvement that could be made, is “erasing” the EEPROM by updating to 0xFF could be just using erase (1.7ms) instead of write/erase (3.4ms).

That probably isn’t a big deal, but if you really want to debug your own function try measuring the time it takes to write several bytes to new values. If it doesn’t take 3.4ms per byte, then you aren’t pending upon EEPE correctly.

Thanks for that, I'll check out the EEPR issue.

(I did send a longer reply, but it evaporated!)

Rgds

Damon

Gah! Lost another reply! I'll try again.

I've added an extra eeprom_is_ready() before my write/erase routines return so that the EEPROM activity is definitely finished before they return, which can't do any harm I think, and avoids any chance of the EEPROM write/erase being sabotaged by messing with the CPU clock, or going into sleep mode, etc.

Rgds

Damon

DamonHD: Gah! Lost another reply! I'll try again.

I've added an extra eeprom_is_ready() before my write/erase routines return so that the EEPROM activity is definitely finished before they return, which can't do any harm I think, and avoids any chance of the EEPROM write/erase being sabotaged by messing with the CPU clock, or going into sleep mode, etc.

Rgds

Damon

It sounds like that should have solved your problem. Did it work?

While checking eeprom_is_ready() before your return will solve your problem, it'd still be better to put it at the start of your function like the avr-libc library does (as instructed by the AVR datasheet). When you wait before you return, you're forcing your problem to always delay 3.4ms as it waits for the EEPROM write to complete. When you wait at the start of the function, then your code can continue doing useful things in parallel with the EEPROM write. It's only if you try to write twice within 3.4ms that you're forced to wait for the EEPROM write to complete.

Hi,

1) It is already at the start and I'm not taking it away from there, ie I have belt and braces with it at the start and the end.

2) Haven't had a chance to test it at all yet, and it'll be a while before I get to test it at scale.

Rgds

Damon