The weirdest case of memory corruption?

I am building a TPMS (Tire Pressure Monitoring System) using Arduino Pro Mini with Atmega328P, 3.3V, 8MHz. The data from or related to the TPMS sensors are collected in this structure:

struct TPMS_entry
{
  uint32_t TPMS_ID;
  bool isValid;
  uint16_t secSinceLastUpdated;
  byte TPMS_Status;
  double TPMS_Pressure;
  double TPMS_LowPressureLimit;
  double TPMS_HighPressureLimit;
  boolean LowPressure;
  boolean HighPressure;
  int16_t TPMS_Temperature;
  int16_t RSSIdBm;
  boolean AudibleAlarmActive;
  unsigned short time;
} TPMS[TYRECOUNT + 1];

Here TYRECOUNT = 4, and the last array element is for a stray tire that is not in the list.

The matter of interest is the first member -- TPMS_ID -- carrying the unique sensor ID. These IDs are initialized from the list stored in the non-volatile memory, and I carefully checked that they are initialized correctly and then never written to.

All the sensors but the first are handled properly. When an RF packet is received from one of the sensors, it is checked against the values in the TPMS[].TPMS_ID structure members.

In particular, the first element holds:
TPMS[0].TPMS_ID = 0x80CDBC58
When this value is read to compare with the incoming RF packet, it consistently reads as 0x80CDBC00; the least significant byte is always overwritten with 0.

After many hours of investigation and a sleepless night, I decided to put a diagnostic message into loop() to see at what moment TPMS[0].TPMS_ID becomes corrupted:

void loop() {
  if(TPMS[0].TPMS_ID != 0x80CDBC58L) {
    Serial.print("Loop TPMS[0].TPMS_ID "); Serial.println(TPMS[0].TPMS_ID);
  }

When I ran it, the 'if' condition was never true, meaning that the TPMS[0].TPMS_ID value was apparently always correct, the print statement never printed, and the system worked perfectly! But when I commented out the print statement, leaving the body of the 'if' statement completely empty, the wrong value with the zeroed LSB was read again!

Simply put, the presence or absence of the print statement affects the execution, even though the statement itself is never executed!

I thought about a possible stack corruption due to insufficient memory, but I only use 800 bytes of RAM out of 2048.

I am desperate, and you -- the very smart guys -- are my only hope in my unavailing attempts to fix this bug...

Always post all the code, using code tags. The errors are usually in the parts not posted.

@jremington, thank you for your advice, but all the code is several thousand lines.

You can't scare us with a few thousand lines of code.

But the much better and more useful approach is to post the complete, minimum code that reproduces the error. You may find that when you put that together, the error will be obvious.

You have the typical symptoms of overwriting memory that you do not have the rights to (outside the bounds of an array, using a pointer to a local variable that no longer exists, buffer for a c-string that is too small, etc). The actual code that causes the problem likely has no connection to the struct itself, or any code that accesses that struct.

Merely adding the print statement, and consequently its text literal in RAM, moves the data around in RAM, so the stray write no longer lands on the first element of your struct but on something else instead.

Note that the least significant byte of the uint32_t is the first byte in your array of structs, since the ATmega328 stores the least significant byte at the lowest memory address.
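
To make the byte-order point concrete, here is a minimal host-side sketch (not code from the original project). The helper names toLittleEndian, fromLittleEndian and idAfterStrayZero are made up for illustration. It lays the ID out least significant byte first, as the ATmega328 does, zeroes the byte at the lowest address, and reassembles the value.

```cpp
#include <cstdint>

// Split a 32-bit value into bytes in little-endian order:
// index 0 is the lowest address and holds the least significant byte.
void toLittleEndian(uint32_t value, uint8_t out[4]) {
  for (int i = 0; i < 4; ++i) {
    out[i] = static_cast<uint8_t>(value >> (8 * i));
  }
}

// Reassemble the 32-bit value from its little-endian bytes.
uint32_t fromLittleEndian(const uint8_t in[4]) {
  uint32_t value = 0;
  for (int i = 0; i < 4; ++i) {
    value |= static_cast<uint32_t>(in[i]) << (8 * i);
  }
  return value;
}

// Hypothetical helper: what the ID reads back as after a stray zero
// byte lands on the lowest address occupied by the value.
uint32_t idAfterStrayZero(uint32_t id) {
  uint8_t bytes[4];
  toLittleEndian(id, bytes);
  bytes[0] = 0;  // the single overwritten byte
  return fromLittleEndian(bytes);
}
```

With the ID from the first post, idAfterStrayZero(0x80CDBC58) gives 0x80CDBC00, which matches the symptom: a one-byte overrun from whatever sits just below the array in memory is enough.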

You need help and we are not clairvoyant so we cannot see what you did not publish.

Judicious use of the F-macro may help. It will certainly not hurt.

Almost anything else.

Never mind precisely why, but I have a great deal of experience in forming the complete, minimum code recommended earlier. If your sketch is thousands of lines long, we could hope it is well enough organized that huge swaths of it could be reduced to stubs or entirely eliminated.

If you did that carefully, you might even see exactly when the problem went away.

If you did that carefully, you might even see the problem before testing showed it to have been eliminated.

I would say doing so has meant it is not very often that I end up posting a new topic on a matter. Even so, it still may be something stupid my one pair of eyes didna see.

I've debugged MIDI code with no MIDI devices and no real MIDI library. Same with LCDs and many other hardware bits I don't have or don't want to mess with. I wince when I see the tangled code presented here too often, with everything all mixed up and no good division between logic and hardware.

So if that describes anything going on in your largish sketch maybe this would be an opportunity to reorganize the code for the better. Better flexibility and readability and so forth.

a7

As an aside, memory corruption has completely unpredictable effects, which means that just about every example that one encounters seems like "the weirdest case".

Make a full compilable sketch with only your suspected code parts.
There is a good chance that you will find your programming error on your own when you focus your search on smaller parts.

@david_2018, thank you for your advice, which gives a very interesting direction to my investigation!

Would you happen to know how to get the memory allocation table of the statically allocated objects?

Not sure how to get the memory allocation table.

With a problem like this, I always start by making sure the IDE is set to show all warnings during compile. That can catch obvious things like for loops that write past the end of an array.

Next is to search for any references to an array, to check that the array index is not out of bounds.

Manipulation of text in a char array (c-string) is also a common source of problems. Using sprintf() with a buffer that is too small, or any function such as strcpy(), strcat(), etc, that might run past the end of the buffer, can cause problems. Be careful to allow space for the terminating null, and for any multi-byte characters when using UTF-8 encoding.
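
As a rough illustration of "allow space for the terminating null", the sketch below asks snprintf how long the output would be and adds one byte for the null. The helper name bytesNeededFor3d is hypothetical. It relies on the standard C99 behaviour that snprintf called with a null buffer and size 0 writes nothing and returns the length the output would have had.

```cpp
#include <cstddef>
#include <cstdio>

// Bytes needed to hold `value` formatted with "%3d", INCLUDING the
// terminating null. Returns 0 if snprintf reports an error.
size_t bytesNeededFor3d(int value) {
  int len = snprintf(nullptr, 0, "%3d", value);  // measures only, writes nothing
  return len < 0 ? 0 : static_cast<size_t>(len) + 1;
}
```

Because "%3d" is a minimum width, the answer is never below 4, and a value such as -100 needs 5. A buffer sized to the visible characters only is always one byte short.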

Be very careful anywhere pointers are used.

So it would be good to look at which global variables come before or after TPMS[TYRECOUNT + 1] and how they are used.

@david_2018, both your comments nailed the problem!

I checked the memory map:
C:\Users\Alex\AppData\Local\Temp\arduino\sketches\<...> avr-objdump -t TPMS_Schrader_328_8.ino.elf
and got, in particular:

008001f6 l     O .bss   00000003 field3
008001f9 l     O .bss   00000091 TPMS

field3 was declared as
char field3[3];
and used in
sprintf(field3, "%3d", temperature);

It appears that I failed to leave space for the string-terminating null, so the null was written one byte past the end of field3 and overwrote the LSB of TPMS[0].TPMS_ID, which is the first member of the first TPMS element.
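
For anyone who wants to see the mechanism, here is a small simulation of that memory layout. It is a sketch under stated assumptions, not the project's code. The function name idAfterFormatting and the 7-byte `ram` array are made up: bytes 0..2 stand in for field3 and bytes 3..6 for TPMS[0].TPMS_ID, matching the adjacent addresses in the map above. The call is bounded with snprintf over the whole simulated region (the original used sprintf) so that the demonstration itself never writes outside a real object.

```cpp
#include <cstdint>
#include <cstdio>

// Returns the value TPMS[0].TPMS_ID would read back as after "%3d" is
// formatted into the 3-byte field that sits directly below it in RAM.
uint32_t idAfterFormatting(uint32_t id, int temperature) {
  char ram[7];  // [0..2] = field3, [3..6] = ID, least significant byte first
  for (int i = 0; i < 4; ++i) {
    ram[3 + i] = static_cast<char>(id >> (8 * i));
  }

  // Three visible characters plus the null: the null lands on ram[3].
  snprintf(ram, sizeof ram, "%3d", temperature);

  uint32_t result = 0;
  for (int i = 0; i < 4; ++i) {
    result |= static_cast<uint32_t>(static_cast<uint8_t>(ram[3 + i])) << (8 * i);
  }
  return result;
}
```

A temperature of 25 turns 0x80CDBC58 into 0x80CDBC00, exactly the value reported in the first post. A four-character result such as -100 would clobber two bytes and give 0x80CD0030. Declaring the buffer as char field3[4] removes the one-byte overrun for values that fit in three characters.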

Thank you very much for your advice, which helped me resolve the problem!

good learning opportunity to never use sprintf() again and prefer snprintf() instead

you'll know things failed if you do

int n = snprintf(buffer, sizeof buffer, "%3d", temperature);
if (n < 0 || (size_t)n >= sizeof buffer) {
  // error - encoding failure, or not enough space (output was truncated)
} else {
  // buffer contains what you expect, null-terminated
}

...or rely on the result being truncated. Which works especially well for things that end up on a display.
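
A minimal sketch of the truncation approach, assuming a 3-character display field. The helper name formatForDisplay and the static buffer are illustrative choices, not from the thread. snprintf never writes more than the size it is given and always null-terminates when that size is nonzero.

```cpp
#include <cstdio>

// Hypothetical helper: format a temperature for a 3-character display
// field. Returns a pointer to a static buffer that is overwritten on
// every call, so copy the text out if it must survive the next call.
const char *formatForDisplay(int temperature) {
  static char field[4];  // 3 visible characters + terminating null
  snprintf(field, sizeof field, "%3d", temperature);  // truncates if too long
  return field;
}
```

One thing to keep in mind: truncation drops the trailing characters, so 1234 is shown as "123" and -100 as "-10". Whether that is acceptable depends on the range of values the display can actually receive.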

indeed
