Verify the entire flash (including machine code)

fgotham · March 25, 2022, 9:35am

I'm using the Arduino Due board with the SAM3X8E microcontroller, which has 512 kB of persistent storage (Flash) separated into two 256 kB banks (IFLASH0 and IFLASH1).

I use approximately half a kilobyte of the Flash for the persistent storage of data (it sits within the last kilobyte of IFLASH1).

I'm mulling over the idea of doing error-detection-and-correction (EDC) on the Flash when the microcontroller first boots up. Of course I could use checksums and hashing algorithms (e.g. md5, sha256), however since my data sample is so small, it makes more sense just to store the same data at 3 different locations in the Flash. So then at boot up, the routine to check the integrity of the persistent data would go as follows:
(Step 1) Compare data at Address A to data at Address B
(Step 2) Compare data at Address B to data at Address C
(Step 3) If the comparisons in both Step 1 and Step 2 match then there's no corruption, so start the main program. Otherwise proceed to Step 4.
(Step 4) If (A==B || B==C || A==C) then two copies agree, so overwrite the odd one out with one of the two matching copies, then start the main program.
(Step 5) If Step 4 is false then this means that all three copies are different. What we do here is analyse the data byte by byte using a voting system: if a byte has the same value in at least two of the copies, then that's taken as the correct value. If a byte is different in all three copies then we have an unrecoverable error, so in this case enter into an error state. This wouldn't be difficult to code or to maintain, even as the persistent storage changes in size.
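
To make the voting in Step 5 concrete, here's a minimal sketch in C++. It assumes the three copies have already been read into RAM buffers; the names voteCopies and VoteResult are made up for illustration, and writing the repaired data back to Flash isn't shown. Note that the byte-by-byte vote also covers Steps 1 to 4, since a copy that matches another copy wins every vote.

```cpp
#include <cstddef>
#include <cstdint>

// Outcome of voting across three copies of the same data (hypothetical type).
enum class VoteResult { Clean, Repaired, Unrecoverable };

// Byte-by-byte majority vote over three buffers of length n.
// On Clean or Repaired, `out` holds the voted data.
// On Unrecoverable, `out` is only partially written and must not be used.
VoteResult voteCopies(const uint8_t* a, const uint8_t* b, const uint8_t* c,
                      uint8_t* out, size_t n) {
    VoteResult result = VoteResult::Clean;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] == b[i] && b[i] == c[i]) {
            out[i] = a[i];                    // all three agree
        } else if (a[i] == b[i] || a[i] == c[i]) {
            out[i] = a[i];                    // a is in the majority
            result = VoteResult::Repaired;
        } else if (b[i] == c[i]) {
            out[i] = b[i];                    // a is the odd one out
            result = VoteResult::Repaired;
        } else {
            return VoteResult::Unrecoverable; // all three differ
        }
    }
    return result;
}
```

If the result is Repaired, the voted buffer would then be written back over all three copies before starting the main program.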

What would make things more complicated would be if we wanted to perform EDC on the machine code of the C++ program running on the microcontroller. The machine code of my main program only takes up 80 kB, and so I can store it 3 times in the Flash. If we were to actually detect an error in the machine code, then the microcontroller might already have malfunctioned by then – and if it hasn’t already malfunctioned then it might malfunction when we go to try to correct the error.

But we could have a separate self-contained program at the beginning of the Flash which is very small in size (less than 10 kB); the purpose of this small program would be to perform EDC on the main program. In order to make sure that this small program's machine code is totally separate from the main program, I'd have to lay out the Flash something like this:

Address 0: A very small self-sufficient program (as small as possible, e.g. 5 - 10 kB) that verifies the checksum of the main program
Address 1: The main program (approx. 80 kB)
Address 2: *a copy of the main program*
Address 3: *a copy of the main program*
Address 4: The persistent data – less than 1 kB in total
Address 5: *a copy of the persistent data*
Address 6: *a copy of the persistent data*

The sum total of the above comes to only about 250 kB (10 + 3 × 80 + 3 × 1 = 253 kB), so it will fit easily on the microcontroller's 512 kB Flash.
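
The arithmetic behind that layout can be checked at compile time. This is only a sketch using the rough sizes from the list above (10 kB checker, 80 kB main program, 1 kB data, three copies each); the constant and function names are made up, and the offsets are relative to the start of the Flash, laid out back to back, rather than real addresses.

```cpp
#include <cstdint>

// Rough sizes taken from the layout above; none of these are measured values.
constexpr uint32_t kKiB         = 1024;
constexpr uint32_t kFlashSize   = 512 * kKiB;  // IFLASH0 + IFLASH1
constexpr uint32_t kCheckerSize = 10 * kKiB;   // small verifying program
constexpr uint32_t kMainSize    = 80 * kKiB;   // one copy of the main program
constexpr uint32_t kDataSize    = 1 * kKiB;    // one copy of the persistent data
constexpr uint32_t kCopies      = 3;

// Offset of copy i (0, 1 or 2) of the main program from the start of Flash.
constexpr uint32_t mainCopyOffset(uint32_t i) {
    return kCheckerSize + i * kMainSize;
}

// Offset of copy i (0, 1 or 2) of the persistent data from the start of Flash.
constexpr uint32_t dataCopyOffset(uint32_t i) {
    return kCheckerSize + kCopies * kMainSize + i * kDataSize;
}

// First free byte after the last data copy, i.e. the total space used.
constexpr uint32_t kTotalUsed = dataCopyOffset(kCopies);

static_assert(kTotalUsed == 253 * kKiB, "10 + 3*80 + 3*1 = 253 kB");
static_assert(kTotalUsed <= kFlashSize, "layout must fit in the 512 kB Flash");
```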

Do the Arduino development tools make it easy to have two separate programs on the chip, or would I have to do this myself? If I were to relocate my main program in the Flash (i.e. move it forward by 10 kB), then I'd have to make sure that the main program is compiled with -fPIC (i.e. position-independent code). Although another option would be to start my main program with assembler that jumps to the first byte of the small program, and then the small program can jump back to the main program.

If there were a corrupt byte in the small 10 kB program then this would be an undetectable and uncorrectable error.

Being totally realistic though, living on Planet Earth in 2022 among other humans, I think most people who’ve spent 10 to 40 years working with microcontrollers might say that this is all major overkill. The machine code written to microcontrollers stays intact for decades ... I mean there are people nowadays still playing retro Nintendo game consoles from the 1980s (although I think they used EEPROM rather than Flash, so maybe EEPROM is more reliable). One could argue that there’s more chance of all this error-detection-and-correction stuff causing a failure than if we just had the main program on its own with one copy of the persistent data. It reminds me of how some very reliable electronic devices become less reliable when you add a fuse to them, because the fuse itself is less reliable than the electronic device.

Anyone got any opinions on what I should do here?

anon35827816 · March 25, 2022, 9:49am

I think your last section has a few very good observations on this. I'd seriously consider if this is necessary at all, and if the drawbacks would outweigh the benefits.
If some kind of redundancy is necessary, I also wonder if it wouldn't be more straightforward to just have two microcontrollers and two sets of flash memory, and a procedure (which could even involve manual intervention) to switch them out if one happens to become faulty. It would at least make the EDC process a lot simpler; essentially removing the C part...

cedarlakeinstruments · March 25, 2022, 12:23pm

For stuff like this, I've used a CRC-16 stored at the end of the file. Not for checking memory integrity, mind you, but usually for doing firmware updates either OTA or over Ethernet, serial, etc., to detect transmission errors. But there's no reason your firmware couldn't calculate its checksum and then compare it to a stored value.
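
As a rough illustration of that idea, here's a sketch in C++. The post doesn't say which CRC-16 variant was used, so this assumes CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) and assumes the CRC is stored big-endian in the last two bytes of the image; the function names are made up.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
// no bit reflection, no final XOR. The variant is an assumption.
uint16_t crc16Ccitt(const uint8_t* data, size_t n) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < n; ++i) {
        crc ^= static_cast<uint16_t>(data[i] << 8);
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<uint16_t>((crc << 1) ^ 0x1021);
            } else {
                crc = static_cast<uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}

// Returns true if the CRC stored in the last two bytes of the image
// (big-endian, an assumed convention) matches the CRC of everything before it.
bool imageIsIntact(const uint8_t* image, size_t n) {
    if (n < 2) {
        return false;  // too short to hold a CRC at all
    }
    const uint16_t stored =
        static_cast<uint16_t>((image[n - 2] << 8) | image[n - 1]);
    return crc16Ccitt(image, n - 2) == stored;
}
```

A CRC like this only detects corruption; it can't say which bytes are wrong, so correcting anything would still need a second copy to fall back on.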

But yeah, I think it's overkill! I have devices I designed back in the early 90's that run from OTP Flash and they're still working fine.

system · September 21, 2022, 12:24pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
