I'm using the Arduino Due board with the SAM3X8E microcontroller, which has 512 kB of persistent storage (Flash) separated into two 256 kB banks (IFLASH0 and IFLASH1).
I use approximately half a kilobyte of the Flash for the persistent storage of data (the last kilobyte in IFLASH1).
I'm mulling over the idea of doing error-detection-and-correction (EDC) on the Flash when the microcontroller first boots up. Of course I could use checksums and hashing algorithms (e.g. MD5, SHA-256), but those only detect corruption and can't repair it. Since my data sample is so small, it makes more sense just to store the same data at 3 different locations in the Flash. So then at boot up, the routine to check the integrity of the persistent data would go as follows:
(Step 1) Compare data at Address A to data at Address B
(Step 2) Compare data at Address B to data at Address C
(Step 3) If both Step 1 and Step 2 are true then there's no corruption so start the main program. Otherwise proceed to Step 4.
(Step 4) If (A==B || B==C || A==C) then two of the copies agree, so overwrite the odd one out with one of the two good copies, then start the main program.
(Step 5) If Step 4 is false then this means that all three copies are different. What we do here is analyse the data byte-by-byte using a voting system: if a byte has the same value in at least two of the copies, then that's taken as the correct value. If a byte is different in all three copies then we have an unrecoverable error and we can't correct it, so in this case enter into an error state. This wouldn't be difficult to code nor to maintain, even as the persistent storage changes in size.
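A minimal sketch of the voting in Step 5, assuming the three copies have already been read into RAM buffers. The names `VoteResult` and `voteAndRepair` are made up for illustration, and writing the repaired bytes back to Flash (which has to go through the chip's Flash controller, page by page) is deliberately left out. Note that a byte-wise vote also covers Steps 1 to 4: if two whole copies agree, every byte has a majority.

```cpp
#include <cstddef>
#include <cstdint>

// Outcome of reconciling three redundant copies (hypothetical names).
enum class VoteResult { Clean, Repaired, Unrecoverable };

// Byte-wise majority vote over three equally sized RAM buffers.
// Every byte where at least two copies agree is set to the majority
// value in all three buffers. Stops at the first byte that differs in
// all three copies; bytes before it may already have been repaired.
VoteResult voteAndRepair(uint8_t* a, uint8_t* b, uint8_t* c, std::size_t len) {
    VoteResult result = VoteResult::Clean;
    for (std::size_t i = 0; i < len; ++i) {
        if (a[i] == b[i] && b[i] == c[i]) {
            continue;                      // all three agree
        }
        uint8_t good;
        if (a[i] == b[i] || a[i] == c[i]) {
            good = a[i];                   // a is in the majority
        } else if (b[i] == c[i]) {
            good = b[i];                   // a is the odd one out
        } else {
            return VoteResult::Unrecoverable;  // three different values
        }
        a[i] = b[i] = c[i] = good;
        result = VoteResult::Repaired;
    }
    return result;
}
```

On the real board the caller would enter the error state on `Unrecoverable`, and rewrite the affected Flash pages on `Repaired`.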
What would make things more complicated would be if we wanted to perform EDC on the machine code of the C++ program running on the microcontroller. The machine code of my main program only takes up 80 kB, and so I can store it 3 times in the Flash. If we were to actually detect an error in the machine code, then the microcontroller might already have malfunctioned by then – and if it hasn’t already malfunctioned then it might malfunction when we go to try to correct the error.
But we could have a separate self-contained program at the beginning of the Flash which is very small in size (less than 10 kB); the purpose of this small program would be to perform EDC on the main program. In order to make sure that this small program's machine code is totally separate from the main program, I'd have to lay out the Flash something like this:
Address 0: A very small self-sufficient program (as small as possible, e.g. 5 - 10 kB) that verifies the checksum of the main program
Address 1: The main program (approx. 80 kB)
Address 2: *a copy of the main program*
Address 3: *a copy of the main program*
Address 4: The persistent data – less than 1 kB in total
Address 5: *a copy of the persistent data*
Address 6: *a copy of the persistent data*
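To make the "verifies the checksum" part concrete, here is a rough sketch of what the small program could do: compute a CRC-32 over each copy of the main program and pick the first copy that matches a known-good value. CRC-32 is just one possible choice of checksum, `firstGoodCopy` is a made-up name, and where the expected CRC comes from (for example, a value appended by a build step) is an assumption not covered here.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, the variant used by
// zlib and Ethernet). Slow but tiny, which suits a small checker program.
uint32_t crc32(const uint8_t* data, std::size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // If the low bit is set, shift and XOR in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Returns the index (0..2) of the first copy whose CRC matches the
// expected value, or -1 if none of the three copies is intact.
int firstGoodCopy(const uint8_t* const copies[3], std::size_t len,
                  uint32_t expected) {
    for (int i = 0; i < 3; ++i) {
        if (crc32(copies[i], len) == expected) {
            return i;
        }
    }
    return -1;
}
```

With a good copy identified, the checker could either jump to it directly or use it to overwrite the damaged copies first; a result of -1 would mean entering the error state.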
The sum total of the above comes to roughly 253 kB (10 + 3 × 80 + 3 × 1), so it will fit easily on the microcontroller's 512 kB Flash.
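The size budget can be checked at compile time. The sketch below uses the upper-bound sizes from the list (10 kB checker, 80 kB program, 1 kB data) and expresses everything as offsets from the start of the Flash; all the constant names are invented for illustration, and the sizes are estimates rather than measured values.

```cpp
#include <cstddef>

constexpr std::size_t kKiB = 1024;
constexpr std::size_t kFlashSize   = 512 * kKiB;  // two 256 kB banks
constexpr std::size_t kCheckerSize = 10 * kKiB;   // small EDC program (upper bound)
constexpr std::size_t kProgramSize = 80 * kKiB;   // main program (approx.)
constexpr std::size_t kDataSize    = 1 * kKiB;    // persistent data (upper bound)
constexpr std::size_t kCopies      = 3;

// Offsets from the start of the Flash, in the order of the list.
constexpr std::size_t kCheckerOffset = 0;
constexpr std::size_t kProgramOffset = kCheckerOffset + kCheckerSize;
constexpr std::size_t kDataOffset    = kProgramOffset + kCopies * kProgramSize;
constexpr std::size_t kTotalUsed     = kDataOffset + kCopies * kDataSize;

// Offset of copy n (0..2) of the main program or the persistent data.
constexpr std::size_t programCopyOffset(std::size_t n) {
    return kProgramOffset + n * kProgramSize;
}
constexpr std::size_t dataCopyOffset(std::size_t n) {
    return kDataOffset + n * kDataSize;
}

static_assert(kTotalUsed == 253 * kKiB, "10 + 3*80 + 3*1 = 253 kB");
static_assert(kTotalUsed <= kFlashSize, "layout must fit in the Flash");
```

If the main program grows, the second `static_assert` fails the build as soon as the layout no longer fits.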
Do the Arduino development tools make it easy to have two separate programs on the chip, or would I have to do this myself? If I were to relocate my main program in the Flash (i.e. move it forward by 10 kB), then I'd have to make sure that the main program is either linked to run at its new address or compiled with -fPIC (i.e. as position-independent code). Another option would be to start my main program with assembler that jumps to the first byte of the small program, and then the small program can jump back to the main program.
If there were a corrupt byte in the small 10 kB program then this would be an undetectable and uncorrectable error.
Being totally realistic though, living on Planet Earth in 2022 among other humans, I think most people who’ve spent 10 to 40 years working with microcontrollers might say that this is all major overkill. The machine code written to microcontrollers stays intact for decades. I mean, there are people nowadays still playing retro Nintendo games consoles from the 1980s (although I think they used EEPROM rather than Flash, so maybe EEPROM is more reliable). One could argue that there’s more chance of all this EDC stuff causing a failure than if we just had the main program on its own with one copy of the persistent data. It reminds me of how some very reliable electronic devices become less reliable when you add a fuse to them, because the fuse itself is less reliable than the electronic device.
Anyone got any opinions on what I should do here?