Code stops working every couple of months

Hello,

I have a code which reads an NFC tag and if a valid tag is present it opens a relay. The code works basically fine. Once deployed it runs for months without issues, then all of a sudden the code stops working. Rebooting the Arduino board (power off / on) and it works again for a couple of months. I have no clue how to figure out what to do to get to the root cause. Any idea would be very much appreciated.

When I say stop working I can only say that the relay is not triggered, I have no idea how to attach at that point in time a debugger or look where it actually has crashed.

/*
 * --------------------------------------------------------------------------------------------------------------------
 * Example sketch/program showing how to read new NUID from a PICC to serial.
 * --------------------------------------------------------------------------------------------------------------------
 * This is a MFRC522 library example; for further details and other examples see: https://github.com/miguelbalboa/rfid
 */

#include <SPI.h>
#include <MFRC522.h>

#define SS_PIN 10
#define RST_PIN 9

MFRC522 rfid(SS_PIN, RST_PIN); // Instance of the class
MFRC522::MIFARE_Key key; 

byte nuidPICC[4];

void setup() { 
    Serial.begin(9600);
    SPI.begin(); // Init SPI bus
    rfid.PCD_Init(); // Init MFRC522 
    
    for (byte i = 0; i < 6; i++) {
        key.keyByte[i] = 0xFF;
    }
    pinMode(4, OUTPUT); // output of relais
}

void loop() {
    if ( ! rfid.PICC_IsNewCardPresent())
        return;
    
    // Verify if the NUID has been readed
    if ( ! rfid.PICC_ReadCardSerial())
        return;
    
    
    MFRC522::PICC_Type piccType = rfid.PICC_GetType(rfid.uid.sak);
    
    // Check is the PICC of Classic MIFARE type
    if (piccType != MFRC522::PICC_TYPE_MIFARE_MINI &&  
        piccType != MFRC522::PICC_TYPE_MIFARE_1K &&
        piccType != MFRC522::PICC_TYPE_MIFARE_4K) {
        //Serial.println(F("Your tag is not of type MIFARE Classic."));
    return;
        }
        
        //compare if MIFI ID is valid
        char bufferUID[40] = "";
        
        int byte0=0; 
        int byte1=0; 
        int byte2=0; 
        int byte3=0; 
        
        byte0=(int) rfid.uid.uidByte[0]; 
        byte1=(int) rfid.uid.uidByte[1]; 
        byte2=(int) rfid.uid.uidByte[2]; 
        byte3=(int) rfid.uid.uidByte[3]; 


        snprintf(bufferUID, 40, "%d:%d:%d:%d", byte0, byte1, byte2, byte3);

        const char *validTags[] = { "2:2:1:1", "1:1:1:1"}; //usually here real values of the cards
        int len = (sizeof (validTags) / sizeof (*validTags));
        
        bool tagValid = false;
        for (int x = 0; x < len; x++) {
            if (strcmp (bufferUID, validTags[x]) == 0) {
                tagValid = true;
            } else {
            }
        }
        
        if(tagValid)
        {
            digitalWrite(4, HIGH); // send signal
            delay(1000);            // for 1s, should be enough
            digitalWrite(4, LOW);  // stop signal again
        }
        else
        {

        }
        
        
        // Store NUID into nuidPICC array
        for (byte i = 0; i < 4; i++) {
            nuidPICC[i] = rfid.uid.uidByte[i];
        }

        
        // Halt PICC
        rfid.PICC_HaltA();
        
        // Stop encryption on PCD
        rfid.PCD_StopCrypto1();
}


Hi Adiforum,

I do not see the usual suspects in your code:

  • String (with capital S)
  • Use of millis() combined with unhandled overflow of unsigned long...

Could the problem be in your hardware?
You could log and check the free memory every day in your code... maybe something is added to stack or heap every tag check... after a thousand checks, that might add up to fully fill the memory...
From the code you posted, I do not expect this is the cause, but it would be good to rule out.

Like this? I guess you could wait for another month and see if it starts back up again? There are about 13 different references to millis() in here. I just grabbed one at random.

-jim lee

You coukd save some time by goosing millis() up to a few minutes before it rolls over. Or whatever.

Oh yes you can.

Declare the variable millis() uses in you own code:

extern volatile unsigned long timer0_millis;

To set millis():

  noInterrupts ();
  timer0_millis = 4294967295UL - 20000;  // 20 seconds to doomsday
  interrupts ();

a7

2 Likes

Umm, the OP isn't the one using millis(). It's in the library included. Would this idea still work for them?

-jim lee

I'm tempted to say of course it will. I don't know for sure, but…

… there's one mechanism somewhere that ticks up the milliseconds counter, the millis() function returns that value so messing with that variable will mess with eveyone's millis().

It is true that it might not be a definitive diagnostic test, as I am sure we could all someone could write a bad program that needed to run for the entire time in order that the rollover be a bad thing.

a7

Hi @jimLee
Where is this coming from?
It us not in the OP...
And the last statement is an example where things can go wrong...
Timers should use subtraction, not addition...

There are a few instances in the MFRC522 library where the code doesn't allow for the millis() rollover every 49.7 days.

All the below are from src/MRFC522.cpp.

In function PCD_CalculateCRC

	// Wait for the CRC calculation to complete. Check for the register to
	// indicate that the CRC calculation is complete in a loop. If the
	// calculation is not indicated as complete in ~90ms, then time out
	// the operation.
	const uint32_t deadline = millis() + 89;

	do {
		// DivIrqReg[7..0] bits are: Set2 reserved reserved MfinActIRq reserved CRCIRq reserved reserved
		byte n = PCD_ReadRegister(DivIrqReg);
		if (n & 0x04) {									// CRCIRq bit set - calculation done
			PCD_WriteRegister(CommandReg, PCD_Idle);	// Stop calculating CRC for new content in the FIFO.
			// Transfer the result from the registers to the result buffer
			result[0] = PCD_ReadRegister(CRCResultRegL);
			result[1] = PCD_ReadRegister(CRCResultRegH);
			return STATUS_OK;
		}
		yield();
	}
	while (static_cast<uint32_t> (millis()) < deadline);

In function PCD_SoftPowerUp:

	// wait until PowerDown bit is cleared (this indicates end of wake up procedure) 
	const uint32_t timeout = (uint32_t)millis() + 500;// create timer for timeout (just in case) 
	
	while(millis()<=timeout){ // set timeout to 500 ms 
		val = PCD_ReadRegister(CommandReg);// Read state of the command register
		if(!(val & (1<<4))){ // if powerdown bit is 0 
			break;// wake up procedure is finished 
		}
		yield();
	}

In function PCD_CommunicateWithPICC:

	// Wait here for the command to complete. The bits specified in the
	// `waitIRq` parameter define what bits constitute a completed command.
	// When they are set in the ComIrqReg register, then the command is
	// considered complete. If the command is not indicated as complete in
	// ~36ms, then consider the command as timed out.
	const uint32_t deadline = millis() + 36;
	bool completed = false;

	do {
		byte n = PCD_ReadRegister(ComIrqReg);	// ComIrqReg[7..0] bits are: Set1 TxIRq RxIRq IdleIRq HiAlertIRq LoAlertIRq ErrIRq TimerIRq
		if (n & waitIRq) {					// One of the interrupts that signal success has been set.
			completed = true;
			break;
		}
		if (n & 0x01) {						// Timer interrupt - nothing received in 25ms
			return STATUS_TIMEOUT;
		}
		yield();
	}
	while (static_cast<uint32_t> (millis()) < deadline);
2 Likes

please show us the wiring between your board and the RC522, measure the length of the wires and tell us more about the environment where you are using your device.

As a workaround you could activate the Watchdog.

On long term, I would get rid off the char pointer array for valid tags, and just store the tags as 4 byte UID like they are readed from the cards/tags.

@van_der_decken: good work!

So: not the best library...
3 options:
1 fix it
2 find an update
3 find another library... (suggestions?)

If @van_der_decken has found all, it is not difficult to fix...

If you use the proper end time - start time = elapsed time,
there is no problem with unsigned integer rollover. The longest elapsed time is 32-bits of time unit be it millis or micros but short of that, there is no handling necessary.

If you add times across rollover, then you need special handling or perhaps just LEARN WHAT ALWAYS WORKS.

Unsigned subtraction is all that's needed.

1 Like

That library needs fixing!

Agreed. It's a simple enough thing to fix but not the kind of thing I completely trust myself to get 100% right way past midnight. :slight_smile:

Indeed: rollover should not be a problem and can be fixed using subtraction instead of addition... (that is what I meant with correct handling).

Unless you want to set a timer in over 49 days...
Then you need to catch the rollover...

To fix the occasions found so far by @van_der_decken catching of rollovers is not needed.

Over 10 years ago, a member named (IIRC) Moresy Doates made a 64 bit timer library. Sol will likely go Red Giant before that one rolls over!

That member's code was always so well done, I was jealous. If I came up with a working solution, it wouldn't be as clear.

I looked at the sections laid out.

// In function PCD_CalculateCRC

// const uint32_t deadline = millis() + 89;
const uint32_t deadline = 89;
const uint32_t startTime = millis();

// while (static_cast<uint32_t> (millis()) < deadline);
while (static_cast<uint32_t> (millis()) - startTime < deadline);

and the same basic treatment for the rest.
It's cut and dry with not missing any instances being the hard part.

What I would be tempted to 'fix' is the blocking nature of that code which gives me the heebie-jeebies.

That it happens every other month is an indication that the roll-over may indeed be the cause.

If it shows up every 49.7-some days then it is indication that faulty timing code is a cause for sure. And if there's more other bugs, fixing the timers will get those to show.

In fact the timing should be fixed regardless. As it is now it is for sure going to cause an / the) issue.
The proper thing to do about the library is to contact the author on github, and depending on the policy the author has about issues, either raise the issue or start a conversation about the issue.

Once it's properly repaired and part of the repository, no other users will run into at least this specific bug.

Thanks for the replies. I am wondering that I am the only one using that library long term as an IoT device and nobody else encounters that issue after such a long time. I created an issue on Github for the library, though I am not too positive since it is not really maintained anymore. Can someone give me some more details on what I can do to fix it myself? I am a bit lost here in the details.

Other option, I found as well this

https://docs.arduino.cc/libraries/rfid_mfrc522v2/#Releases

would that have the same issue or would that work long term?