Arduino crashes after 15-20 hours or running

I’ve got the following code which read a value from a 434Mhz receiver and outputs the value to the RS232 port. When a value is received the LED pulses. Every works fine but after around 15 - 20 hours or running it looks like it crashes as the LED no longer pulses (ie. It doesnt receive any data from the 434Mhz receiver)

After resetting the arduino everything starts working again. Any ideas where it could be going wrong?

//Receiver connected to digital pin 11
#include <VirtualWire.h>

float power = 0;

int redLedPin  = 7;  // (Pin13 in chip)
int numbers[1];
long count = 0;

void setup()
{
  
 pinMode(redLedPin,OUTPUT); 
 blinkLed(redLedPin,3,250);
  
 vw_setup(1200); // Bits per sec
 vw_rx_start(); // Start the receiver PLL running

    Serial.begin(9600);
    Serial.println("Begin");
}

void loop()
{
  uint8_t buf[VW_MAX_MESSAGE_LEN];
  uint8_t buflen = VW_MAX_MESSAGE_LEN;


if (vw_get_message(buf, &buflen)) // check to see if anything has been received
  {
    blinkLed(redLedPin,1,250);
    //memcpy(buf, numbers, buflen);
    for (int i = 0; i < 1; i++)
    {
      numbers[i] = word(buf[i*2+1],buf[i*2]);
      power = numbers[i];
      power = power / 100;

     count++;
      Serial.print(power,1);
      Serial.print("kwh : count:");
      Serial.print(count);
      Serial.println("");
      power = 0;
    }
    
  }
  

}

void blinkLed(int lPin, int nBlink, int msec) {

if (nBlink) {

       for (int i = 0; i < nBlink; i++) {
              digitalWrite(lPin, HIGH);
              delay(msec);
              digitalWrite(lPin, LOW);
              delay(msec);
              }
      }
}

Maybe you are running out of memory somewhere?
That could explain the symptoms you described.

Why do you have an int array with only one member?
Why do you have a for loop that will only do one iteration?

It seems you never reset ‘count’ - although this should just overflow eventually and not mess up your memory.

Tom - those are good questions. Probably just due to my bad coding :slight_smile:
I"ve put in a reset for the count so that it gets reset to 0.

On the array and loop - what would you suggest I rather do?

Where would I go about detecting where the memory is running out?

On the array and loop - what would you suggest I rather do?

There is no reason to use an array. Just use

int numbers;

Though why is is plural escapes me.
As for the loop, get rid of it. There is no reason to wrap code that gets executed once in any kind of loop,

   numbers = word(buf[1],buf[0]);
      power = numbers;
      power = power / 100;

     count++;
      Serial.print(power,1);
      Serial.print("kwh : count:");
      Serial.print(count);
      Serial.println("");
      power = 0;

The numbers variable seems completely superfluous.

power = word(buf[1],buf[0])/100;

GregM:
Do would I go about detecting where the memory is running out?

Greg, I’d suggest you post the most current code again, so we all know what you are working with.

At first glance I can’t see a memory problem with your code though it could possibly be in the library you are using.

Why don’t you try to output the amount of free memory every 10.000th cycle or so (see http://www.arduino.cc/playground/Code/AvailableMemory for details).

Ahh - ok, I see what you mean about the for loop only looping once and not being necessary.

I’ve now applied the changes and have the following:

//Receiver connected to digital pin 11

#include <VirtualWire.h>

float power = 0;

int redLedPin  = 7;  // (Pin13 in chip)
long count = 0;

void setup()
{
  
 pinMode(redLedPin,OUTPUT); 
 blinkLed(redLedPin,3,250);
  
 vw_setup(1200); // Bits per sec
 vw_rx_start(); // Start the receiver PLL running

  Serial.begin(9600);
  Serial.println("Begin");
}

void loop()
{
  uint8_t buf[VW_MAX_MESSAGE_LEN];
  uint8_t buflen = VW_MAX_MESSAGE_LEN;
 
if (vw_get_message(buf, &buflen)) // check to see if anything has been received
  {
    blinkLed(redLedPin,1,250);
    power = word(buf[1],buf[0]);
    power = power / 100;
    count++;
    Serial.print(power,1);
    Serial.print("kwh : count:");
    Serial.print(count);
    Serial.println("");
    power = 0;
    }
}

void blinkLed(int lPin, int nBlink, int msec) {

if (nBlink) {

       for (int i = 0; i < nBlink; i++) {
              digitalWrite(lPin, HIGH);
              delay(msec);
              digitalWrite(lPin, LOW);
              delay(msec);
              }
      }
}

I’ll also have a look at outputting the free memory, a great trick. This should tell me if there is maybe a memory leak or something on the VirtualWire library.

Update: I’m printing out the free memory, and its been 1637 for the past hour or so. I’ll let it run until it crashes and see if it changes.

Failing that, would running a WDT be the answer?

Well, personally I would first try to figure out where exactly the problem occurs and how to fix it directly instead of adding a watchdog.

But then, if you just need a quick & dirty fix a WDT might be an idea.

I presume you've checked the transmit side hasn't stopped working? Or that there isn't interference preventing correct reception?

This is what I would do.

You do have a bug in your code in that you’re assuming you received a certain amount of data without actually validating you’ve received that much data. That’s bad. Furthermore, You’re assuming the virtualwire library always provides perfect data. That assumption is not safe. It uses a CRC16 which is only good for detecting single bit errors. While it commonly does guard against multi-bit errors, should you encounter them, its also possible a multi-bit error will go undetected. That’s why in the golden day of modem transfers (example, zmodem), CRC32 became the golden standard.

The tweaked code has not been compiled and my therefore contain typos. You may need to further tweak. The changes attempt to detect short reads and reports/counts errors when a short read is detected. I also moved the variable declaration outside of the loop and placed it where the other variables were being declared. Depending on your use, you may also want to consider declaring them static.

//Receiver connected to digital pin 11

#include <VirtualWire.h>

float power = 0;

int redLedPin  = 7;  // (Pin13 in chip)
long count = 0;
long error = 0 ;
uint8_t buf[VW_MAX_MESSAGE_LEN];
uint8_t buflen = VW_MAX_MESSAGE_LEN;
 

void setup()
{
  
    pinMode(redLedPin,OUTPUT); 
    blinkLed(redLedPin,3,250);
  
    vw_setup(1200); // Bits per sec
    vw_rx_start(); // Start the receiver PLL running

    Serial.begin(9600);
    Serial.println("Begin");
}

void loop()
{
    if (vw_get_message(buf, &buflen)) // check to see if anything has been received
    {
	blinkLed(redLedPin,1,250);

	if( buflen >= 2 ) {
	    power = word(buf[1],buf[0]);
	    power = power / 100;
	    count++;
	    Serial.print(power,1);
	    Serial.print("kwh : count:");
	    Serial.print(count);
	    Serial.print( "errors: " ) ;
	    Serial.print( error ) ;
	    Serial.println("");
	    power = 0;
	} else {
	    error++ ;
	    Serial.print( "did not receive enough data; error: " ) ;
	    Serial.print( error ) ;
	}
    }
}

void blinkLed(int lPin, int nBlink, int msec) {

    if (nBlink) {

	for (int i = 0; i < nBlink; i++) {
	    digitalWrite(lPin, HIGH);
	    delay(msec);
	    digitalWrite(lPin, LOW);
	    delay(msec);
	}
    }
}

Thanks for the code. I've added it in (compiles fine) - I'll run it for a while and see if it still crashes and report back.

BTW, you could also drop in another else statement to monitor the number of messages received which failed their CRC. It would give you an idea of just how noisy the RF environment. Furthermore, it might also hint if there is a problem with the VW library in a less common (error) code path. If you do this, you'll probably want to make the distinction between short reads and actual crc errors. Also a note, the comment on the "vw_get_message", isn't entirely clear. Does it block until a message is available? If not, a false return doesn't tell if a message wasn't available or if a message was available and it failed its CRC. So that's something you'll need to further explore.

Since you're already changing out code, now would be a good time to shoehorn that in.

Seem to be making some progress now. I checked this morning and the bad news is it 'crashed' again, the good news is its not a complete crash.

It was printing out "did not receive enough data; error:"

It seems like once it receives the first error it never recovers - the error count was 524 and increasing all the time. After error 1 it didn't receive any valid data.

I know its not the transmitter as I have another ATMega328 running on a PCB with the receiver module and its still receiving valid data while the Arduino remains in a error state.

After I reset the Arduino it continued to receive valid data.

Any ideas where to now? I thought making a code change to check if error count is > 2 then call a softreset routine to reboot the arduino? I'm not sure if this is even possible or advisable. Its not addressing the problem but I"m not sure how to figure out why this is happening.

You may want to spend some time with the documentation…

vw_get_message(buf, &buflen))

Read the last received message. This should be called only when a message is known to be received with any of the 3 functions above. “buf” is an array where the message is copied. “buflen” should have the array’s maximum size upon input, and upon return the number of bytes actually copied is retured. The function itself returns true if the message was verified correct, or false if a message was received but appears to have been corrupted.

I was afraid of that. Goes to show, you need to read your API documentation before using it. If it had been read, the OP would probably not of had a problem. I had hoped my previous comment would have spurred him to double check those assumptions.

//Receiver connected to digital pin 11

#include <VirtualWire.h>

float power = 0;

int redLedPin  = 7;  // (Pin13 in chip)
long count = 0;
long error = 0 ;
uint8_t buf[VW_MAX_MESSAGE_LEN];
uint8_t buflen = VW_MAX_MESSAGE_LEN;
 

void setup()
{
  
    pinMode(redLedPin,OUTPUT); 
    blinkLed(redLedPin,3,250);
  
    vw_setup(1200); // Bits per sec
    vw_rx_start(); // Start the receiver PLL running

    Serial.begin(9600);
    Serial.println("Begin");
}

void loop()
{
    if (vw_have_message() &&
	vw_get_message(buf, &buflen)) // check to see if anything has been received
    {
	blinkLed(redLedPin,1,250);

	if( buflen >= 2 ) {
	    power = word(buf[1],buf[0]);
	    power = power / 100;
	    count++;
	    Serial.print(power,1);
	    Serial.print("kwh : count:");
	    Serial.print(count);
	    Serial.print( "errors: " ) ;
	    Serial.print( error ) ;
	    Serial.println("");
	    power = 0;
	} else {
	    error++ ;
	    Serial.print( "did not receive enough data; error: " ) ;
	    Serial.print( error ) ;
	}
    }
}

void blinkLed(int lPin, int nBlink, int msec) {

    if (nBlink) {

	for (int i = 0; i < nBlink; i++) {
	    digitalWrite(lPin, HIGH);
	    delay(msec);
	    digitalWrite(lPin, LOW);
	    delay(msec);
	}
    }
}

Thanks - I've read through the docs and see they mention the w_have_message routine. Funny the example from the VirtualWire docs doesnt even call this.

Running my program again and will let you know if it falls over with the new routine in.

Same thing happened - looks like after it receives the first error it never recovers.

GregM:
Same thing happened - looks like after it receives the first error it never recovers.

That's certainly disappointing. I expected we were done.

I quickly eyed the code one more time this morning. I'm not seeing anything jump out at me. With the addition of checking for a message in the latest code, my concerns for behavioral ambiguity have been addressed. I'm inclined to say its either a bug in the VirtualWire library or you have a component which is taking things out into left field. Given the board seems to continue running and not really crashing, and all we know is you're getting short reads. I would first look there.

Unless you can readily rule out RF module failure (one to swap out?), I'd start exploring the software route. First of all, add debugging code on your TX side to verify you are, in fact, ALWAYS transmitting at least two bytes per frame. A short TX (one byte) would absolutely cause a false positive on the RX side. So make absolutely sure the TX code isn't the core problem. If after you're absolutely sure its not a false positive on the RX side because of a bug on the TX side, dig into the virtual wire library.

As a side note, I reviewed some of the VirtualWire code. Contrary to the comment, vw_have_message is absolutely not required. So it makes sense it changed nothing. If you look in vw_get_message, the first thing it does is check the status of the same variable the vw_have_message function does. So NOT calling vw_have_message, as you originally had it, is in fact, ideal. Not calling it was not a bug. That explains why, reportedly, the example code doesn't use it.

Furthermore, the message will be copied regardless of CRC failure. That means technically, any number of bytes will be available on a bad CRC. But, I don't see a mechanism which allows you to discern a receive with bad CRC versus no receive. The implementation details of the vw_get_message are sorta odd, but not likely to be causing your issue.

So in a nut shell, if you'd like, post your TX code. And if you have another RF module to swap out, please do so. Please add a debugging line which reports the size which has been received. Please add debugging lines which report the size which has been transmitted. What is the reported length of the short rx and tx?

"buflen" should have the array's maximum size upon input

Would you like me to show you were to put the one line code?

Coding Badly - I don't follow? It would be helpful if you showed me the one liner.

void loop()
{
buflen = VW_MAX_MESSAGE_LEN;
if (vw_have_message() &&
vw_get_message(buf, &buflen)) // check to see if anything has been received
{

This is probably a better structure…

void loop()
{
if (vw_have_message() )
{
buflen = VW_MAX_MESSAGE_LEN;

if ( vw_get_message(buf, &buflen) ) // check to see if anything has been received
{
// process buf here
}
}