I2C failure debugging

Is there an obvious reason why all I2C readings would suddenly start reporting their maximum values?

I am using an Adafruit Feather M0. Attached to it are a temperature sensor, an RGB sensor, and an accelerometer, all connected to the I2C bus.

Everything works as expected for about 7 days, then all the sensor values start reading what appears to be their maximum possible value.

For example, the temperature and humidity values are floats. After about 7 days, the values read become 2147484.0 instead of 22.1, 44.5, etc.

Here is the read humidity function:

float read_humidity(void)
{
  RunningMedian samples = RunningMedian(NUM_READINGS);

  //Read the sensor NUM_READINGS times with READING_DELAY between each reading
  for (int i = 0; i < NUM_READINGS; i++)
  {
    float hum = (sensor.readHumidity()+2.5); //add calibration value
    samples.add(hum);
    delay(READING_DELAY);
  }

  #ifdef DEBUG
    Serial.print("Humidity: ");
    Serial.println(samples.getMedian());
  #endif
  float val = ((int)(1000*samples.getMedian()))/1000.0; //truncates to 3 decimal points
  return val; //Get the median value and return it
}

My first thought was a memory leak. I found this library for reporting the amount of free dynamic memory and used it, but the value remained constant for several days, so that does not appear to be the cause:

#include <MemoryFree.h>

#ifdef __arm__
// should use unistd.h to define sbrk but Due causes a conflict
extern "C" char* sbrk(int incr);
#else  // __ARM__
extern char *__brkval;
#endif  // __arm__

Then I added Serial.println(freeMemory()); to the main loop.

I am stumped. It is unfortunate that it takes so long for this to occur. It makes debugging very difficult...

I found (this may be obvious) that when I unplug any of the sensors, the values for that sensor become the max values until I plug the sensor back in.
So after the 7-day failure event, it is almost as if the entire I2C bus becomes disconnected.
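
One thing worth noting about the value itself: 2147483647 (the largest 32-bit int) divided by 1000 is 2147483.647, which rounds to the reported 2147484 at seven significant digits. That would be consistent with the (int) cast in the read functions receiving NaN or an out-of-range float from a failed read, although I am assuming here that the sensor library signals a failed read that way. A minimal sketch of a guard, where truncate3 and readingIsValid are hypothetical helpers that are not part of the original sketch:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: truncate a reading to 3 decimal places, but return NaN
// for NaN, infinite, or out-of-range input instead of casting it.
float truncate3(float value)
{
  const float scaled = value * 1000.0f;
  // Casting NaN or an out-of-range float to an integer is undefined behaviour
  // in C++, so reject those inputs before the cast (bounds are conservative).
  if (!std::isfinite(scaled) || scaled >= 2147483648.0f || scaled <= -2147483648.0f)
  {
    return NAN;
  }
  return static_cast<int32_t>(scaled) / 1000.0f;
}

// Hypothetical helper: true when a reading is a real number worth transmitting.
bool readingIsValid(float value)
{
  return !std::isnan(value);
}
```

With something like this, a failed read shows up as NaN rather than as a plausible-looking large number, so the caller can skip or flag the sample.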

Full code below:

#include <MemoryFree.h>

#ifdef __arm__
// should use unistd.h to define sbrk but Due causes a conflict
extern "C" char* sbrk(int incr);
#else  // __ARM__
extern char *__brkval;
#endif  // __arm__


//=======================Compile Options+============================
#define SERIAL_OUTPUT //Comment to turn off serial output and save memory


//===========================Libraries===============================
#include <Wire.h>
#include <SPI.h>
#include <RH_RF95.h> //Radio
#include <RHReliableDatagram.h> //Radio manager
#include <Adafruit_MPU6050.h> //accelerometer
#include <Adafruit_Sensor.h>
#include "Adafruit_Si7021.h" //Temp/Humidity
#include <RunningMedian.h>
#include <Adafruit_NeoPixel.h> //RGB LED
#include "SparkFunISL29125.h" //RGB Light sensor
#ifdef __AVR__
#include <avr/power.h>
#endif


//==============================Pins=================================
#define RFM95_CS 8
#define RFM95_RST 4
#define RFM95_INT 3

const int pausePin = 0;
const int ledPin = 5; //RGB LED


//============================Constants==============================
#define NODE_ID 1 //This Node's unique ID
#define SERVER_ID 41 //The server this node should be talking to


#define RF95_FREQ 915.0 // Frequency must match RX's freq
#define NUM_READINGS 9 //Number of readings for each sensor. Median is taken from readings. Must be an odd number
#define NUM_READINGS_accel 695
#define READING_DELAY 10 //Delay in ms between each of the NUM_READINGS readings
#define LED_BRIGHTNESS 3 //Global LED brightness 0-10. 



const uint16_t tx_hb_time = 1500*(NODE_ID*0.750); //Tx heartbeat time in milliseconds



//Colours
enum colours
{
  OFF = 0,
  RED,
  GREEN,
  YELLOW,
  ORANGE,
  BLUE,
  NUM_COLOURS
};

//=============================Objects===============================

//The data packet being sent by this node
struct NodePacket
{
  int nodeID;
  int paused;
  float accelX_amp;
  float accelX_freq;
  float accelY_amp;
  float accelY_freq;
  float accelZ_amp;
  float accelZ_freq;
  int light_R;
  int light_G;
  int light_B;
  float temp;
  float hum;
};

//The data packet being received from the server
struct ServerPacket
{
  int nodeID;
  int paused;
  int pauseRequested;
  int startRequested;
};

Adafruit_MPU6050 mpu;
Adafruit_Sensor *mpu_accel;
Adafruit_Si7021 sensor = Adafruit_Si7021();
RH_RF95 rf95(RFM95_CS, RFM95_INT);
RHDatagram manager(rf95, NODE_ID);
Adafruit_NeoPixel pixel = Adafruit_NeoPixel(1, 5, NEO_GRB + NEO_KHZ800);
SFE_ISL29125 RGB_sensor;
ServerPacket serverPacket;
NodePacket nodePacket;


//=========================Global Variables==========================

unsigned long lastTxTime = 0; //last time in millis a transmission occurred (millis() returns unsigned long)
int paused = 0; //paused status
int buttonPressedFlag = 0;
int buttonState = 0; 

//===============================Setup===============================
void setup()
{
  Serial.begin(115200);
  init_structures();
  init_LED();
  init_radio();
  init_pauseButton();
  init_accelerometer();
  init_lightSensor();
  set_LED(GREEN, LED_BRIGHTNESS);
}


//===============================Loop================================
void loop()
{
  transmit();
  delay(2000);
  read_sensors();
  Serial.println(freeMemory());
}


/* read_sensors
   @brief   Reads all sensors, checks thresholds, 
            and builds the radio packet.
   @param   None
   @return  None
*/
void read_sensors(void)
{

  float accelVals[3]; //X, Y, Z
  int lightVals[3]; //R, G, B
  
  #ifdef SERIAL_OUTPUT
    Serial.println("Reading Sensors");
  #endif

  //Build packet
  nodePacket.nodeID = NODE_ID;

  //Read Temperature and check threshold
  nodePacket.temp = read_temp();

  //Read Humidity and check threshold
  nodePacket.hum = read_humidity();

  //Read Accelerometer and check threshold
  read_accel(accelVals);
  nodePacket.accelX_amp = accelVals[0];
  nodePacket.accelY_amp = accelVals[1];
  nodePacket.accelZ_amp = accelVals[2];
  nodePacket.accelX_freq = 0; //Placeholder
  nodePacket.accelY_freq = 0; //Placeholder
  nodePacket.accelZ_freq = 0; //Placeholder
  
  //Read light and check threshold
  read_light(lightVals);
  nodePacket.light_R = lightVals[0];
  nodePacket.light_G = lightVals[1];
  nodePacket.light_B = lightVals[2];
  
  //read pause button
  buttonState = digitalRead(pausePin);

  if (buttonState == LOW && buttonPressedFlag == 0)
  {
    buttonPressedFlag = 1;
    if (paused == 0)
    {
      paused = 1;
    }else{
      paused = 0;
    }
    delay(500);//debounce
  }
  if (buttonState == HIGH && buttonPressedFlag == 1)
  {
    buttonPressedFlag = 0;
  }
  
  if(paused)
  {
    set_LED(YELLOW, LED_BRIGHTNESS);
  }
  else
  {
    set_LED(GREEN, LED_BRIGHTNESS);
  }
  nodePacket.paused = paused; //update the message packet

}


/* init_accelerometer

   @brief   Initializes the accelerometer.
   @param   None
   @return  None
*/
void init_accelerometer(void)
{
  // Try to initialize!
  if (!mpu.begin()) {
    Serial.println("Failed to find MPU6050 chip");
    while (1) {
      delay(10);
    }
  }
  mpu.setAccelerometerRange(MPU6050_RANGE_2_G);
  mpu.setFilterBandwidth(MPU6050_BAND_44_HZ);
  mpu_accel = mpu.getAccelerometerSensor();
  delay(100);

}


/*  init_radio

   @brief   Initializes all radio settings.
   @param   None
   @return  None
*/
void init_radio(void)
{
  pinMode(RFM95_RST, OUTPUT);
  digitalWrite(RFM95_RST, HIGH);

  // manual reset
  digitalWrite(RFM95_RST, LOW);
  delay(10);
  digitalWrite(RFM95_RST, HIGH);
  delay(10);

  while (!manager.init()) {
  #ifdef DEBUG
      Serial.println("Radio init failed");
  #endif
    while (1);
  }
  #ifdef DEBUG
    Serial.println("Radio init passed");
  #endif

  // Defaults after init are 434.0MHz, modulation GFSK_Rb250Fd250, +13dbM
  if (!rf95.setFrequency(RF95_FREQ)) {
  #ifdef DEBUG
      Serial.println("setFrequency failed");
  #endif
    while (1);
  }
  #ifdef DEBUG
    Serial.print("Set Freq to: "); Serial.println(RF95_FREQ);
  #endif

  // Defaults after init are 434.0MHz, 13dBm, Bw = 125 kHz, Cr = 4/5, Sf = 128chips/symbol, CRC on
  // The default transmitter power is 13dBm, using PA_BOOST.
  // If you are using RFM95/96/97/98 modules which uses the PA_BOOST transmitter pin, then
  // you can set transmitter powers from 5 to 23 dBm:
  rf95.setTxPower(23, false);
}


/* transmit

   @brief   Transmits nodePacket to SERVER_ID until a reply from SERVER_ID
            is received or num_tx_attempts is exceeded. Reply from SERVER_ID
            contains thresholds that update local thresholds if the reply is valid.
   @param   None
   @return  None
*/
void transmit(void)
{
  uint8_t len = sizeof(serverPacket);
  uint8_t fromAddress;

    #ifdef SERIAL_OUTPUT
      Serial.println("\nTransmitting: ");
      Serial.print("Node ID: \t");
      Serial.println(nodePacket.nodeID);
      Serial.print("Paused: \t");
      Serial.println(nodePacket.paused);
      Serial.print("X Accel: \t");
      Serial.println(nodePacket.accelX_amp);
      Serial.print("Y Accel: \t");
      Serial.println(nodePacket.accelY_amp);
      Serial.print("Z Accel: \t");
      Serial.println(nodePacket.accelZ_amp);
      Serial.print("R Light: \t");
      Serial.println(nodePacket.light_R);
      Serial.print("G Light: \t");
      Serial.println(nodePacket.light_G);
      Serial.print("B Light: \t");
      Serial.println(nodePacket.light_B);
      Serial.print("Temperature: \t");
      Serial.println(nodePacket.temp);
      Serial.print("Humidity: \t");
      Serial.println(nodePacket.hum);
    #endif
      

    if(!manager.sendto((uint8_t *)&nodePacket, sizeof(nodePacket), SERVER_ID))
    {
      #ifdef SERIAL_OUTPUT
        Serial.println("Transmit failed");
      #endif
    }

    rf95.waitPacketSent();

    // Now wait for a reply
    #ifdef SERIAL_OUTPUT
      Serial.println("\nWaiting for reply...");
    #endif
    if (rf95.waitAvailableTimeout(500))
    {
      // Should be a reply message for us now
      if (manager.recvfrom((uint8_t *)&serverPacket, &len, &fromAddress))
      {
        if(fromAddress != SERVER_ID)
        {
          #ifdef SERIAL_OUTPUT
            Serial.println("Rcvd from wrong server");
          #endif
        }
        else if(serverPacket.nodeID != NODE_ID)
        {
          #ifdef SERIAL_OUTPUT
            Serial.println("Rcvd wrong msg");
          #endif
        }
        else
        {
          #ifdef SERIAL_OUTPUT
            Serial.println("\nReply valid");
            Serial.println("Received: ");
            Serial.print("Node ID: \t\t");
            Serial.println(serverPacket.nodeID);
            Serial.print("Paused: \t\t");
            Serial.println(serverPacket.paused);
            Serial.print("Pause req: \t\t");
            Serial.println(serverPacket.pauseRequested);
            Serial.print("start req: \t\t");
            Serial.println(serverPacket.startRequested);
          #endif
          updateThresholds();
        }
      }
      else
      {
        #ifdef SERIAL_OUTPUT
          Serial.println("Receive failed");
        #endif
      }
    }
    else
    {
      #ifdef SERIAL_OUTPUT
        Serial.println("No reply");
      #endif
    }

//  nodePacket.context = NO_TX; //Reset context
  lastTxTime = millis();
}



/* read_temp

   @brief   Reads temperature from sensor and returns float.
   @param   None
   @return  float temperature
*/
float read_temp(void)
{
  RunningMedian samples = RunningMedian(NUM_READINGS);

  //Read the sensor NUM_READINGS times with READING_DELAY between each reading
  for (int i = 0; i < NUM_READINGS; i++)
  {
    float temp = (sensor.readTemperature()-0.9); // minus calibration value
    samples.add(temp);
    delay(READING_DELAY);
  }

  #ifdef DEBUG
    Serial.print("Temp: ");
    Serial.println(samples.getMedian());
  #endif
  float val = ((int)(1000*samples.getMedian()))/1000.0; //truncates to 3 decimal points
  return val; //Get the median value and return it
}


/* read_humidity

   @brief   Reads humidity from sensor and returns float.
   @param   None
   @return  float humidity
*/
float read_humidity(void)
{
  RunningMedian samples = RunningMedian(NUM_READINGS);

  //Read the sensor NUM_READINGS times with READING_DELAY between each reading
  for (int i = 0; i < NUM_READINGS; i++)
  {
    float hum = (sensor.readHumidity()+2.5); //add calibration value
    samples.add(hum);
    delay(READING_DELAY);
  }

  #ifdef DEBUG
    Serial.print("Humidity: ");
    Serial.println(samples.getMedian());
  #endif
  float val = ((int)(1000*samples.getMedian()))/1000.0; //truncates to 3 decimal points
  return val; //Get the median value and return it
}


/* read_accel

   @brief   Reads accelerations from sensor and updates accelVals[].
   @param   float accelVals[]
   @return  None
*/
void read_accel(float accelVals[])
{
  RunningMedian x_samples = RunningMedian(NUM_READINGS_accel);
  RunningMedian y_samples = RunningMedian(NUM_READINGS_accel);
  RunningMedian z_samples = RunningMedian(NUM_READINGS_accel);

  //Read the sensor NUM_READINGS_accel times back-to-back (no delay between readings)
  for (int i = 0; i < NUM_READINGS_accel; i++)
  {
    sensors_event_t accel;
    mpu_accel->getEvent(&accel);
    x_samples.add(accel.acceleration.x);
    y_samples.add(accel.acceleration.y);
    z_samples.add(accel.acceleration.z);
  }
  float valx = ((int)(1000*x_samples.getHighest()))/1000.0;
  float valy = ((int)(1000*y_samples.getHighest()))/1000.0;
  float valz = ((int)(1000*z_samples.getHighest()))/1000.0;
  accelVals[0] = (valx);
  accelVals[1] = (valy);
  accelVals[2] = (valz);
}


/* read_light

   @brief   Reads light values from sensor and updates lightVals[].
   @param   int lightVals[]
   @return  None
*/
void read_light(int lightVals[])
{
  RunningMedian r_samples = RunningMedian(NUM_READINGS);
  RunningMedian g_samples = RunningMedian(NUM_READINGS);
  RunningMedian b_samples = RunningMedian(NUM_READINGS);

  //Read the sensor NUM_READINGS times with READING_DELAY between each reading
  for (int i = 0; i < NUM_READINGS; i++)
  {
    r_samples.add(RGB_sensor.readRed());
    g_samples.add(RGB_sensor.readGreen());
    b_samples.add(RGB_sensor.readBlue());
    delay(READING_DELAY);
  }
  float valr = ((int)(1000*r_samples.getHighest()))/1000.0;
  float valg = ((int)(1000*g_samples.getHighest()))/1000.0;
  float valb = ((int)(1000*b_samples.getHighest()))/1000.0;
  lightVals[0] = valr;
  lightVals[1] = valg;
  lightVals[2] = valb;
}


/* init_LED

   @brief   Initializes the RGB NeoPixel LED.
   @param   None
   @return  None
*/
void init_LED(void)
{
  pixel.begin();
  delay(500);
  set_LED(RED, LED_BRIGHTNESS);
}


/* set_LED

   @brief   Sets the RGB LED to a predefined colour.
   @param   uint8_t colour. Colour value from enum colours
   @param   uint8_t brightness. Brightness value from 1-10
   @return  None
*/
void set_LED(uint8_t colour, uint8_t brightness)
{
  uint8_t red = 0;
  uint8_t green = 0;
  uint8_t blue = 0;

  uint8_t divider = 11 - brightness;
  divider = constrain(divider, 1, 10);
  if(colour == OFF) 
  {
    divider = 1;
  }

  switch (colour)
  {
    case OFF:
      red = 0;
      green = 0;
      blue = 0;
      break;
    case RED:
      red = 255;
      green = 0;
      blue = 0;
      break;
    case GREEN:
      red = 0;
      green = 255;
      blue = 0;
      break;
    case YELLOW:
      red = 255;
      green = 240;
      blue = 0;
      break;
    case ORANGE:
      red = 255;
      green = 145;
      blue = 0;
      break;
    case BLUE:
      red = 0;
      green = 0;
      blue = 255;
      break;
    default:
      red = 0;
      green = 0;
      blue = 0;
      break;
  }
  pixel.setPixelColor(0, pixel.Color(red/divider, green/divider, blue/divider));
  pixel.show();
  #ifdef DEBUG
    Serial.print("LED: ");
    Serial.println(colour);
  #endif
}


/* init_lightSensor

   @brief   Initializes the RGB light sensor.
   @param   None
   @return  None
*/
void init_lightSensor(void)
{
  RGB_sensor.init();
}




/* init_pauseButton

   @brief   Initializes the pause button pin.
   @param   None
   @return  None
*/


void init_pauseButton(void)
{
  pinMode(pausePin, INPUT_PULLUP);
}

/* init_ISR

   @brief   Initializes the interrupt service routine.
   @param   None
   @return  None
*/
//void init_ISR(void) {
//
//  pinMode(interruptPin1, INPUT_PULLUP);
//  attachInterrupt(digitalPinToInterrupt(interruptPin1), ISR_buttonPressed, LOW);
//}

/* init_structures

   @brief   Initializes the nodePacket values.
   @param   None
   @return  None
*/
void init_structures()
{
  nodePacket.paused = 0;
  nodePacket.accelX_amp = 0;
  nodePacket.accelX_freq  = 0;
  nodePacket.accelY_amp  = 0;
  nodePacket.accelY_freq  = 0;
  nodePacket.accelZ_amp  = 0;
  nodePacket.accelZ_freq  = 0;
  nodePacket.light_R  = 0;
  nodePacket.light_G  = 0;
  nodePacket.light_B = 0 ;
  nodePacket.temp = 0 ;
  nodePacket.hum = 0;
}


/* updateThresholds

   @brief   Updates local thresholds from serverPacket (received).
   @param   None
   @return  None
*/
void updateThresholds(void)
{
  #ifdef SERIAL_OUTPUT
    Serial.println("Updating Thresholds");
  #endif
    Serial.print("node paused: \t\t");
    Serial.println(paused);
    Serial.print("server paused: \t\t");
    Serial.println(serverPacket.paused);
    paused = serverPacket.paused;
}

ACK. Do you have a scope or logic analyzer for debugging the bus?

You could also add the i2c_scanner code and run it from time to time to find out which devices on the bus still respond.
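
A rough sketch of what such a periodic scan could look like. The function scanI2C and its probe callback are hypothetical names; the callback is there only so the loop can be shown without hardware, and the comment at the end shows how it might wrap the Wire library on the Feather:

```cpp
#include <cstddef>
#include <cstdint>

// Probes every usable 7-bit I2C address (0x08..0x77) and stores the ones that
// acknowledge. `probe` is a hypothetical callback that returns true on ACK.
// Returns the number of addresses written to `found` (at most `capacity`).
template <typename Probe>
size_t scanI2C(Probe probe, uint8_t *found, size_t capacity)
{
  size_t count = 0;
  for (uint8_t address = 0x08; address <= 0x77; ++address)
  {
    if (probe(address) && count < capacity)
    {
      found[count++] = address;
    }
  }
  return count;
}

// On the board, the callback could wrap the Wire library like this:
//   auto wireProbe = [](uint8_t address) {
//     Wire.beginTransmission(address);
//     return Wire.endTransmission() == 0; // 0 means the device acknowledged
//   };
//   uint8_t found[8];
//   size_t n = scanI2C(wireProbe, found, sizeof(found));
```

Logging the count after each read cycle would show whether all devices vanish at the same moment or one drops out first.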

I don't have a scope or logic analyzer, unfortunately. It appears as though all the i2c sensors 'go offline' at once. But once you power cycle the arduino, life is good again...

Could you give all the information needed to check whether your I2C bus is reliable?

Could you give links to the modules (where you bought them), and describe how they are powered, whether there are pullup resistors, and what kind of wires or cables you use and how long they are? A photo would help to check whether there is a breadboard somewhere, and so on.

When something goes wrong on the I2C bus, a sensor can end up holding SDA or SCL stuck at GND. You can measure the voltage with a multimeter. It will be hard to identify the sensor, because each one might work when you test them one by one.
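
If SDA turns out to be the line that is stuck low, the I2C specification describes a bus-clear procedure: the controller sends up to nine clock pulses on SCL until the device releases SDA. A minimal sketch of that loop follows. The name clearI2CBus and both callbacks are hypothetical; on real hardware they would wrap GPIO access to the SDA and SCL pins, and the Wire library would need to be re-initialised afterwards.

```cpp
// Tries to free a stuck-low SDA line by pulsing SCL up to nine times.
// `sdaIsHigh` and `pulseScl` are hypothetical callbacks wrapping the pin access.
// Returns the number of pulses sent, or -1 if SDA is still low after nine.
template <typename ReadSda, typename PulseScl>
int clearI2CBus(ReadSda sdaIsHigh, PulseScl pulseScl)
{
  int pulses = 0;
  while (!sdaIsHigh())
  {
    if (pulses == 9)
    {
      return -1; // device did not let go; a power cycle may be needed
    }
    pulseScl(); // one low-high cycle on SCL
    ++pulses;
  }
  return pulses;
}
```

A return value of 0 means SDA was already free, so the fault lies elsewhere (for example SCL held low, which this procedure cannot fix).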

How are your Neopixels powered?

A sensor that is stuck like that is not affected by the Reset button.

I am using:
https://www.digikey.ca/en/products/detail/sparkfun-electronics/SEN-12829/5673756
https://www.digikey.ca/en/products/detail/adafruit-industries-llc/3178/6098603
https://www.digikey.ca/en/products/detail/sparkfun-electronics/BOB-13282/6163655
https://www.digikey.ca/en/products/detail/adafruit-industries-llc/3251/6227008
https://www.digikey.ca/en/products/detail/adafruit-industries-llc/3886/10709725

The i2c lines are all tied to the SCL SDA lines on the feather. No pullup resistors, as these boards already have them.
The neopixel does have a 10k resistor in series from the pwm pin to the pwm input. Otherwise it is just powered from 3.3v and gnd.

It's just so strange that it works so well for several days, then all of a sudden doesn't.

That means that the effective pullup current is the sum of the currents through all the pullups, which may be too much for some devices. You can measure that current flowing from SCL/SDA to GND. If it exceeds 3mA, remove some of the on-board pullups or jumpers.

Pullup resistors, all in parallel: 60k // 10k // (two 10k in parallel = 5k) // (two 10k in parallel = 5k) = 1935 Ω
Sink current: 3.3V / 1935 Ω = 1.7mA (it should be below 3mA, so that is good).
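
The arithmetic above can be reproduced with a few lines, treating each pair of 10k pullups as a single 5k resistor. The function names parallelOhms and sinkCurrentmA are made up for this illustration:

```cpp
#include <cstddef>

// Equivalent resistance of resistors connected in parallel, in ohms.
double parallelOhms(const double *ohms, size_t count)
{
  double conductance = 0.0;
  for (size_t i = 0; i < count; ++i)
  {
    conductance += 1.0 / ohms[i];
  }
  return 1.0 / conductance;
}

// Current a device must sink to pull the line low, in milliamps.
double sinkCurrentmA(double supplyVolts, double pullupOhms)
{
  return supplyVolts / pullupOhms * 1000.0;
}

// Example with the values from this thread:
//   const double pullups[] = {60000.0, 10000.0, 5000.0, 5000.0};
//   double r = parallelOhms(pullups, 4);  // about 1935 ohms
//   double i = sinkCurrentmA(3.3, r);     // about 1.7 mA
```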

The free-memory check does not tell you whether the wrong memory is being overwritten.
At first glance, I don't see a problem in your sketch.

Is it always after 7 days, or does it sometimes happen after one day or 20 days?

I haven't timed it exactly, but it seems to get stuck around the 1 week mark.
My next plan is to use a watchdog timer and check whether the sensor values are outside of "normal". If they are, I will do a reset. It seems hacky, but I am not sure what else to check. Thanks for the suggestions so far, though! Maybe it has something to do with the LoRa radio on board?
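
A possible shape for that "outside of normal" check, as a sketch only. SensorFaultCounter is a hypothetical helper, and the plausible range and fault limit are placeholders to be chosen per sensor. It requires several bad readings in a row before asking for a reset, so a single glitch does not restart the node:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: counts consecutive implausible readings.
struct SensorFaultCounter
{
  uint8_t consecutiveFaults = 0;
  uint8_t limit;

  explicit SensorFaultCounter(uint8_t faultLimit) : limit(faultLimit) {}

  // Feed one reading; returns true once `limit` bad readings arrived in a row.
  bool update(float value, float minOk, float maxOk)
  {
    const bool ok = !std::isnan(value) && value >= minOk && value <= maxOk;
    if (ok)
    {
      consecutiveFaults = 0;
    }
    else if (consecutiveFaults < limit)
    {
      ++consecutiveFaults;
    }
    return consecutiveFaults >= limit;
  }
};

// Possible use in read_sensors() (assumption: the core provides the CMSIS
// call NVIC_SystemReset(); letting a watchdog expire would be an alternative):
//   static SensorFaultCounter tempFaults(3);
//   if (tempFaults.update(nodePacket.temp, -40.0f, 85.0f)) { NVIC_SystemReset(); }
```

As mentioned earlier in the thread, a sensor that is holding the bus is not affected by a reset of the microcontroller, so a reset alone may not be enough unless the sensors can also be power-cycled.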