I2C bus randomly fails but restarts when peripheral monitor opens

So I have an Arduino Nano acting as an I2C slave, and its I2C bus occasionally fails. I have also tested an Arduino Uno and a bare-metal ATmega328P in the same situation. The Uno behaves similarly, and all of them exhibit this behavior, so I'm assuming it's software-related (?), though this post is mostly about the Nano. I send many messages to the Nano, and at first the data is fine, but eventually (potentially up to 10 minutes later) the I2C bus stops receiving or sending data. I have no idea why. I confirmed that it stops receiving data using the serial plotter, and it can fail while the plotter is in use.

I have checked the TWSR register, and it returns 0xF8, which doesn't show up in the datasheet as a valid error code, so I have no idea what is happening.
I do have pull-up resistors: SDA and SCL are each pulled up to 5 V through a 4.7 kΩ resistor.
Using a logic analyzer, I can see that I2C data is actively being sent to the device.
The I2C master runs at 100 kHz (standard mode), so that is also not the issue.
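For reference, the ATmega328P datasheet does list 0xF8, under its miscellaneous states, as "no relevant state information available; TWINT = 0". The TWI hardware reports it between events and whenever it is not involved in a transfer, so if TWSR was read from loop() (outside the TWI interrupt), 0xF8 is what you would expect to see most of the time, even on a healthy bus. Below is a hypothetical Python helper (the name `decode_twsr` is made up) for decoding raw TWSR values printed from the Nano; the descriptions are paraphrased from the datasheet's slave receiver/transmitter tables, so check them against your copy.

```python
# Slave-mode TWI status codes (ATmega328P datasheet), paraphrased.
# SR = slave receiver, ST = slave transmitter.
TWI_SLAVE_STATUS = {
    0x60: "SR: own SLA+W received, ACK returned",
    0x68: "SR: arbitration lost as master, own SLA+W received, ACK returned",
    0x70: "SR: general call received, ACK returned",
    0x78: "SR: arbitration lost as master, general call received, ACK returned",
    0x80: "SR: addressed with own SLA+W, data byte received, ACK returned",
    0x88: "SR: addressed with own SLA+W, data byte received, NOT ACK returned",
    0x90: "SR: addressed with general call, data byte received, ACK returned",
    0x98: "SR: addressed with general call, data byte received, NOT ACK returned",
    0xA0: "SR: STOP or repeated START received while addressed as slave",
    0xA8: "ST: own SLA+R received, ACK returned",
    0xB0: "ST: arbitration lost as master, own SLA+R received, ACK returned",
    0xB8: "ST: data byte transmitted, ACK received",
    0xC0: "ST: data byte transmitted, NOT ACK received",
    0xC8: "ST: last data byte transmitted (TWEA = 0), ACK received",
    0xF8: "No relevant state information available; TWINT = 0",
    0x00: "Bus error due to an illegal START or STOP condition",
}


def decode_twsr(raw):
    """Mask off the prescaler bits (TWPS1:0, plus the reserved bit 2) and look up the status."""
    status = raw & 0xF8
    return TWI_SLAVE_STATUS.get(status, f"0x{status:02X}: not a slave-mode status code")


if __name__ == "__main__":
    for value in (0xF8, 0xA8, 0x00):
        print(f"0x{value:02X} -> {decode_twsr(value)}")
```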

The strangest part of all of this is that I can open the serial plotter and the error just clears. This happens both while I'm sending data over UART and while I'm not, and my code does not receive any data over UART, so that can't be the issue.
At this point I have been at this for a couple of days, and I have no idea what is causing the issue or why it's now affecting every board I try.
This code also worked for over a year without modification, so I'm wondering what could be causing this now.
Thank you for any help you can offer.

Following the forum guidelines, can you post your code and, hopefully, an annotated schematic showing how everything is connected?

Sorry about that.
Here is an attached series of images describing a simplified version of the setup. The exact setup is a bit more complicated, but this is effectively it. I know someone is likely to complain that the Raspberry Pi 4 is not technically 5 V tolerant, but its inputs are actually 5 V tolerant in these cases, and you can see from my captured waveform that it does in fact produce an output.
If you want to tell me about that, this Stack Overflow user has actually done a lot more work than I have to make sure he knows what's going on:

This is a simplified version of my code:


#include <Wire.h>

const int I2C_ADDRESS = 0x08;  // 7-bit slave address

void setup()
{
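  Wire.setClock(100000);         // only affects master mode; the Pi (master) drives SCL
  Wire.begin(I2C_ADDRESS);       // join the bus as a slave
  Wire.onRequest(requestEvent);  // called from the TWI interrupt when the master reads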
  analogReference(DEFAULT);
  
  pinMode(A0, INPUT);
  pinMode(A1, INPUT);
  pinMode(A2, INPUT);
  pinMode(A3, INPUT);
  
  pinMode(2, INPUT);
  pinMode(3, INPUT);
  pinMode(4, INPUT);
  pinMode(5, INPUT);
}

int i = 0;


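// Runs inside the TWI interrupt; the AVR holds SCL low (clock stretching) until it returns.
// Reply layout (13 bytes): address echo, A0..A3 as little-endian uint16, pins 2..5 as one byte each.
void requestEvent()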
{
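  Wire.write( byte(I2C_ADDRESS) );            // byte 0: address echo
  uint16_t val = (uint16_t)(analogRead(A0));  // each analogRead() takes roughly 0.1 ms
  // Write a 16-bit value low byte first (little-endian)
  #define WRITE_SHORTMULTI(value, func) func(byte(value)); func(byte(value >> 8))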
  WRITE_SHORTMULTI(val, Wire.write);
  val = (uint16_t)(analogRead(A1));
  WRITE_SHORTMULTI(val, Wire.write);
  val = (uint16_t)(analogRead(A2));
  WRITE_SHORTMULTI(val, Wire.write);
  val = (uint16_t)(analogRead(A3));
  WRITE_SHORTMULTI(val, Wire.write);

  Wire.write( byte(digitalRead(2)) );
  Wire.write( byte(digitalRead(3)) );
  Wire.write( byte(digitalRead(4)) );
  Wire.write( byte(digitalRead(5)) );
  #undef WRITE_SHORTMULTI
}

void loop() {}
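On the Raspberry Pi side, the 13 bytes written by requestEvent() above can be unpacked with Python's `struct` module. This is a minimal sketch assuming the byte layout exactly as the sketch writes it (WRITE_SHORTMULTI sends the low byte first, so the analog values are little-endian). `decode_payload` is a hypothetical helper name, and its checks (address echo, 10-bit range) are suggestions for catching corrupted or stale frames, not something the original program does.

```python
import struct

# Reply layout written by requestEvent(): address echo (1 byte), A0..A3 as
# little-endian uint16 (low byte first, per WRITE_SHORTMULTI), pins 2..5 (1 byte each).
PAYLOAD = struct.Struct("<B4H4B")  # 13 bytes, no padding with "<"


def decode_payload(data, expected_addr=0x08):
    """Decode one reply; raise ValueError if it doesn't look like a valid frame."""
    if len(data) < PAYLOAD.size:
        raise ValueError(f"short read: {len(data)} bytes")
    # smbus returns a list of ints; any bytes past the 13-byte frame are ignored
    fields = PAYLOAD.unpack(bytes(data[:PAYLOAD.size]))
    addr, analog, digital = fields[0], fields[1:5], fields[5:9]
    if addr != expected_addr:
        raise ValueError(f"address echo 0x{addr:02X}, expected 0x{expected_addr:02X}")
    if any(v > 1023 for v in analog):  # analogRead() returns 10-bit values
        raise ValueError("analog value out of 10-bit range")
    return {"analog": list(analog), "digital": list(digital)}
```

A frame that fails the address check (for example all 0xFF bytes) can then be logged with a timestamp, which pins down when the Nano stopped answering.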

I forgot to add the Raspberry Pi side code. This is an incredibly simple variant of an entire GUI program, but it shows what the reading code does. I've tried it both with and without the 0.1 s sleep. Here is a minimal reproduction of the code:

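import time

import smbus

addr = 0x08                 # must match I2C_ADDRESS in the Nano sketch
PAYLOAD_LEN = 13            # bytes written by requestEvent()
I2CBus = smbus.SMBus(1)     # /dev/i2c-1 on the Raspberry Pi header

def get_data():
    # Sends command byte 0x00, then reads PAYLOAD_LEN bytes back (default would be 32)
    data = I2CBus.read_i2c_block_data(addr, 0x00, PAYLOAD_LEN)
    time.sleep(.1)
    return data

my_data = []
while True:
    # append() takes one argument, so store (reading, timestamp) as a tuple
    my_data.append((get_data(), time.time()))
# Process data later. I know the loop above is infinite, but it's a minimal repro of what my code is doing.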
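As a minimal sketch of detecting the failure from the Pi side (mirroring the 2-second check described later in the thread), reads can be retried on `OSError` and the time of the last good read tracked. This assumes the `smbus` object's `read_i2c_block_data(addr, cmd, length)` raises `OSError` on a failed transfer (Linux typically reports a NACK as errno 121, "Remote I/O error"). `read_block_with_retry` and `LinkMonitor` are hypothetical names, not part of any library.

```python
import time


def read_block_with_retry(bus, addr, cmd=0x00, length=13, retries=3, delay_s=0.05):
    """Read a block from the slave, retrying when the transfer raises OSError."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return bus.read_i2c_block_data(addr, cmd, length)
        except OSError:
            if attempt == retries - 1:
                raise  # out of retries: let the caller log it or reset the Nano
            time.sleep(delay_s)


class LinkMonitor:
    """Flags the link as stale when no good read has arrived for timeout_s seconds."""

    def __init__(self, timeout_s=2.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.last_ok = clock()

    def mark_ok(self):
        self.last_ok = self.clock()

    def stale(self):
        return self.clock() - self.last_ok > self.timeout_s
```

In the polling loop, call `monitor.mark_ok()` after each successful read and log the time (or trigger the reset workaround) when `monitor.stale()` returns True.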

Some 3.3 V processors have an undocumented safety feature for 5 V signals. I don't know if it is wise to rely on an undocumented feature :thinking:

Can you show a photo?
Do you use wires or a cable? And how is GND connected?

The I2C address 0x08 might be seen as something special. You can start at 0x10 or 0x20.

The Wire library on the Arduino Nano has a timeout function, but I don't know what it does in slave mode: https://www.arduino.cc/reference/en/language/functions/communication/wire/setwiretimeout/

There is another, better solution. The Raspberry Pi runs Linux, so the best choice is to connect the Arduino Nano via its USB connector and send Serial/UART data over USB. Run a sketch on the Nano and a Python script on the Raspberry Pi.

It ran for well over a year without any issues, so unless a Linux update in the meantime caused this to fail, I doubt that is fundamentally the issue. The Raspberry Pi is still sending out I2C requests, as the waveform demonstrates, AND I have determined through extensive testing that the Arduino itself stops seeing these requests: I had it report whenever it hadn't seen an I2C request for 2 seconds while the Pi sent requests continuously. I am confident it's not the Raspberry Pi. I've had other Pis break on this project, but none of those failures were related to this.

As for the 0x08 thing:

0x08 is completely valid; it's the lowest address you can go, so that isn't the issue. Doubly so since this works fine for 20 minutes before failing at some point, which it wouldn't if the address were the problem.
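This matches the I2C specification (NXP UM10204), which reserves the 7-bit addresses 0x00 to 0x07 and 0x78 to 0x7F, making 0x08 the lowest general-purpose address. A small sketch for checking an address against those reserved groups (the helper name `reserved_purpose` is made up; the purposes are paraphrased from the spec's reserved-address table):

```python
# Individually reserved 7-bit addresses from the I2C specification (paraphrased)
RESERVED = {
    0x00: "general call / START byte",
    0x01: "CBUS address",
    0x02: "reserved for a different bus format",
    0x03: "reserved for future purposes",
}


def reserved_purpose(addr):
    """Return why a 7-bit address is reserved, or None if it is usable."""
    if not 0 <= addr <= 0x7F:
        raise ValueError("not a 7-bit address")
    if addr in RESERVED:
        return RESERVED[addr]
    if 0x04 <= addr <= 0x07:
        return "Hs-mode controller code"
    if 0x78 <= addr <= 0x7B:
        return "10-bit addressing"
    if 0x7C <= addr <= 0x7F:
        return "device ID"
    return None  # 0x08..0x77 are available for devices


if __name__ == "__main__":
    for a in (0x07, 0x08, 0x77, 0x78):
        print(f"0x{a:02X}: {reserved_purpose(a) or 'usable'}")
```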

There is another, better solution. The Raspberry Pi runs Linux, so the best choice is to connect the Arduino Nano via its USB connector and send Serial/UART data over USB. Run a sketch on the Nano and a Python script on the Raspberry Pi.

This system has a (fairly) complicated setup, and I'd rather not run more cables because that will cause more headaches in the future. Device discovery is also supposed to happen over the I2C bus in the first place, by interrogating each device for what it is (that was a different set of code, which shows the same issue), and I'm supposed to have multiple devices on the same bus, which sadly you can't do with UART.

Can you show a photo?
Do you use wires or a cable? And how is GND connected?

The Nano slots into a PCB that slots onto a Raspberry Pi HAT. It's a stack of custom PCBs that route the signals appropriately and pass power through certain sections of the board stack. The PCBs themselves have been unchanged for 2 years, so I doubt they are fundamentally the cause of this issue. The ground is therefore good (these PCBs are designed in-house, so we know what's going on), and the

I have a disgusting workaround: pulling the reset pin low and then high again, letting the entire system restart in the middle of a data recording. But that's fundamentally disgusting and just papers over the issue instead of actually fixing anything.

Does that mean that SDA and SCL don't go through a cable? That is good.

With all the information that I have now, I'm thinking about three things:

  • a timing problem
  • a memory bug in the code
  • contacts that have gone bad.

That means you have to check every line of code and all the hardware.
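To put a number on "a timing problem": requestEvent() in the posted sketch runs inside the TWI interrupt, and the AVR holds SCL low (clock stretching) until it returns. A rough estimate of that delay, assuming a 16 MHz Nano, the Arduino AVR core's default ADC prescaler of 128, and 13 ADC clock cycles per normal conversion; the small overhead of the Wire.write() and digitalRead() calls is ignored.

```python
# Rough estimate of how long requestEvent() keeps SCL stretched.
# Assumptions: 16 MHz ATmega328P, default ADC prescaler 128 (125 kHz ADC clock),
# 13 ADC clocks per normal conversion, Wire.write()/digitalRead() overhead ignored.
F_CPU_HZ = 16_000_000
ADC_PRESCALER = 128
ADC_CLOCKS_PER_CONVERSION = 13
I2C_CLOCK_HZ = 100_000


def conversion_us():
    adc_clock_hz = F_CPU_HZ / ADC_PRESCALER
    return ADC_CLOCKS_PER_CONVERSION / adc_clock_hz * 1e6


def stretch_estimate_us(n_analog_reads=4):
    return n_analog_reads * conversion_us()


def bit_times(duration_us, scl_hz=I2C_CLOCK_HZ):
    """How many SCL periods fit into duration_us."""
    return duration_us / (1e6 / scl_hz)


if __name__ == "__main__":
    t = stretch_estimate_us()
    print(f"~{t:.0f} us of clock stretching per read (~{bit_times(t):.0f} SCL periods)")
```

That is roughly 0.4 ms, or about 40 SCL periods at 100 kHz, on every read. The Raspberry Pi's I2C controller is widely reported to handle clock stretching poorly, so doing the analogRead() calls in loop() and having requestEvent() only copy a prepared buffer with a single Wire.write(buffer, length) may be worth trying.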

The I2C bus can work for years and then suddenly fail. The reason is that it was sometimes only barely working.
A memory problem in the code can linger for years without showing itself. If something on the outside changes, it might suddenly pop up.
Taking the shields apart and then putting them back together might solve a bad contact.
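One way to check whether the bus "barely works" electrically is to compare the rise time set by the pull-ups and bus capacitance against the spec. A minimal sketch using the formulas from NXP's I2C-bus specification (UM10204): rise time t_r = 0.8473 × R_p × C_b with a 1000 ns maximum in standard mode, and R_p(min) = (V_DD − V_OL(max)) / I_OL with V_OL(max) = 0.4 V at 3 mA. The bus capacitance is an assumption here (100 pF in the example); measure or estimate it for the actual board stack.

```python
# Formulas from the NXP I2C-bus specification (UM10204).
RISE_FACTOR = 0.8473  # ln(0.7 / 0.3): time from 30% to 70% of VDD on an RC edge


def rise_time_ns(r_pullup_ohm, c_bus_pf):
    """Approximate SDA/SCL rise time for a given pull-up and bus capacitance."""
    return RISE_FACTOR * r_pullup_ohm * c_bus_pf * 1e-3  # ohm * pF = 1e-3 ns


def max_bus_capacitance_pf(r_pullup_ohm, tr_max_ns=1000.0):
    """Largest bus capacitance that keeps the rise time within tr_max (standard mode: 1000 ns)."""
    return tr_max_ns / (RISE_FACTOR * r_pullup_ohm * 1e-3)


def min_pullup_ohm(vdd=5.0, vol_max=0.4, iol_a=0.003):
    """Smallest pull-up that still lets a 3 mA sink pull the line below VOL(max)."""
    return (vdd - vol_max) / iol_a


if __name__ == "__main__":
    print(f"4.7k, 100 pF (assumed): {rise_time_ns(4700, 100):.0f} ns rise time")
    print(f"4.7k: max bus capacitance {max_bus_capacitance_pf(4700):.0f} pF")
    print(f"min pull-up at 5 V: {min_pullup_ohm():.0f} ohm")
```

With 4.7 kΩ pull-ups, the rise time stays inside the standard-mode limit up to roughly 250 pF of bus capacitance; connectors, traces through a board stack, and extra devices all eat into that margin. Note that a logic analyzer will not show slow edges; a scope is needed for that.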

Do you know that we have a saying on this forum: "The problem is in the part that you are not showing". We have to see the code that goes wrong.
You might be using a library that turns off interrupts: DHT, Neopixel, OneWire, and other libraries. That has a big influence on the response time of the I2C bus.
You might have an onRequest handler that is more complex.
Perhaps 0.1 seconds is not enough?
