Hello,
I built a prototype using an Arduino Mega with an OLED display, EEPROM, and temperature sensors on the same I2C bus. Everything works for a while, but after a few hours the bus hangs and the display stops updating.
I’m using 4.7k pull-ups and around 25 cm wires between boards.
Questions:
What is the best way to recover a stuck I2C bus in software?
Could bus capacitance be causing this issue?
If I move this design to a PCB, should I keep SDA/SCL traces short and route them away from SPI signals?
Yes, the standard recovery method for a stuck I2C bus is to clock the bus manually:
Reconfigure SDA/SCL as GPIO
Pulse SCL around 10 times (the I2C specification calls for nine clocks), checking SDA after each pulse, to release any stuck slave
Generate a STOP condition (SDA rising from low to high while SCL is high)
Reinitialize the I2C peripheral
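The steps above can be sketched roughly as follows. This is a minimal sketch, not a drop-in: `recoverI2cBus` and the `Pins` interface are hypothetical names, and the pin operations are passed in so the logic does not depend on a particular board. On an Arduino they would wrap `pinMode()`, `digitalRead()` and `delayMicroseconds()`: a line is "released" by switching the pin to input so the pull-up raises it, and "driven low" by switching it to output LOW. Clock stretching by a slave is ignored here.

```cpp
// Pins is any type that provides these six operations (hypothetical interface):
//   releaseScl(), driveSclLow(), releaseSda(), driveSdaLow(),
//   readSda()    -> true when SDA reads high
//   halfPeriod() -> short delay, e.g. delayMicroseconds(5) for roughly 100 kHz
template <typename Pins>
bool recoverI2cBus(Pins& pins, int maxPulses = 10) {
  // Start with both lines released (pulled high by the resistors).
  pins.releaseSda();
  pins.releaseScl();
  pins.halfPeriod();

  // Clock SCL until the stuck slave lets go of SDA, or we give up.
  for (int i = 0; i < maxPulses && !pins.readSda(); ++i) {
    pins.driveSclLow();
    pins.halfPeriod();
    pins.releaseScl();
    pins.halfPeriod();
  }

  // STOP condition: SDA goes from low to high while SCL is high.
  pins.driveSclLow();
  pins.halfPeriod();
  pins.driveSdaLow();
  pins.halfPeriod();
  pins.releaseScl();
  pins.halfPeriod();
  pins.releaseSda();
  pins.halfPeriod();

  // True if the bus is free again; the caller then re-runs Wire.begin().
  return pins.readSda();
}
```

If this returns false, something is still holding SDA low and clocking alone will not fix it; that is the case where a power cycle of the bus devices is needed.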
Keep SDA/SCL short and tightly routed together
Route away from SPI, PWM, motor, or switching signals
Use a solid ground plane under the traces. I would recommend using separate ground and power planes; this guide explains planes: Routing Layers and Ground Planes & Power Planes - Engineering Technical - PCBway
Consider stronger pull-ups (2.2k–3.3k) depending on bus capacitance
Optional: add small series resistors (22–100 Ω) near the MCU to reduce ringing
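For a rough feel of what "depending on bus capacitance" means: the pull-up value has a window, with a lower limit set by how much current a device can sink and an upper limit set by the allowed rise time. The sketch below uses the formulas from the I2C-bus specification, Rp(min) = (VDD - VOL) / IOL and Rp(max) = tr / (0.8473 x Cb). The names are made up for illustration, and the 200 pF in the comments is only an assumed figure; the real capacitance of 25 cm of wiring plus three devices would have to be measured or estimated.

```cpp
struct PullupRange {
  double minOhms;  // below this a device may not be able to pull the line low
  double maxOhms;  // above this the rising edge is too slow
};

// vdd in volts, busCapFarads in farads, riseTimeSeconds is the maximum rise
// time for the bus speed (1000 ns in standard mode, 300 ns in fast mode).
PullupRange pullupRange(double vdd, double busCapFarads, double riseTimeSeconds) {
  const double volMax = 0.4;    // V, maximum low-level output voltage
  const double iolMax = 3e-3;   // A, sink current at that voltage
  PullupRange r;
  r.minOhms = (vdd - volMax) / iolMax;
  r.maxOhms = riseTimeSeconds / (0.8473 * busCapFarads);
  return r;
}

// Example with an assumed 200 pF bus at 5 V:
//   pullupRange(5.0, 200e-12, 1000e-9)  -> about 1.5k to 5.9k (100 kHz)
//   pullupRange(5.0, 200e-12, 300e-9)   -> about 1.5k to 1.8k (400 kHz)
```

With those assumed numbers, 4.7k is inside the window at 100 kHz but too weak at 400 kHz, which is why the bus speed matters as much as the resistor value.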
Your issue is almost certainly signal integrity (capacitance + noise), not I2C protocol limits. On a clean PCB layout, this kind of bus hang will be eliminated.
But the suggestion of stronger pullup resistors is also good and easy to test.
If a rogue I2C device is hogging the bus then it may not be easy to solve.
Post an annotated schematic. That sounds like a hardware problem, possibly in the power system. Try running with the display disconnected; I expect the problem will go away. If you must see the data, use Serial.print(). What type of environment is this in: home, office, etc.?
If a slave is stuck holding the bus low... can you depower the bus? I've had some success running the I2C bus powered from a GPIO, so that if the bus becomes truly stuck or broken I depower EVERYTHING on the bus for a period of time, then repower and reinitialize the bus. But... I admit that was just a stop-gap I tried and seemed to work (and requires powering everything from a GPIO pin, which works fine for very low-current-draw systems on the bus, but obviously is a problem for high-current-draw ones).
If there are other functioning devices on the bus, that could cause them to lock-up and it will not necessarily reset the stuck device.
Assuming all devices on the bus will reset when completely depowered (including the device supply power), then that is the only foolproof way to reset everything.
One of the most common causes of a "locked up" I2C bus when using an AVR processor is the Wire library hanging because it does not handle multi-master properly.
Even if you are not using multiple masters the multi master code in the low level Wire library still has support for it and can get confused and lock up.
Where this can become problematic is that if the Wire library "thinks" there is another master on the bus wanting to talk because of noise on the bus, it will back off and wait for that other master to finish, which never happens. The way that is implemented in the low level code is that the Wire s/w spins down in the library code waiting for the Wire h/w to post an event indicating that the other master is finished.
But... if there is noise on the bus when there is a single master, it can confuse the Wire library code into thinking that there is another master on the bus. So it will wait, for that other master to finish. But since there is no other master, the h/w will never post an event to indicate that the other master finished, and the Wire s/w will spin forever since there is no timeout in that spin wait loop. This will hang the system.
There are two approaches.
Use the watchdog timer (this is the best approach)
If the Wire library hangs, the watchdog will fire and reset the board.
The Arduino sketch code will start over and everything will be initialized and start over just like a power cycle or hardware reset.
If using a newer 2.x IDE and Wire library, enable the Wire library timeouts:
Search for how to use them: "arduino avr wire library enable timeouts"
Basically, you enable the timeouts and you can call a Wire method to see if a timeout occurred.
There was lots of discussion about this before it was implemented.
I agreed with the need to fix the forever spins, but was very much against the way it was implemented.
There are quite a few potential issues with using these timeouts, beyond the portability issues.
For example, in order to be robust and get things working again, you will need to fully re-initialize the Wire library and every I2C slave on the bus.
Some Wire slave libraries may have issues when their begin() function is called again. This is why I suggest using the watchdog timer.
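For anyone who does want to try the timeouts, here is a minimal sketch of the check-and-reinitialize step described above. It assumes the AVR core's Wire library with timeout support (it defines `WIRE_HAS_TIMEOUT`) and uses its `setWireTimeout()`, `getWireTimeoutFlag()` and `clearWireTimeoutFlag()` methods. `recoverAfterTimeout` and `reinitSlaves` are hypothetical names; the function is templated only so the same logic can be tried off-target with a stand-in object, and on the board you would pass `Wire`.

```cpp
// wire:         the Wire object (TwoWire on the AVR core)
// reinitSlaves: hypothetical callback that re-runs the setup of every
//               device on the bus (display, EEPROM, sensors)
// Returns true if a timeout had occurred and recovery was attempted.
template <typename WireT, typename Reinit>
bool recoverAfterTimeout(WireT& wire, Reinit reinitSlaves) {
  if (!wire.getWireTimeoutFlag()) {
    return false;                    // no timeout since the last check
  }
  wire.clearWireTimeoutFlag();
  wire.end();                        // tear the TWI peripheral down
  wire.begin();                      // and bring it back up
  wire.setWireTimeout(25000, true);  // 25 ms; reset TWI hardware on timeout
  reinitSlaves();                    // devices may have lost their state
  return true;
}

// Intended use in a sketch (display.begin() is only an example device call):
//   setup():  Wire.begin(); Wire.setWireTimeout(25000, true);
//   loop():   ... I2C transactions ...
//             recoverAfterTimeout(Wire, [] { display.begin(); });
```

Note that the callback is exactly where the concern about libraries not tolerating a second begin() call applies, so each device library needs to be checked individually.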
So you have blindly added 2 × 4k7?
Did you calculate the total resistance of all connected modules, including the 2 × 10k that the Mega already has?
Total R per line should not be below about 1k66 (3 mA at 5 V).
Leo..
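To put numbers on that: pull-ups on the same line add up in parallel. A small sketch, assuming a 5 V bus and using the simple rule above (current = 5 V / R, limit 3 mA). The helper names are made up, and the extra 4.7k values standing in for module pull-ups are assumptions; check each breakout's schematic for the real ones.

```cpp
// Parallel combination of `count` resistors, values in ohms.
double parallelOhms(const double* ohms, int count) {
  double conductance = 0.0;
  for (int i = 0; i < count; ++i) {
    conductance += 1.0 / ohms[i];
  }
  return 1.0 / conductance;
}

// Current a device must sink to pull the line to ground, in mA.
double sinkCurrentMilliamps(double vdd, double pullupOhms) {
  return vdd / pullupOhms * 1000.0;
}

// Per line, at 5 V:
//   added 4.7k + Mega 10k                 -> about 3.2k, about 1.6 mA
//   ... plus one module with 4.7k         -> about 1.9k, about 2.6 mA
//   ... plus a second module with 4.7k    -> about 1.35k, about 3.7 mA (too low)
```

So the added 4.7k on its own is fine, but two breakouts with their own 4.7k pull-ups would already push the total below the limit.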
It seems to happen quite a bit.
The AVR TWI / Wire implementation is half h/w half s/w.
There are several situations that can cause the TWI code to lock up because the code transitions to the wrong state and gets stuck in spin loops polling for an event that will never happen.
It could be triggered by various things, induced noise, poor or missing external pullups (the Wire library enables the internal pullups which are not really strong enough to work properly all the time), power supply issues, etc...
There was quite a bit of discussion about it a few years back in the forum and in a few github issues.
One thread back in the 2020 time frame that really kicked off interest in finally resolving this issue, was some goofy person was wanting to use an Arduino to make a ventilator but the Wire library would occasionally lock up.
One of the cases is the TWI s/w could get stuck waiting for a STOP that never happens.
Enough people finally seeing these lockups is what drove the addition of the timeouts to the low level twi code.
Then it's a hardware issue, not a software issue. If your bus is highly susceptible to noise, then you need to rethink your design. You can’t expect software to counteract the effects of a poorly designed system. From what I have seen on the forum, the most common causes of I2C problems are people using long wires, no pullups, incorrect pullups, incorrect pullup voltage, or a misunderstanding of how their devices work.
Granted in many cases the trigger for this issue is h/w, like wiring or power.
But the real issue is that a problem in the AVR TWI system is causing the processor to hang.
It is very common in the real world for s/w to work around h/w issues.
The TWI system has the ability to return error status codes when the s/w detects issues. The problem is the TWI system didn't have timeouts for all situations.
The AVR TWI system should not have cases where it can get stuck and hang the processor.
On the AVR TWI system, it is part h/w and part s/w to keep the h/w simple and inexpensive.
In order for a system to be robust, it must never hang, and should be able to recover from as many exceptions and errors as possible.
The AVR TWI system has/had a problem. Under certain scenarios, it causes a lockup in the low level TWI s/w because the s/w never times out waiting on certain h/w events. This can be fixed in s/w, and has been, by adding the missing s/w timeouts, but these additional timeouts are not enabled by default.
Consider this scenario. There actually are multiple masters, the other master starts a transaction but fails to complete it because it just happened to lose power.
If the timing was just right, the AVR master could fall into a spin loop waiting forever for that other master to finish but it will never happen.
I also have a type of I2C LCD device whose chipset misbehaves if you read from it and triggers one of the TWI lockup issues. I put code in my LCD library for this type of device that blocks reads to prevent a lockup on AVR systems.
These are just a couple of the many situations where the AVR can get locked up because the TWI s/w state machine can spin forever waiting for a h/w state change that never happens. If the additional timeouts are enabled, the Wire library won't lock up anymore and the user code can take whatever actions it deems necessary to re-initialize the Wire system and slaves.
Alternatively as I mentioned, the user could simply enable the watchdog timer and then, should one of these TWI issues happen, the AVR is reset and starts over cleanly.
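A minimal sketch of the watchdog approach, assuming an AVR board and avr-libc's `wdt_enable()` / `wdt_reset()`. `pollI2cDevices()` is a hypothetical placeholder for whatever I2C work the sketch does, and the 2 s timeout is an arbitrary choice that just needs to be longer than one normal pass through loop(). The `#else` branch contains stand-ins so the same file also compiles on a PC; it is not needed on the board.

```cpp
#if defined(__AVR__)
#include <avr/wdt.h>   // avr-libc watchdog API
#else
// Off-target stand-ins (illustration only, not part of avr-libc).
#define WDTO_2S 7
static int wdtKicks = 0;
static void wdt_enable(int) {}
static void wdt_reset() { ++wdtKicks; }
#endif

// Hypothetical placeholder: read sensors, access EEPROM, update the display.
void pollI2cDevices() {
}

void setup() {
  // From here on the MCU resets unless wdt_reset() is called
  // at least once every 2 seconds or so.
  wdt_enable(WDTO_2S);
}

void loop() {
  pollI2cDevices();  // if Wire spins forever in here...
  wdt_reset();       // ...this line is never reached and the watchdog fires
}
```

Whether the board comes back up cleanly after a watchdog reset can depend on the bootloader, so it is worth forcing a hang once on the actual hardware to confirm it recovers.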