I have spent the last 8 months or so developing a custom USB game controller (USBcycle -- too complicated to describe fully here), consisting of 2 Leonardos connected via USB to an OSX (El Cap 10.11.6) gaming platform. Several I2C devices are involved, all controlled by the master (Leo1). Leo1 handles keyboard and mouse emulation, Leo2 handles joystick emulation.
The Leos also communicate with each other over I2C (using I2C_Anything library). Leo1 uses I2C to send a data structure to the slave (Leo2) and can also request the slave to send it back (with modifications of selected values if appropriate). Thus, the 2 Leos can pass data back and forth, sharing the values from various sensors: each writes only the fields corresponding to its own sensors.
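To make the "each writes only its own fields" rule concrete, here is a minimal sketch of the idea. The struct layout and field names are invented for illustration (the real USBcycle structure is different), and the two helper functions are hypothetical, not part of I2C_Anything.

```cpp
#include <stdint.h>

// Hypothetical layout for illustration only; the real structure differs.
struct SharedState {
    int16_t steering;  // owned by Leo1 (Hall rotary encoder)
    int16_t brake;     // owned by Leo2 (made-up sensor)
    uint8_t buttons;   // owned by Leo2 (made-up sensor)
};

// Leo2 side: take only the master-owned fields from the copy Leo1 sent.
void applyMasterUpdate(SharedState &local, const SharedState &incoming) {
    local.steering = incoming.steering;
}

// Leo1 side: take only the slave-owned fields from the copy Leo2 sent back.
void applySlaveReply(SharedState &local, const SharedState &reply) {
    local.brake = reply.brake;
    local.buttons = reply.buttons;
}
```

Done this way, a stale or garbled copy from the other board can never overwrite a value that the receiving board measured itself.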
This has all been working just fine for months (about five months, to be more precise). The project is done and documented, being exercised on a daily basis, and I'm starting to think about Version 2 at a pencil-and-paper level. There has been a long and encouraging period of perfect stability and reliability. But as of the last 2 weeks, I'm starting to see some worrying signs of flakiness on the I2C bus. I have not changed the hardware for months, have not changed the code for about a month, have not updated OSX for months, and yet the controller is becoming less reliable, failing intermittently.
Two observable things are different from a month or so ago. The first symptom was that a while back -- two months or more? -- Leo2 started to complain now and then about seeing no I2C bus power at startup. This complaint was very rare initially, but has been getting more frequent and as of tonight seems to be happening on almost half of all startups. It used to be that simply unplugging its USB cable and reconnecting it was sufficient to fix the problem; but tonight the "no I2C power" problem became chronic and I had to power-cycle both Leos and reload Leo2's sketch several times to fix it.
The second worrying change (possibly related) is that the Leo1->Leo2 data transfer seems to be failing. This was once unheard-of, then a month ago happened just once and mysteriously "fixed itself" after a couple of reboots; then tonight it happened persistently. Once it fails, the system does not recover (obviously I need better error handling!).
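As a rough idea of what better error handling on Leo1 could look like: the Arduino Wire library's endTransmission() returns a status byte, and the sketch below decodes it and counts consecutive failures. The status meanings are the ones given in the Wire documentation (code 5, timeout, only appears in newer cores). The LinkHealth struct and recordSendResult() are hypothetical names made up for this example, and I am assuming the send to Leo2 is wrapped in the usual beginTransmission()/endTransmission() pair.

```cpp
#include <stdint.h>

// Status byte returned by Wire.endTransmission(), per the Wire docs.
const char *describeI2cStatus(uint8_t status) {
    switch (status) {
        case 0:  return "success";
        case 1:  return "data too long for transmit buffer";
        case 2:  return "NACK on address";
        case 3:  return "NACK on data";
        case 4:  return "other error";
        case 5:  return "timeout";  // newer cores only
        default: return "unknown status";
    }
}

// Hypothetical bookkeeping for the Leo1 -> Leo2 link.
struct LinkHealth {
    uint8_t  consecutiveFailures;
    uint32_t totalFailures;
};

// Record one send result. Returns true once the link has failed
// maxConsecutive times in a row, i.e. when recovery should be attempted.
bool recordSendResult(LinkHealth &h, uint8_t status, uint8_t maxConsecutive) {
    if (status == 0) {
        h.consecutiveFailures = 0;
        return false;
    }
    h.totalFailures++;
    if (h.consecutiveFailures < 255) {
        h.consecutiveFailures++;  // saturate instead of wrapping to 0
    }
    return h.consecutiveFailures >= maxConsecutive;
}
```

In the sketch on Leo1, the value passed in would be the result of Wire.endTransmission() after writing the structure. Printing describeI2cStatus() on failure would at least show whether Leo2 has stopped acknowledging its address (code 2) or is dropping out mid-transfer (code 3).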
The symptom is that the game controller works great for a while (30 minutes, an hour, some random period), then suddenly the steering (a joystick axis) fails. The game no longer sees any input from that axis (but the other 2 joystick axes are still live), and a USB diagnostic tool no longer sees that axis either.
The steering mechanism relies on Leo1 reading a Hall rotary encoder (I2C device) and then updating the steering value in the shared data structure and sending the structure over I2C to Leo2; Leo2 then translates the steering value into a joystick value (as mentioned, it provides the joystick HID) and sends it over USB to the gaming platform. Quick instrumentation of the code with Serial.println shows that while Leo1 is having no trouble reading the encoder over I2C, Leo2 is no longer getting any data refreshes from Leo1 over I2C. However, rebooting both Leos a few times seems to restore functionality... for a while. After N minutes (an hour give or take 30 minutes) it fails again.
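On the Leo2 side, the "no more data refreshes" condition could be detected automatically rather than by eyeballing Serial output. Below is a minimal sketch of a staleness check. UpdateWatchdog, noteUpdate() and isStale() are hypothetical names, the timeout value is something that would need tuning, and the timestamps are assumed to come from millis().

```cpp
#include <stdint.h>

// Hypothetical watchdog for "has Leo1 sent anything recently?"
struct UpdateWatchdog {
    uint32_t lastUpdateMs;  // timestamp of the last received structure
    uint32_t timeoutMs;     // how long a gap counts as a failure
};

// Call whenever a complete structure has arrived from Leo1.
// Inside an I2C receive handler, lastUpdateMs should be treated as
// shared with the main loop (volatile, read with interrupts masked).
void noteUpdate(UpdateWatchdog &w, uint32_t nowMs) {
    w.lastUpdateMs = nowMs;
}

// Call from loop(). Unsigned subtraction keeps the comparison correct
// even when the millisecond counter wraps around to zero.
bool isStale(const UpdateWatchdog &w, uint32_t nowMs) {
    return (uint32_t)(nowMs - w.lastUpdateMs) > w.timeoutMs;
}
```

When isStale() returns true, Leo2 could light an LED, log the time of failure, or try re-initialising its I2C interface, which would at least pin down exactly when the link dies.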
So it seems to me that something about the I2C bus is marginal (and getting rapidly worse), preventing the Leo1/Leo2 communication on which the steering axis depends; I have no idea how or why it's getting worse. This leaves me with many vexing questions.
I had a system that was working perfectly, and now it's increasingly flaky. Are I2C devices known for "going sour", and if so, what is most likely to be the issue? Do Leonardos have any known problems with I2C?
Why would I2C fail between one slave and the master (Leo2 doesn't get updates from Leo1), yet continue to work between other slaves and the master (Leo1 is still able to read the I2C Hall sensor and control the Trellis keypads)? I thought that a shared bus of this kind was "all or nothing". And yet I have seen, using debug statements, a failure mode in which the steering sensor value was being read and updated by Leo1, the "send to slave" code was invoked on Leo1, yet Leo2 never saw the update. If the I2C bus voltage were randomly dropping, I would expect all devices to fail intermittently, not just one specific transaction. No?
I'm tempted to believe that Leo2's GPIO is failing somehow, making it think it detects low I2C bus voltage when in fact it's OK; but that seems very unlikely. The other possibility is that its interrupt mechanism fails and that's why the data transfer is never completed. Here I am in way over my depth -- is there any hardware-related failure that could cause a Leonardo to stop honouring interrupts?
I am not an EE, have only been working with Arduino for 8 months (i.e. this is my first real project), and am not very confident about troubleshooting at this fairly advanced level. I do have an Xminilab and (sort of) know how to use it, but I'm not sure what I'd be looking for. A word or two of general "how to go about it" advice from older hands who have lots of I2C experience would be very welcome. Where do I start? Should I swap out Leo2 for a fresh Leonardo?