I2C flakiness: best strategy to identify and fix?

Part 2:

Do you still use a breadboard? They often have bad contacts. Solder everything or use the best quality connectors.

No breadboards are involved in this version, but I am using 2 stock Leos with female headers, so there are quite a few jumper wires with header pins plugged into the Leos. This is another thought that occurred to me in the wee hours: the Leos are mounted upside-down under the control panel and the whole assembly does vibrate in use, so it's possible a pin is making not-quite-reliable contact. I will massage all the non-soldered connections.

How long are the wires for SDA and SCL? I mean the total length of every piece of wire for SDA and SCL. Keep it under 50 cm or lower the I2C speed with Wire.setClock(50000L);
Did you tie the SDA and SCL wires together? They don't like each other (crosstalk).

This is interesting. 50cm... 20 inches... Yes, I would say that if we added the total length of all SDA wires it is probably at least 15 inches (will have to do some actual measuring), and SCL must be about the same. I did not daisy-chain the I2C devices. They all connect to 4 buses inside the controller case: SCL, SDA, ground, and +5. So it is more of a star architecture, physically speaking.

The flying wires carrying I2C do run close to one another at times but are not actually tied into a tight bundle.

Is Leo1 always the I2C Master and Leo2 always the I2C Slave?
Does the Leo2 act as a Master to the display? Is that an I2C display? Then you would have a multi-master bus. That is as yet unexplored territory. Don't be the first one to explore it; let others do that.

Yes, Leo1 is always the I2C master. Leo2 is always a slave. Leo2 cannot initiate any transactions; it only responds (via interrupt) to requests from Leo1.

This is why the two Leos have to chat over the I2C bus; only Leo1 can read the steering sensor because the steering sensor is an I2C device and only Leo1 is bus master. But Leo2 needs the steering sensor value, because it's handling the joystick output of which steering is one axis. It's also reading the reed switches that tell us pedalling RPM, and doing the RPM calculation; but Leo1 is displaying the RPM on its I2C 7seg gizmo, so it needs Leo2 to tell it the RPM value. I really like being able to distribute the tasks and share data -- discovering I2C_Anything was one of the best moments of the project :)

When requesting data from the Slave, you do not check whether anything was received. It is a Wire.requestFrom() followed by an I2C_ReadAnything. When sending data to the Slave, you don't check that either.
Check that Wire.requestFrom() returns the same number of bytes that you requested.
Check whether Wire.endTransmission() returns an error.
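
The two checks above could be wrapped roughly like this. It is only a sketch, not the poster's code: the helper names readStructChecked and writeStructChecked are made up for illustration, and FakeBus is a hypothetical stand-in for Wire so the logic can be tried on a PC. The 32-byte limit assumes the stock AVR Wire buffer size, and T is assumed to be a plain struct.

```cpp
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

// Read exactly sizeof(T) bytes from a slave. Returns false on a short read
// and leaves 'out' untouched. 'Bus' is anything shaped like Wire.
template <typename Bus, typename T>
bool readStructChecked(Bus& bus, uint8_t addr, T& out) {
  static_assert(sizeof(T) <= 32, "assumes the 32-byte AVR Wire buffer");
  const uint8_t want = (uint8_t)sizeof(T);
  const uint8_t got = bus.requestFrom(addr, want);   // bytes actually received
  if (got != want) {
    while (bus.available() > 0) bus.read();          // discard the partial data
    return false;
  }
  T tmp;
  uint8_t* p = reinterpret_cast<uint8_t*>(&tmp);
  for (uint8_t i = 0; i < want; ++i) p[i] = (uint8_t)bus.read();
  out = tmp;
  return true;
}

// Send a struct and hand back the endTransmission() status.
// On the AVR Wire library 0 means success; non-zero means a NACK or other error.
template <typename Bus, typename T>
uint8_t writeStructChecked(Bus& bus, uint8_t addr, const T& value) {
  bus.beginTransmission(addr);
  bus.write(reinterpret_cast<const uint8_t*>(&value), sizeof(T));
  return bus.endTransmission();
}

#ifndef ARDUINO
// Hypothetical off-target stand-in for Wire (not part of any library).
struct FakeBus {
  uint8_t data[32];
  uint8_t deliver;     // how many bytes the pretend slave sends
  uint8_t status;      // what endTransmission() will report
  uint8_t avail, pos;
  uint8_t requestFrom(uint8_t, uint8_t n) {
    pos = 0;
    avail = deliver < n ? deliver : n;
    return avail;
  }
  int available() { return avail - pos; }
  int read() { assert(pos < avail); return data[pos++]; }
  void beginTransmission(uint8_t) {}
  size_t write(const uint8_t*, size_t n) { return n; }
  uint8_t endTransmission() { return status; }
};
#endif
```

On the Leo1 side the call would look something like `if (!readStructChecked(Wire, LEO2_ADDR, share_data)) errorCount++;` where LEO2_ADDR and errorCount are placeholder names. Counting the failures and printing the count over Serial now and then is a cheap way to see whether the flakiness lines up with vibration or with something else.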

Interesting point, and my bad. I think I check that on the Leo2 side, with a "gotData" flag; and also in the receive interrupt routine, there's a test for the number of bytes received (does it match the size of the struct we are expecting). I thought I had duplicated those sanity tests on the Leo1 side but probably intended to do so and never got around to it.

Is there something that turns off the interrupts ? For example in libraries.

I considered this but as far as I could tell, I didn't need to. The only interrupt mechanism being used on the Leo2 side is the I2C request from Leo1; nothing else can interrupt it so there should be no competing interrupt that could disturb the I2C conversation. Leo1, meanwhile, never gets interrupted because it is the master (and does not use interrupts for any other purpose). So Leo1's timing can never be derailed; it talks to Leo2 only when it wants to.

The 'share_data' and 'gotData' variables are used in interrupts and in the loop(), therefore they must be made 'volatile'.
When reading or writing an element of 'share_data' in the loop, you have to turn off the interrupts (for as short a time as possible). Since the Leonardo is an 8-bit chip, an interrupt may happen in the middle of reading a variable that is larger than one byte.
However, you don't have to turn off interrupts in 'receiveEvent()' or 'requestEvent()', since by default an interrupt handler is not interrupted by other interrupts.

This is interesting. I will read up on "volatile," also on turning off interrupts. But if Leo2 turns off its interrupts for N ms then it would not respond to Leo1's request during that period; wouldn't that throw an error for Leo1? Must I really implement a whole backoff/retry algorithm? (jeepers, reinventing CSMA :-))

afaik no other interrupt could possibly occur on Leo2 during receiveEvent and requestEvent anyway (see above)...
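
On the "N ms" worry: the interrupts-off window only needs to cover copying a few bytes, which is a matter of microseconds, not milliseconds. As far as I understand the AVR TWI hardware, an I2C event that arrives during that window is not lost; the hardware holds SCL low (clock stretching) until the handler runs, so the master simply waits a little. Note also that the Arduino core runs interrupts of its own (the timer behind millis(), and USB on the Leonardo), so the I2C handlers are not the only ones on Leo2. A minimal sketch of the copy-under-lock idea follows. The names atomicCopy and atomicStore are invented for illustration, T is assumed to be a plain struct, and the `#ifndef ARDUINO` part is only a stand-in so the helper can be checked on a PC.

```cpp
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

#ifndef ARDUINO
// Off-target stand-ins for the Arduino noInterrupts()/interrupts() macros.
static bool g_irqOff = false;
inline void noInterrupts() { g_irqOff = true; }
inline void interrupts()   { assert(g_irqOff); g_irqOff = false; }
#endif

// Take a consistent snapshot of a variable that an ISR may change.
// Call from loop() only: interrupts() re-enables unconditionally.
template <typename T>
T atomicCopy(const volatile T& shared) {
  T out;
  const volatile uint8_t* s = reinterpret_cast<const volatile uint8_t*>(&shared);
  uint8_t* d = reinterpret_cast<uint8_t*>(&out);
  noInterrupts();                                    // window opens
  for (size_t i = 0; i < sizeof(T); ++i) d[i] = s[i];
  interrupts();                                      // window closes
  return out;
}

// Publish a new value so that an ISR never sees it half-written.
template <typename T>
void atomicStore(volatile T& shared, const T& value) {
  volatile uint8_t* d = reinterpret_cast<volatile uint8_t*>(&shared);
  const uint8_t* s = reinterpret_cast<const uint8_t*>(&value);
  noInterrupts();
  for (size_t i = 0; i < sizeof(T); ++i) d[i] = s[i];
  interrupts();
}
```

In loop() this would be used as `Data local = atomicCopy(share_data);` followed by all the slow work on `local`, with `Data` standing for whatever struct type share_data has. avr-libc's ATOMIC_BLOCK(ATOMIC_RESTORESTATE) from <util/atomic.h> is an alternative that restores the previous interrupt state instead of always re-enabling.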

I'm only halfway investigating your code. There might be more. Don't be confused that it was getting worse. Such things happen. Perhaps the temperature changes or some minor other change. If you use a breadboard, it is most likely bad contacts. But at least add the 'volatile' keyword anyway.

Temperatures have dropped (ambient outdoor) over 20 degrees over the last few days, but indoors it is far more stable. I don't think I can blame thermal stress for this one...

Re: volatile, will do.

One other thing occurred to me, which is the vexed question of pullup resistors for the I2C pins. I get more confused every time I read about it. The story that seems to emerge after a lot of googling is that the internal pullups on Arduini are too large (50K??) whereas the I2C bus works best with lower-value pullups like 4K. But my I2C bus has been working fine all this time with no added pullups ... of course, I am not sure what pullups may be installed in the 7seg display backpack, the eval board for the AS5601, or the Trellis keypad: this is something I should research.
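
For a rough feel of why 50K is considered too weak, the I2C-bus specification gives two bounds on the pull-up: a minimum set by the 3 mA sink current at 0.4 V, and a maximum set by the allowed rise time, using t_r = 0.8473 x R x C. The sketch below just does that arithmetic. The 100 pF bus capacitance in the usage note is a guess for a star-wired bus with flying leads, not a measured value, and the function names are made up for illustration.

```cpp
#include <assert.h>

// Smallest usable pull-up: a device sinking iolAmps must still pull the
// line down to volMax. Defaults are the standard-mode figures (0.4 V, 3 mA).
double minPullupOhms(double vdd, double volMax = 0.4, double iolAmps = 0.003) {
  return (vdd - volMax) / iolAmps;
}

// Rise time (30% to 70% of VDD) of an RC-loaded bus line, in seconds.
double riseTimeSeconds(double pullupOhms, double busCapFarads) {
  return 0.8473 * pullupOhms * busCapFarads;
}

// Largest pull-up that still meets a rise-time limit
// (1000 ns in standard mode, 300 ns in fast mode).
double maxPullupOhms(double riseTimeMaxSeconds, double busCapFarads) {
  assert(busCapFarads > 0.0);
  return riseTimeMaxSeconds / (0.8473 * busCapFarads);
}
```

With VDD = 5 V and an assumed 100 pF, this gives a window of roughly 1.5K to 11.8K at 100 kHz. A 50K pull-up on its own would give a rise time of about 4.2 us, several times the 1000 ns limit, while 4.7K gives about 0.4 us. If the breakout boards carry their own pull-ups, those act in parallel, so the effective value on the bus may already be inside the window, which would explain why it has worked so far.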

Could I learn anything from a test as simple as putting a voltmeter between the SCL or SDA bus and ground, and checking how close it is to 5vdc?

Right now, I am rooting for a loose pin in Leo2's header -- the simplest, crudest, most obvious cause. But I will take seriously your error-handling and robustification suggestions above, and try to become a more sophisticated Arduinista. If you have any further thoughts on the 5vdc situation I would be very interested to hear more. Maybe it would be better to power the whole shebang off an external wall wart, with just one +5 rail for both Leos?

Once again, many thanks for taking the time to educate the EE-impaired!