I2C flakiness: best strategy to identify and fix?

I have spent the last 8 months or so developing a custom USB game controller (USBcycle -- too complicated to describe fully here), consisting of 2 Leonardos connected via USB to an OSX (El Cap 10.11.6) gaming platform. Several I2C devices are involved, all controlled by the master (Leo1). Leo1 handles keyboard and mouse emulation, Leo2 handles joystick emulation.

The Leos also communicate with each other over I2C (using I2C_Anything library). Leo1 uses I2C to send a data structure to the slave (Leo2) and can also request the slave to send it back (with modifications of selected values if appropriate). Thus, the 2 Leos can pass data back and forth, sharing the values from various sensors: each writes only the fields corresponding to its own sensors.

This has all been working just fine for months (about five months, to be more precise). The project is done and documented, being exercised on a daily basis, and I'm starting to think about Version 2 at a pencil-and-paper level. There has been a long and encouraging period of perfect stability and reliability. But as of the last 2 weeks, I'm starting to see some worrying signs of flakiness on the I2C bus. I have not changed the hardware for months, have not changed the code for about a month, have not updated OSX for months, and yet the controller is becoming less reliable, failing intermittently.

Two observable things are different from a month or so ago. The first symptom was that a while back -- two months or more? -- Leo2 started to complain now and then about seeing no I2C bus power at startup. This complaint was very rare initially, but has been getting more frequent and as of tonight seems to be happening on almost half of all startups. It used to be that simply unplugging its USB cable and reconnecting it was sufficient to fix the problem; but tonight the "no I2C power" problem became chronic and I had to power-cycle both Leos and reload Leo2's sketch several times to fix it.

The second worrying change (possibly related) is that the Leo1->Leo2 data transfer seems to be failing. This was once unheard-of, then a month ago happened just once and mysteriously "fixed itself" after a couple of reboots; then tonight it happened persistently. Once it fails, the system does not recover (obviously I need better error handling!).

The symptom is that the game controller works great for a while (30 minutes, an hour, some random period), then suddenly the steering (a joystick axis) fails. The game no longer sees any input from that axis (but the other 2 joystick axes are still live), and a USB diagnostic tool no longer sees that axis either.

The steering mechanism relies on Leo1 reading a Hall rotary encoder (I2C device) and then updating the steering value in the shared data structure and sending the structure over I2C to Leo2; Leo2 then translates the steering value into a joystick value (as mentioned, it provides the joystick HID) and sends it over USB to the gaming platform. Quick instrumentation of the code with Serial.println shows that while Leo1 is having no trouble reading the encoder over I2C, Leo2 is no longer getting any data refreshes from Leo1 over I2C. However, rebooting both Leos a few times seems to restore functionality... for a while. After N minutes (an hour give or take 30 minutes) it fails again.

So it seems to me that something about the I2C bus is marginal (and getting rapidly worse), preventing the Leo1/Leo2 communication on which the steering axis depends; I have no idea how or why it's getting worse. This leaves me with many vexing questions.

I had a system that was working perfectly, and now it's increasingly flaky. Are I2C devices known for "going sour", and if so, what is most likely to be the issue? Do Leonardos have any known problems with I2C?

Why would I2C fail between one slave and the master (Leo2 doesn't get update from Leo1), yet continue to work between other slaves and the master (Leo1 still able to read I2C Hall sensor and control Trellis keypads)? I thought that a shared bus of this kind was "all or nothing". And yet I have seen, using debug statements, a failure mode in which the steering sensor value was being read and updated by Leo1, the "send to slave" code was invoked on Leo1, yet Leo2 never saw the update. If the I2C bus voltage were randomly dropping, I would expect all devices to fail intermittently, not just one specific transaction. No?

I'm tempted to believe that Leo2's GPIO is failing somehow, making it think it detects low I2C bus voltage when in fact it's OK; but that seems very unlikely. The other possibility is that its interrupt mechanism fails and that's why the data transfer is never completed. Here I am in way over my depth -- is there any hardware-related failure that could cause a Leonardo to stop honouring interrupts?

I am not an EE, have only been working with Arduino for 8 months (i.e. this is my first real project), and am not very confident about troubleshooting at this fairly advanced level. I do have an Xminilab and (sort of) know how to use it, but I'm not sure what I'd be looking for. A word or two of general "how to go about it" advice from older hands who have lots of I2C experience would be very welcome. Where do I start? Should I swap out Leo2 for a fresh Leonardo?

To answer your question we need to look at your code and schematics.

Your github repository.

The name 'current' for the sketch is confusing; 'current' is also an electrical current in amperes.
What about 'stable', 'development', 'latest', or something like that?

Do you have a schematic? I mean a real schematic made with Eagle or KiCad.

Do you power the AS5601 with 5V?

Do you power the 7-segment display with 5V?

Is there anything on the I2C bus that runs at 3.3V? For example, almost all sensors run at 3.3V.

Your test for I2C power uses a digitalRead of SDA and SCL. That is okay, nothing wrong with that.
If it fails, that means that the SDA or SCL level is still too low.
Check the 5V wiring.
You already have some delay in setup() before checking SDA and SCL. I think that delay should be enough.

Do you still use a breadboard? They often have bad contacts. Solder everything or use best-quality connectors.

How long are the wires for SDA and SCL? I mean the total length of every piece of wire for SDA and SCL. Keep it under 50 cm, or lower the I2C speed with Wire.setClock(50000L);
Did you tie the SDA and SCL wires together? They don't like each other (crosstalk).

Is Leo1 always the I2C Master and Leo2 always the I2C Slave?
Does Leo2 act as a Master to the display? Is that an I2C display? Then you would have a multi-master bus. That is still unexplored territory. Don't be the first one to explore it; let others do that.

When requesting data from the Slave, you do not check if something was received. It is a Wire.requestFrom() followed by an I2C_readAnything(). When sending data to the Slave, you don't check that either.
Check if Wire.requestFrom() returns the same number of bytes that you requested.
Check if Wire.endTransmission() returns an error.
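A minimal sketch of those two checks, assuming the return values documented in the Arduino Wire reference: Wire.endTransmission() returns 0 on success and a non-zero code on failure, and Wire.requestFrom() returns the number of bytes that actually arrived. The helper names are hypothetical. The Leonardo usage is shown only as comments (address 8 is Leo2's address as seen later in the thread), so the helpers themselves also compile on a PC.

```cpp
#include <stdint.h>

// Hypothetical helper: readable text for a Wire.endTransmission() result.
// Codes as listed in the Arduino Wire reference; any non-zero value is a failure.
const char* wireStatusText(uint8_t status) {
  switch (status) {
    case 0: return "ok";
    case 1: return "data too long for transmit buffer";
    case 2: return "NACK on address (slave absent or not listening)";
    case 3: return "NACK on data";
    case 4: return "other error";
    default: return "unknown error";
  }
}

// Hypothetical helper: a request only counts if every requested byte arrived.
// 'received' is the value returned by Wire.requestFrom(address, expected).
bool replyComplete(uint8_t received, uint8_t expected) {
  return received == expected;
}

// Example use on Leo1 (sketch only; I2C_writeAnything/I2C_readAnything come
// from the I2C_Anything library used in the thread, 'data' is a hypothetical
// non-volatile copy of the shared struct):
//
//   Wire.beginTransmission(8);
//   I2C_writeAnything(data);
//   uint8_t status = Wire.endTransmission();
//   if (status != 0) { /* remember wireStatusText(status) for later */ }
//
//   uint8_t got = Wire.requestFrom(8, (int) sizeof data);
//   if (replyComplete(got, sizeof data)) I2C_readAnything(data);
```

Keeping the status code, rather than just "failed", makes it possible to tell a NACK on address 8 (Leo2 not answering at all) from an error in the data phase.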

Is there something that turns off the interrupts? For example, in libraries.

The 'share_data' and 'gotData' variables are used in interrupts and in loop(), therefore they must be made 'volatile'.
When reading or writing an element of 'share_data' in the loop, you have to turn off the interrupts (for as short a time as possible). Since the Leonardo is an 8-bit chip, an interrupt may happen while reading a variable that is larger than one byte.
However, you don't have to turn off interrupts in 'receiveEvent()' or 'requestEvent()', since by default an interrupt routine is not interrupted by other interrupts.
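A minimal sketch of that advice, assuming a two-field struct: the real 'share_data' layout is not shown in the thread, so SharedData, steering and rpm are hypothetical names. loop() copies the volatile data into an ordinary variable with interrupts off for only a few instructions. The #ifndef ARDUINO stubs exist only so the logic compiles on a PC; on the Leonardo the real noInterrupts()/interrupts() from the Arduino core are used.

```cpp
#include <stdint.h>

#ifndef ARDUINO
// PC-only stand-ins so this compiles off the board; the Arduino core provides the real ones.
inline void noInterrupts() {}
inline void interrupts() {}
#endif

struct SharedData {      // hypothetical layout of 'share_data'
  int16_t steering;
  int16_t rpm;
};

volatile SharedData share_data;   // written in receiveEvent(), read in loop()
volatile bool gotData = false;    // set in receiveEvent(), cleared in loop()

// Copy the shared struct into a normal variable. Interrupts are off only for
// the field copies, so receiveEvent() cannot change a 16-bit value halfway
// through it being read on the 8-bit AVR.
bool takeSnapshot(SharedData& out) {
  bool fresh;
  noInterrupts();
  fresh = gotData;
  out.steering = share_data.steering;
  out.rpm = share_data.rpm;
  gotData = false;
  interrupts();
  return fresh;   // true if new data arrived since the last call
}
```

The copy is done field by field because a volatile struct cannot be assigned to a plain one with the default copy; loop() then works only with the local copy.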

I'm only halfway through investigating your code. There might be more. Don't be confused by it getting worse. Such things happen. Perhaps the temperature changed, or some other minor change. If you use a breadboard, it is most likely bad contacts. But at least add the 'volatile' keyword anyway.

@Koepel: wow! What a detailed, thoughtful and helpful response. I am impressed & grateful that you took the time to find the source, read it, and comment (responses below). Will have to reply in 2 chunks because the forum just rejected my reply as over-length. Will have to wait 5 minutes between part 1 and part 2. So stay tuned...

The name 'current' for the sketch is confusing; 'current' is also an electrical current in amperes.
What about 'stable', 'development', 'latest', or something like that?

Point taken. I don't see any reason to keep the name; now that I'm using git it doesn't really need to be named "current" any more (previously was using the cheap and dirty method of renaming the source file each time I reached a devel milestone).

Do you have a schematic? I mean a real schematic made with Eagle or KiCad.

Alas no. I did try to make one with Fritzing but quickly gave up as I found the UI so cumbersome. I did not even know that KiCAD existed prior to reading your post here, but I have DL'd it and will try it out. I would like eventually to make a proper custom board for this project so a real schematic would be worth putting some time into.

Do you power the AS5601 with 5V?

Do you power the 7-segment display with 5V?

Is there anything on the I2C bus that runs at 3.3V? For example, almost all sensors run at 3.3V.

Yes, yes, and no. The AS5601 is packaged on a handy "evaluation board" (alas no longer available) which may or may not include a level converter; the board specs call for +5. My linear Hall sensor is also packaged on a breakout board that takes +5.

All the stuff that Leo1 talks to is powered off Leo1's +5. All the stuff that Leo2 talks to is powered off Leo2's +5. Ground is shared. See next item for more thoughts on this...

Your test for I2C power uses a digitalRead of SDA and SCL. That is okay, nothing wrong with that.
If it fails, that means that the SDA or SCL level is still too low.
Check the 5V wiring.

It occurred to me in the wee hours, sleepless and going over my build looking for possible explanations, that maybe the I2C bus as I've wired it is a bit sketchy (so to speak). The 2 Leonardos are powered (separately) by their USB connections. I was told at some point months ago that their +5vdc should not be connected; ground should be common, but the +5 bus for each should be separate.

However, they both participate in the I2C bus, which is pulled high by their internal +5v when idle. Is that equivalent to connecting their +5vdc as I was told not to? I had not thought of this before. I don't know how, or if, the digital output pins on the Leo are isolated from vcc.

You already have some delay in setup() before checking SDA and SCL. I think that delay should be enough.

Good. I was aware (from bitter experience) that the Wire init takes a while. The init timing is actually rather carefully arranged, so that Leo1 doesn't look for Leo2 until the last possible moment (giving Leo2 time to init its own Wire protocol).

[end Part 1]

Part 2:

Do you still use a breadboard? They often have bad contacts. Solder everything or use best-quality connectors.

No breadboards are involved in this version, but I am using 2 stock Leos with female headers, so there are quite a few jumper wires with header pins plugged into the Leos. This is another thought that occurred to me in the wee hours: the Leos are mounted upside-down under the control panel and the whole assembly does vibrate in use, so it's possible a pin is making not-quite-reliable contact. I will massage all the non-soldered connections.

How long are the wires for SDA and SCL? I mean the total length of every piece of wire for SDA and SCL. Keep it under 50 cm, or lower the I2C speed with Wire.setClock(50000L);
Did you tie the SDA and SCL wires together? They don't like each other (crosstalk).

This is interesting. 50cm... 20 inches... Yes, I would say that if we added the total length of all SDA wires it is probably at least 15 inches (will have to do some actual measuring), and SCL must be about the same. I did not daisy-chain the I2C devices. They all connect to 4 buses inside the controller case: SCL, SDA, ground, and +5. So it is more of a star architecture, physically speaking.

The flying wires carrying I2C do run close to one another at times but are not actually tied into a tight bundle.

Is Leo1 always the I2C Master and Leo2 always the I2C Slave?
Does Leo2 act as a Master to the display? Is that an I2C display? Then you would have a multi-master bus. That is still unexplored territory. Don't be the first one to explore it; let others do that.

Yes, Leo1 is always the I2C master. Leo2 is always a slave. Leo2 cannot initiate any transactions; it only responds (via interrupt) to requests from Leo1.

This is why the two Leos have to chat over the I2C bus; only Leo1 can read the steering sensor because the steering sensor is an I2C device and only Leo1 is bus master. But Leo2 needs the steering sensor value, because it's handling the joystick output of which steering is one axis. It's also reading the reed switches that tell us pedalling RPM, and doing the RPM calculation; but Leo1 is displaying the RPM on its I2C 7seg gizmo, so it needs Leo2 to tell it the RPM value. I really like being able to distribute the tasks and share data -- discovering I2C_Anything was one of the best moments of the project :)

When requesting data from the Slave, you do not check if something was received. It is a Wire.requestFrom() followed by an I2C_readAnything(). When sending data to the Slave, you don't check that either.
Check if Wire.requestFrom() returns the same number of bytes that you requested.
Check if Wire.endTransmission() returns an error.

Interesting point, and my bad. I think I check that on the Leo2 side, with a "gotData" flag; and also in the receive interrupt routine, there's a test for the number of bytes received (does it match the size of the struct we are expecting). I thought I had duplicated those sanity tests on the Leo1 side but probably intended to do so and never got around to it.

Is there something that turns off the interrupts? For example, in libraries.

I considered this but as far as I could tell, I didn't need to. The only interrupt mechanism being used on the Leo2 side is the I2C request from Leo1; nothing else can interrupt it so there should be no competing interrupt that could disturb the I2C conversation. Leo1, meanwhile, never gets interrupted because it is the master (and does not use interrupts for any other purpose). So Leo1's timing can never be derailed; it talks to Leo2 only when it wants to.

The 'share_data' and 'gotData' variables are used in interrupts and in loop(), therefore they must be made 'volatile'.
When reading or writing an element of 'share_data' in the loop, you have to turn off the interrupts (for as short a time as possible). Since the Leonardo is an 8-bit chip, an interrupt may happen while reading a variable that is larger than one byte.
However, you don't have to turn off interrupts in 'receiveEvent()' or 'requestEvent()', since by default an interrupt routine is not interrupted by other interrupts.

This is interesting. I will read up on "volatile," also on turning off interrupts. But if Leo2 turns off its interrupts for N ms then it would not respond to Leo1's request during that period; wouldn't that throw an error for Leo1? Must I really implement a whole backoff/retry algorithm? (jeepers, reinventing CSMA :-))
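For what it's worth, a retry does not have to be a full CSMA scheme. A minimal sketch, assuming each attempt is wrapped in a function that returns the Wire.endTransmission() status (0 = success); sendWithRetry and the numbers below are hypothetical. As far as I know, the AVR TWI hardware also holds SCL low until the slave's interrupt routine has serviced each byte, so a very short noInterrupts() on Leo2 should normally just stretch the transfer rather than fail it; a retry is for genuinely failed transfers, such as a loose wire.

```cpp
#include <stdint.h>

typedef uint8_t (*SendOnce)();        // returns a Wire.endTransmission() status
typedef void (*WaitMs)(uint32_t ms);  // on the AVR Leonardo this can simply be delay

// Hypothetical helper: try a transfer up to maxTries times, waiting a little
// longer after each failure. Returns the last status (0 means success).
// On Leo1, sendOnce could be a captureless lambda such as:
//   []() -> uint8_t { Wire.beginTransmission(8); I2C_writeAnything(data);
//                     return Wire.endTransmission(); }
uint8_t sendWithRetry(SendOnce sendOnce, WaitMs waitMs,
                      uint8_t maxTries, uint32_t backoffMs) {
  uint8_t status = 4;  // "other error" if maxTries is 0
  for (uint8_t attempt = 1; attempt <= maxTries; ++attempt) {
    status = sendOnce();
    if (status == 0 || attempt == maxTries) break;  // done, or no tries left
    waitMs(backoffMs * attempt);                    // linear backoff: 1x, 2x, 3x ...
  }
  return status;
}
```

A call like sendWithRetry(send, delay, 3, 5) costs at most 5 + 10 ms of waiting when Leo2 is really gone, which is harmless for a game controller; the caller can then flag the error instead of hanging.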

afaik no other interrupt could possibly occur on Leo2 during receiveEvent and requestEvent anyway (see above)...

I'm only halfway through investigating your code. There might be more. Don't be confused by it getting worse. Such things happen. Perhaps the temperature changed, or some other minor change. If you use a breadboard, it is most likely bad contacts. But at least add the 'volatile' keyword anyway.

Temperatures have dropped (ambient outdoor) over 20 degrees over the last few days, but indoors it is far more stable. I don't think I can blame thermal stress for this one...

Re: volatile, will do.

One other thing occurred to me, which is the vexed question of pullup resistors for the I2C pins. I get more confused every time I read about it. The story that seems to emerge after a lot of googling is that the internal pullups on Arduini are too large (50K??) whereas I2C bus works best with lighter pullups like 4K. But my I2C bus has been working fine all this time with no added pullups ... o/c I am not sure what pullups may be installed in the 7seg display backpack, the eval board for the AS5601, or the Trellis keypad: this is something I should research.

Could I learn anything from a test as simple as putting a voltmeter between the SCL or SDA bus and ground, and checking how close it is to 5vdc?

Right now, I am rooting for a loose pin in Leo2's header -- the simplest, crudest, most obvious cause. But I will take seriously your error-handling and robustification suggestions above, and try to become a more sophisticated Arduinista. If you have any further thoughts on the 5vdc situation I would be very interested to hear more. Maybe it would be better to power the whole shebang off an external wall wart, with just one +5 rail for both Leos?

Once again, many thanks for taking the time to educate the EE-impaired!

You have done most things right. A few things got my attention.

The I2C Slave is the most sensitive to interrupts being turned off. Therefore the DHT, OneWire or NeoPixel libraries might cause problems.
Turning interrupts off for a very short time to copy the volatile variables is no problem.
You can also try without turning the interrupts off, as most others do. If 'gotData' is checked often enough in loop(), so that the struct is read before the next call to requestEvent() or receiveEvent() can change the variables in it, then it is okay.

By the total length of the wires for SDA and SCL I mean adding up every piece of wire. A star-shaped bus with 20" to each of four devices makes 80".

The pullup resistors for SDA and SCL should follow the I2C specification, which allows a maximum pull-down current of 3 mA.
Internal pullup resistors: use them only for testing.
10k pullup resistors: when a sensor is directly next to the Arduino board.
4k7 is a normal value.
The longer the wires, the lower the pullup resistor value should be.
You have to calculate the total value of all the pullup resistors on every Arduino board and every module.
Suppose two Leonardo boards with 50k internal each and two modules with 10k each. The total pullup resistance will be 1 / (1/50k + 1/50k + 1/10k + 1/10k), about 4k2. That makes a pull-down current of 1.2 mA.
Then you could add pullup resistors of 4k7 to 5V to raise the pull-down current to about 2.3 mA.
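The same arithmetic as a small sketch in plain C++ (it runs on a PC). The 50k internal and 10k module values are the example figures from the text, not measured values for this build, and the current is simply 5 V divided by the total pullup, as in the text.

```cpp
#include <stddef.h>

// Total resistance of pullups in parallel: 1 / (1/R1 + 1/R2 + ...).
double parallelOhms(const double* ohms, size_t count) {
  double conductance = 0.0;
  for (size_t i = 0; i < count; ++i) conductance += 1.0 / ohms[i];
  return 1.0 / conductance;
}

// Current a device must sink to pull the line to 0 V against that total pullup.
double pullDownMilliamps(double supplyVolts, double totalOhms) {
  return supplyVolts / totalOhms * 1000.0;
}
```

For the example values, parallelOhms on {50000, 50000, 10000, 10000} gives about 4167 Ω and 1.2 mA at 5 V; adding a 4k7 to the line brings the total to about 2209 Ω, roughly 2.3 mA.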

You can test the pull-down current of an I2C bus. Turn everything on, call Wire.begin(), but keep the I2C bus idle. Then, with a multimeter, measure the current (short-circuit current) from SDA to GND and from SCL to GND.
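Turning that measurement back into a resistance is Ohm's law: the effective pullup on the line is roughly the supply voltage divided by the measured current. A small sketch; the function names and the 1.2 mA reading in the usage note are hypothetical, and the result ignores the multimeter's own resistance, so treat it as an estimate.

```cpp
// Hypothetical helper: effective pullup (ohms) on SDA or SCL, estimated from
// the current measured with the meter shorting that line to GND.
double effectivePullupOhms(double supplyVolts, double measuredMilliamps) {
  return supplyVolts / (measuredMilliamps / 1000.0);
}

// Hypothetical helper: is the sink current within the 3 mA limit from the spec?
bool withinI2cLimit(double measuredMilliamps) {
  return measuredMilliamps <= 3.0;
}
```

A reading of 1.2 mA on a 5 V bus, for example, corresponds to about 4.2k of total pullup.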

When something on the I2C bus has no power, then the SDA and SCL are pulled low, and the bus does not work. When you connect the 5V pin of one Leonardo to the 5V pin of the other Leonardo board, the voltage regulator gets a reverse current. It should work in most cases but it can be dangerous.

Suppose you have a very strong 5V 10A power supply. If that is connected to the 5V pin, the voltage regulator might blow because of the large reverse peak. It has happened to some.
Suppose you have two Arduino boards connected to two computers. If the 5V pins are connected and one computer is turned off, then current would flow into the computer that is turned off. Not every computer likes that.

One power supply is the safest. That is one power supply of 5V, with the output split to two wires, going to the USB connectors of the Arduino boards. With the GNDs of the Arduino boards still connected with a wire of course.
Or a 7.5V power supply going to the barrel jack power input of both boards.

What about some kind of waiting loop that waits for SDA and SCL to go HIGH? With a message on the display or a blinking LED. For example, waiting for up to 30 seconds.
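A possible shape for that waiting loop, as a sketch: it requires both lines to read HIGH continuously for a short period, so a brief LOW caused by traffic from the other board (the sampling problem raised later in the thread) does not count as "no power". The pin readers and clock are passed in as function pointers so the logic can be tested on a PC; on the Leonardo they would wrap digitalRead(SDA), digitalRead(SCL) and millis(). The names and timing values are assumptions.

```cpp
#include <stdint.h>

typedef bool (*LineIsHigh)();   // e.g. []() { return digitalRead(SDA) == HIGH; }
typedef uint32_t (*NowMs)();    // e.g. millis

// Returns true once SDA and SCL have both read HIGH without interruption for
// stableMs, or false if that has not happened within timeoutMs.
bool waitForIdleBus(LineIsHigh sdaHigh, LineIsHigh sclHigh, NowMs nowMs,
                    uint32_t stableMs, uint32_t timeoutMs) {
  const uint32_t start = nowMs();
  uint32_t highSince = start;
  bool high = false;
  while (nowMs() - start < timeoutMs) {
    const uint32_t t = nowMs();
    if (sdaHigh() && sclHigh()) {
      if (!high) {          // lines just went (or are first seen) HIGH
        high = true;
        highSince = t;
      }
      if (t - highSince >= stableMs) return true;
    } else {
      high = false;         // any LOW restarts the stability window
    }
  }
  return false;
}
```

With stableMs around 50 and timeoutMs of 30000, Leo2 would only report "no I2C power" after 30 seconds without a quiet, HIGH bus; while waiting it could blink an LED.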

Always checking the 'howMany' parameter in receiveEvent() is a good idea.
At the beginning of receiveEvent(), 'howMany' is the same as Wire.available().

Don't use Serial functions in the receiveEvent() or requestEvent(). I don't care what you do with an error, but don't use them.
The receiveEvent() and requestEvent() are real interrupt routines. The Serial functions use interrupts themselves. That can cause a lot of trouble. I know that it is in the official Arduino examples, but that is terrible. Those interrupt routines should be very short to keep the I2C communication fast and reliable.
Remove those Serial.print and Serial.println from those functions right now.
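A sketch of a receive handler along those lines, assuming the struct is sent as raw bytes in one transmission, as I2C_writeAnything does. It accepts only a packet of exactly the right size, never prints, and leaves a volatile error counter for loop() to report. SharedData, handleReceive and badLengthCount are hypothetical names; the byte source is passed in so the logic can be tested on a PC, and on the Leonardo it would wrap Wire.read().

```cpp
#include <stddef.h>
#include <stdint.h>

struct SharedData {      // hypothetical layout of 'share_data'
  int16_t steering;
  int16_t rpm;
};

volatile SharedData share_data;
volatile bool gotData = false;
volatile uint8_t badLengthCount = 0;  // loop() reports this, never the ISR

typedef int (*ReadByte)();  // on the Leonardo: []() { return Wire.read(); }

// Body for receiveEvent(int howMany). Runs in interrupt context: no Serial,
// no delay(), just copy the bytes and set flags.
void handleReceive(int howMany, ReadByte readByte) {
  if (howMany != (int)sizeof(SharedData)) {
    for (int i = 0; i < howMany; ++i) readByte();  // drain the bad packet
    badLengthCount = badLengthCount + 1;
    return;
  }
  SharedData incoming;
  uint8_t* bytes = reinterpret_cast<uint8_t*>(&incoming);
  for (size_t i = 0; i < sizeof incoming; ++i) bytes[i] = (uint8_t)readByte();
  share_data.steering = incoming.steering;  // field-wise copy into volatile target
  share_data.rpm = incoming.rpm;
  gotData = true;
}
```

On Leo2 the actual handler would then be just: void receiveEvent(int howMany) { handleReceive(howMany, []() { return Wire.read(); }); }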

Fritzing does not produce "real" schematics. When you learn KiCad or Eagle, you already know 50% of the other one. It takes time to learn to use them. Well... not only time, also frustration and asking yourself why those programs make it hard instead of easy.

I have had no problems with the pins of a shield plugged into an Arduino board. Not even with bad-quality shields. But others have, after a few years.

It sounds like a flaky connector. Corrosion and dirt build up over time. If it is just flying wires going into the Arduino headers, then it is very likely that one connection is bad. Carefully unplug and re-plug them, one at a time.

It may be time to invest in a cheap USB oscilloscope. Then you can see wiring problems like inadequate pullups or echoes on the lines.

Koepel:
You have done most things right.

Well, that's nice to hear :)

By the total length of the wires for SDA and SCL I mean adding up every piece of wire. A star-shaped bus with 20" to each of four devices makes 80".

Just one more sanity check: should I count SDA and SCL separately when calculating total length, i.e. if I have just 2 devices with 8 inches of wire between'em, 2 wires, is that 16 inches (SDA plus SCL) or 8 inches? I am guessing the latter (length of SDA, length of SCL -- not length of SDA+SCL).

I will measure my star architecture but am pretty sure it will be in excess of 50cm. So I will look into the method for dropping the speed. This is hardly a time-critical app so I bet I can afford to throttle down the I2C bus quite a bit w/o any performance penalty.

The pullup resistors for SDA and SCL should follow the I2C specification, which allows a maximum pull-down current of 3 mA.
Internal pullup resistors: use them only for testing.
10k pullup resistors: when a sensor is directly next to the Arduino board.
4k7 is a normal value.
The longer the wires, the lower the pullup resistor value should be.
You have to calculate the total value of all the pullup resistors on every Arduino board and every module.
Suppose two Leonardo boards with 50k internal each and two modules with 10k each. The total pullup resistance will be 1 / (1/50k + 1/50k + 1/10k + 1/10k), about 4k2. That makes a pull-down current of 1.2 mA.
Then you could add pullup resistors of 4k7 to 5V to raise the pull-down current to about 2.3 mA.

You are a little over my head there; I do know about Ohm's law, but I don't know about all the slave devices and what (if any) pullups they have. The message I take away is (a) I probably do need pullup resistors and (b) I need to learn how to determine the correct value (maybe practise on a simple breadboard setup before mucking with my working device).

You can test the pull-down current of an I2C bus. Turn everything on, call Wire.begin(), but keep the I2C bus idle. Then, with a multimeter, measure the current (short-circuit current) from SDA to GND and from SCL to GND.

OK, I can make a minimalist version of the code that warms up the bus but never uses it, and try this. And all I have to do is set my multimeter to "milliamps" and put the probes across SDA/GND and then SCL/GND? I think I can handle that. If I get a number larger than 3 mA then that's bad, right? And since my bus is very long, I would ideally like a much lower number like 1.5 mA? I did read Nick Gammon's article on I2C pullups but again it was a bit over my head; what I remember was that he concluded that lower values (of pullup resistance) made for cleaner waveforms, up to a point, and that in his case 2.2K was about right.

I do have an Xminilab Portable which I barely know how to use (another learning curve ahead!), so it's possible that I can figure out how to watch the wave forms while the bus is in use and see what shape they are.

One power supply is the safest. That is one power supply of 5V, with the output split to two wires, going to the USB connectors of the Arduino boards. With the GNDs of the Arduino boards still connected with a wire of course.
Or a 7.5V power supply going to the barrel jack power input of both boards.

As of yesterday they are running on a shared wall wart providing 9v @ 2A. And I now realise that I need to remember to unplug them from USB before unplugging the wall wart (and the reverse at startup), otherwise they will revert to the (not so smart) USB power setup when the 9v is withdrawn. Is there any way, short of cutting traces, to tell the Arduino not to use USB power?

What about some kind of waiting loop that waits for SDA and SCL to go HIGH? With a message on the display or a blinking LED. For example, waiting for up to 30 seconds.

It did finally occur to me (doh!) that Leo2 might be checking the I2C bus just at the moment when Leo1 is busy doing the roll call of its slaves. If Leo2 checked at the wrong moment, the bus might be pulled low & it would report "no power". So yes, I put in a bit of a loop in the Leo2 code and maybe will make it longer yet, with many retries before it finally gives up. I suppose I could do a clunky hardware semaphore with a Leo1 port driving a Leo2 port, Leo1 holding the port high when it's busy talking to its slaves and dropping it when I2C is idle -- but gee, that seems like overkill.

Don't use Serial functions in the receiveEvent() or requestEvent(). I don't care what you do with an error, but don't use them.
The receiveEvent() and requestEvent() are real interrupt routines. The Serial functions use interrupts themselves. That can cause a lot of trouble. I know that it is in the official Arduino examples, but that is terrible. Those interrupt routines should be very short to keep the I2C communication fast and reliable.
Remove those Serial.print and Serial.println from those functions right now.

OK, OK, I heard you :) they are outta there! I set another volatile global var instead and check it when safely outside the interrupt code.

Fritzing does not produce "real" schematics. When you learn KiCad or Eagle, you already know 50% of the other one. It takes time to learn to use them. Well... not only time, also frustration and asking yourself why those programs make it hard instead of easy.

OK glad it's not just me then :) -- I took an initial look at KiCAD and reeled back in dismay. So complicated, so unwieldy. Seems like an awful lot of learning curve to design a very simple board. However I realise I really do have to learn a PCB layout program of some kind eventually, so it may as well be KiCAD. At least it looks smarter & more flexible than Fritzing.

I gave background information, so you know about the limits. But you can relax 8), the length of wires and the 3mA are not hard limits.

There are YouTube videos on how to get started with KiCad; if you watch a few, you might be able to make your first schematic.
I use it only for drawing schematics; I have not even begun with the PCB layout.

Doh! :o I totally forgot that when one Leonardo is communicating on the I2C bus, the other Leonardo could sample SDA or SCL at the wrong moment and detect a low. Sorry about that :-[ I'm glad you came up with it. I suggest removing that test completely.

My power supply gets a little hotter when I turn off my computer and current flows into the computer. In my case I don't care. A power supply of 7.5V is better (less voltage drop for the 5V voltage regulator). You can check with your finger if the voltage regulator on the Leonardo board gets too hot when you turn off the computer.

About the pullup resistors. The I2C specification says maximum 3 mA, but most components will still work when it is set to 10 mA. The best value to compensate for long wires is not 1.5 mA, but closer to 3 mA. It compensates for the capacitance of the wires to GND, and a lower-impedance I2C bus is less influenced by electrical noise.
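To put numbers on that, the NXP I2C-bus specification (UM10204) gives bounds for the pullup value: the minimum comes from the 3 mA sink limit, Rp(min) = (VDD - VOL(max)) / IOL with VOL(max) = 0.4 V, and the maximum from the allowed rise time, Rp(max) = tr / (0.8473 * Cb), where tr is 1000 ns in Standard-mode (100 kHz) and 300 ns in Fast-mode (400 kHz). A small sketch of both; the 200 pF bus capacitance in the usage note is a made-up example, since nobody has measured this bus.

```cpp
// Pullup bounds from the I2C-bus specification (NXP UM10204).

// Smallest pullup that keeps the sink current at or below iolMilliamps
// while the line is held at VOL(max) = 0.4 V.
double minPullupOhms(double vddVolts, double iolMilliamps) {
  const double volMax = 0.4;
  return (vddVolts - volMax) / (iolMilliamps / 1000.0);
}

// Largest pullup that still meets the rise-time limit for a bus capacitance.
// riseNs: 1000 for Standard-mode (100 kHz), 300 for Fast-mode (400 kHz).
double maxPullupOhms(double riseNs, double busPicofarads) {
  return (riseNs * 1e-9) / (0.8473 * busPicofarads * 1e-12);
}

// 30%-to-70% rise time (ns) for a given pullup and bus capacitance.
double riseTimeNs(double pullupOhms, double busPicofarads) {
  return 0.8473 * pullupOhms * busPicofarads * 1e-12 * 1e9;
}
```

For a hypothetical 200 pF of wiring on a 5 V bus, that allows roughly 1.5k to 5.9k at 100 kHz but only 1.5k to 1.8k at 400 kHz. The roughly 4.2k combination calculated earlier in the thread would rise in about 706 ns: fine for 100 kHz, too slow for 400 kHz. That is why long wiring calls for both stronger pullups and a lower clock.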

Do you understand the calculation?
Suppose the Leonardo boards have two internal pullup resistors of 50k, and two modules have 10k pullup resistors each. The total pullup value is those four in parallel. It can be written as: 50k // 50k // 10k // 10k.
The value can be calculated with Google; try searching for: 1/(1/50000 + 1/50000 + 1/10000 + 1/10000)

The 50 cm wire length is just a rule of thumb for beginners. It is indeed not SDA + SCL, but for each separately. It applies when a GND wire runs next to SDA or SCL, or when everything is put into a small box. If the SDA and SCL wires are 99% just hanging in the air, it is possible to go as far as a few meters (a meter is a little over three feet).
The length is limited by the capacitance of SDA and SCL to GND and by the crosstalk between SDA and SCL. The crosstalk is the worst, because lowering the clock speed will not help.

There is a handy test: the "MultiSpeed I2C Scanner - 50,100,200,400 KHz." thread in the Libraries section of the Arduino Forum.
If 400 kHz is rock solid, then you can use 200 kHz without problems.

@Koepel many thanks for all the very helpful and relevant info. Yes, I have known since childhood the formula for resistors in parallel vs that for resistors in series... but that's about the extent of my grasp of circuit theory :)

The conversation illustrates for me the finest aspect of the Arduino community -- high quality mentoring. I'm very grateful for the tutoring and will be investigating further, learning how to use my pocket scope, etc. This thread is now bookmarked permanently as I know I'll refer back to it more than once.

Thanks to MorganS too, I do have the cheap USB scope as mentioned, I just have to get comfortable using it. Takes some practise. So much stuff to learn, not enough time!

UPDATE: The controller worked flawlessly for a number of days, then suddenly crashed again with i2c issues. It now looks like Leo2 is no longer responding at all on the bus, despite numerous power cycles. So I took a deep breath and figured out how to get my xminilab portable into Logic Analyzer mode :) Thanks to the xminilab I can see the failure with my own eyes. All other devices are talking to Leo1 correctly, and the protocol sniffer shows me that traffic (in hex bytes). I can see Leo1's attempts to contact Leo2 at the end of startup, sending to address 8; and I can see that Leo1 never receives any ACK. So I would say that the i2c bus as a whole is working fine, but Leo2 in particular is having a problem -- whether that be loose jumper wires or whatever it may be. I did notice that the voltage on the i2c bus when I tested it was a bit less than +5, like 4.8. I am not sure whether this is because the xminilab did some averaging of the signal on the bus, or whether the voltage is a bit droopy. Thanks again for the advice and encouragement... now that I've got past most of the intimidation factor, I'm looking forward to using my pocket scope and improving my troubleshooting skills.

UPDATE 2: The debugging process was interesting; the i2c bus failed in various ways, changing over time (probably as I handled and moved the enclosure). At one point Leo1 could no longer talk to any of its slaves. I decided to suspect Leo2 since the trouble began there, and unplugged it from the i2c bus, at which point everyone else came back online and bus traffic was normal. But when I plugged Leo2 back in, guess what -- everyone including Leo2 was now communicating successfully. My conclusion -- I'll be more sure of this if I can repeat this "fix" -- is that the issue is with the mechanical connection of jumpers to the F header on Leo2; the duinos are mounted upside down so gravity is assisting the pins in loosening, and the entire bike vibrates continuously in use, which may explain why everything works great for N hours of use and then goes sideways (as one or both connections start to get sketchy).

It may be that the jumpers (cheap eBay stuff) have slightly undersized pins, or...? But at any rate, at this point it looks like the bus itself (despite the absence of pullup resistors, p/s decoupling caps, etc.) works just fine, but Leo2 has a specific problem with its physical connection. As we used to say in my working days, "Theory of Wire." So MorganS may get the prize for Right Answer on this thread. In the next version of this controller I think I will use a protoboard shield (these seem to be a good tight solid fit into the headers on the duino) and solder the wiring harness to that, rather than the very prototypish (and apparently unreliable) flying-jumpers method I used for the beta version. I have found that putting a slight bend in jumper pins sometimes improves their grip on the socket, so for now I'll try that.

For those building their first moderately complex Thingy, I should mention a few regrets I have about the beta build, lessons learned:

  • I should have brought out test points for crucial features such as vcc, ground, SDA, SCL, accessible from the outside of the enclosure. These test points should have been designed for easy/secure contact for scope and multimeter probes.
  • I should never have relied on USB power for 2 linked Arduinos in one enclosure each with its own USB connection. A single wall wart or other external p/s was the way to go.
  • I should have provided a socket each for pullup resistors for SDA and SCL, so I could try out various resistor values w/o having to unmount the enclosure from its place on the bike.
  • I should probably have put a nice big capacitor from VIN to ground (according to my recent reading).
  • I should have separated, as far as possible, the wires for SDA and SCL, to avoid crosstalk.
  • And because the enclosure in actual use is mounted on the bike handlebars, it doesn't sit conveniently on a workbench and a conventional multimeter is almost useless -- you need three hands! -- so I should have invested in a pen multimeter long ago (got one on order) and I need to make a velcro/elastic wrist band for my Xminilab Portable as well for hands-free use.

I hope that these coulda-woulda-shouldas are instructive for others making their way up the learning curve.