I have attached links to my parts and a wiring diagram, as well as code and sample serial monitor output. I'm using an Arduino to drive two Pololu 25D HP motors through the MC33926 motor shield. Several days ago, the serial monitor started giving me irregular readings from the encoders. This was strange, because our code has been working perfectly for months, but given that we've been rough with the craft and possibly the wires, I thought maybe there was a disconnect or a damaged part. The thing is… I've replaced literally every hardware component for a benchtop test and the readings are still irregular. Code that was smoothly giving 5 readings per second is now unable to give me consistent output, regardless of whether the sample interval is 200, 500, or 1000 milliseconds. In short, I'm stuck. I've spent two full days troubleshooting this meticulously, and I am getting increasingly frustrated, because this is the main thing standing between me and finishing my PhD.
Things I have tried:
-Replacing USB cable (no change)
-Replacing Arduino (no change, tried three new ones)
-Replacing Motor Shield (no change)
-Replacing all wires, testing continuity and shorts with voltmeter (all passed and no changes)
-Replacing motors with two new ones, fresh out of the box, first individually and then both (no changes)
-Pulling the yellow encoder wire (feedback goes haywire without this reading but same problem persists)
-Pulling white encoder wires (the serial monitor prints remain identical with the white wires attached or unattached. This is the only behavior which could prove useful.)
-Pulling encoder power (simply stops prints to serial monitor as encoders are not running)
-The sample output I tested both with and without the timestamp. It did not seem to have an impact.
-We've had issues on a different project with printing too many serial lines; I commented out most items with no impact.
Again, let me emphasize: this code was performing perfectly at 200ms sample rate for months! A few weeks ago this code was working flawlessly. Then the encoder readings started sampling irregularly. And the code which was working now is showing this same error on an entirely new system of parts. Which leads me to believe it's the code... which doesn't make sense. The irregularity is highly apparent, we would have noticed it before.
Disable interrupts on access to count1 and count2, because these variables are updated in the ISRs. Volatile is correct but not sufficient to implement atomic access.
I also don't understand what you mean by "Disable interrupts on access to count1 and count2". The variables are updated in the interrupt service routines, and I believe they have to be updated in an ISR. Are you saying to use a noInterrupts(), interrupts() block around the rencoder1 and rencoder2 functions? Wouldn't that defeat the whole point of them being ISRs?
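To illustrate why volatile alone is not enough: on an 8-bit AVR such as the UNO, a 4-byte long is read one byte at a time, so an ISR can fire between the byte reads. The sketch below simulates that case on a desktop compiler. tornRead is a hypothetical helper, not part of the code in this thread, and it assumes the interrupt fires right after the low byte has been read.

```cpp
#include <cstdint>

// Hypothetical simulation of a non-atomic 32-bit read on an 8-bit MCU.
// Assumption: the interrupt fires after the low byte has been read and
// before the upper three bytes are read.
constexpr uint32_t tornRead(uint32_t valueBeforeIsr, uint32_t valueAfterIsr) {
    return (valueBeforeIsr & 0x000000FFu)   // low byte: stale value
         | (valueAfterIsr  & 0xFFFFFF00u);  // upper bytes: updated value
}

// The ISR moves the counter from 255 to 256 mid-read; the main code
// ends up with a value that the counter never actually held.
constexpr uint32_t kObserved = tornRead(255, 256);  // 0x000001FF = 511
```

The noInterrupts()/interrupts() pair goes around the copy in the main loop only, not around the ISRs. Interrupts are paused for the few instructions it takes to copy the value, so the ISRs keep doing their job.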
make a copy of any variable shared with an interrupt, before using it:
noInterrupts();
long count1_copy = count1;
interrupts();
speed_act1 = ((count1_copy - countAnt1) * (60 * (1000 / dt))) / (48 * 75) * 4.22222; // 48 pulses X 75 gear ratio = 3600 counts per output shaft rev
If the problem appeared suddenly as you describe, it is unlikely that the above will fix it. You should check the encoder outputs for proper function using an oscilloscope.
**By the way**, you lose a lot of precision by sloppy programming. Quantities like (1000/dt) suffer from severe truncation errors during integer division. Do the multiplication before the division, using long integers if necessary, e.g.
(60L*1000)/dt is much more accurate than
60*(1000/dt) when dt is an int.
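A minimal sketch of the truncation effect, written so it compiles on a desktop compiler. The function names are hypothetical, the 3600 counts per revolution comes from the comment in the code above, and the 4.22222 scale factor is left out because its meaning is not stated in the thread.

```cpp
#include <cstdint>

// 48 pulses x 75 gear ratio = 3600 counts per output shaft revolution.
constexpr int32_t kCountsPerRev = 48 * 75;

// Original ordering: 1000 / dtMs is truncated before it is scaled up.
constexpr int32_t rpmDivideFirst(int32_t deltaCounts, int32_t dtMs) {
    return (deltaCounts * (60 * (1000 / dtMs))) / kCountsPerRev;
}

// Suggested ordering: do every multiplication first, in a wide integer
// type, and perform a single division at the end.
constexpr int32_t rpmMultiplyFirst(int32_t deltaCounts, int32_t dtMs) {
    return static_cast<int32_t>(
        (static_cast<int64_t>(deltaCounts) * 60 * 1000) /
        (static_cast<int64_t>(dtMs) * kCountsPerRev));
}

// 1800 counts in 300 ms is half a revolution in 0.3 s, i.e. 100 RPM.
constexpr int32_t kDivideFirst   = rpmDivideFirst(1800, 300);    // 90
constexpr int32_t kMultiplyFirst = rpmMultiplyFirst(1800, 300);  // 100
```

With dt = 200 both orderings agree, because 1000/200 divides evenly. The error only shows up when dt is not a divisor of 1000, which is easy to miss if the loop usually runs at exactly 200 ms.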
Even if it does not immediately solve the problem thank you for pointing out this best practice/trick. I will definitely be applying it. I believe my lab has an oscilloscope, I will observe the output and see what it looks like. Also, that's a good point about the multiplication. I will reformat the code.
TomGeorge:
Hi,
Can you please post a copy of your circuit, in CAD or a picture of a hand drawn circuit in jpg, png?
The Fritzing diagram that you linked is not a schematic, and it does not include your power supplies.
Have you really got the encoder V+ connected to 5V on one motor and 3.3V on the other?
I would think both encoders should be connected to 5V, to ensure a RELIABLE 5V signal to the UNO.
Thanks.. Tom..
I'm unsure what else I can display. The only additions I could make are the power supply which supplies 28V to the center of the motor shield and the USB cable which powers the Arduino.
Now that you point out the encoder split... yes. I think this was a temporary fix we employed at one point to get the system to work, it actually worked, and we left it and forgot it. But this could easily be a root cause to the problem. The response on the Pololu forum was to immediately pounce on this as well. Perhaps this is one of those "it shouldn't have worked in the first place" scenarios and my first board was pulling 5V for that encoder through some weird shorting shenanigans all along.
Thanks to everyone. I have some solutions to implement and try. I will report back results tomorrow.
Quick Edit: Wow. I just checked the motor specs and the encoder required voltage is 3.5V-20V. It should not have been working on the 3.3V at all. I'm curious if the 3.3V shorted with the 5V and simply acted as a splicer for a time before wearing out. This definitely seems like a "shouldn't have ever worked in the first place" issue. I'll do a 5V header split and report back. They only draw 10mA max.
Did a split off of the 5V for the encoders; no change in performance. Implemented the noInterrupts()/interrupts() block as indicated above around the counts; no change in performance. Seeing a weird pattern: if I remove power to one or both encoders, the sampling performance improves significantly.
With the common 5V power through verified breadboard:
Note there was one timestamp in there, 40.034 to 40.235, which was the correct sample rate. Now look what happens if I disconnect the power to the encoders but continue to read the pins:
If anyone has suggestions for using this, or perhaps if I misused millis() and it has started acting up for whatever reason, I'm all ears. I started off thinking this was hardware but stranger things have happened. I'm going to go through the code again, line by line, and see if I can spot a way to improve it and make it more robust today. My labmate pointed out that I AM getting proper values for the RPM and the tick counts, that it's unlikely to be hardware because everything appears working.
I don't know why the bug would only show up now... I'm at my wits' end.
I solved my problem. And this was actually a great lesson to learn. I examined the error from the perspective of two possible causes:
1. The hardware changed, i.e. a loose wire or faulty motor, encoder, etc.
2. The code mistakenly changed, or interaction with the computer thereof, i.e. a mistakenly deleted set of brackets, poor data type interactions, computer updates, etc.
Once I ruled out loose wires or any hardware, and replaced the entire physical system only to find the same error, I assumed it was then mistaken software. I revisited old versions of software, only to find the same error. I had not examined a third possible cause of the error:
3. A significant but unapparent change in environment-hardware interaction causes portions of the code, particularly loops, to behave differently.
A snippet of code is wrapped around my data display and PID feedback to prevent the repeated display of lines before we turn the craft on to run:
This snippet essentially says "don't print the line or engage PID if current is below 100mA." Which was exactly my problem. In retrospect, I'm not sure why such a high current threshold was selected. The "never should have worked in the first place" aspect comes from the fact that we had always run this under a higher load, typically around 200-300mA. That's why the issue never occurred. When we moved to a new mode of testing, we ran the craft in much more favorable conditions, apparently at around 100mA load, so the current kept crossing the threshold and the readings mimicked a loose wire. During benchtop testing attached to the craft, the erratic readings continued (because of wheel load). During benchtop testing unattached, I could barely pick up readings at all, and the results became even worse on brand new motors: they were unworn and more efficient, so they drew even less current.
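A rough sketch of the effect described above, not the original snippet. The function names and the current samples are invented for illustration; only the 100mA threshold and the approximate load ranges come from the post.

```cpp
#include <cstddef>

// Threshold described in the post: below this, nothing is printed and
// the PID step is skipped.
constexpr int kThresholdMilliamps = 100;

// Hypothetical stand-in for the gate wrapped around the print/PID code.
constexpr bool shouldReport(int currentMilliamps) {
    return currentMilliamps >= kThresholdMilliamps;
}

// Counts how many sample periods actually produce a serial line.
constexpr std::size_t countReported(const int* samples, std::size_t n) {
    std::size_t reported = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (shouldReport(samples[i])) ++reported;
    }
    return reported;
}

// Invented current readings in mA, one per 200 ms sample period.
constexpr int kLoadedCraft[] = {250, 240, 260, 255, 245};  // heavy load
constexpr int kLightLoad[]   = {105, 95, 98, 110, 92};     // hovering near 100

constexpr std::size_t kLoadedLines = countReported(kLoadedCraft, 5);  // 5
constexpr std::size_t kLightLines  = countReported(kLightLoad, 5);    // 2
```

Under heavy load every period prints, so the output looks like a steady 5 Hz stream. Near the threshold, lines drop out depending on the instantaneous current, which looks like an irregular sample rate even though the timing code is unchanged.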
When output appears to change for no reason, even attached to brand new hardware, the software may still be working as intended... just not in the way you intended.
TLDR: Examine your assumptions and operational loops, dummy.