Quadrature Encoder Readings Highly Irregular After Working; No Apparent Cause

Howdy ya'll,

I have attached links to my parts and a wiring diagram as well as code and sample serial monitor output. I’m using an Arduino to power two 25D HP Pololu motors through the MC33926 motor shield. Several days ago, the serial monitor readout started giving me irregular readings from the encoders. This was strange, because our code has been working perfectly for months, but given that we’ve been rough with the craft and possibly wires, I thought maybe there was a disconnect or damaged part. The thing is… I’ve replaced literally every hardware component for a benchtop test and the readings are still irregular. A code which was giving 5 readings per second smoothly is now unable to give me a consistent output, regardless if its at 200 milliseconds, 500, or 1000. In short, I'm stuck, I've spent two full days troubleshooting this in a meticulous fashion, and I am now getting increasingly frustrated with the main progress point of finishing my PhD.

Things I have tried:

-Replacing USB cable (no change)
-Replacing Arduino (no change, tried three new ones)
-Replacing Motor Shied (no change)
-Replacing all wires, testing continuity and shorts with voltmeter (all passed and no changes)
-Replacing motors with two new ones, fresh out of the box, first individually and then both (no changes)
-Pulling the yellow encoder wire (feedback goes haywire without this reading but same problem persists)
-Pulling white encoder wires (the serial moniter prints remain identical with white wires attached or unattached. This is the only behavior which could prove useful.)
-Pulling encoder power (simply stops prints to serial monitor as encoders are not running)
-The sample output I tested both with and without the timestamp. It did not seem to have an impact.
-We've had issues on a different project with printing too many serial lines; I commented out most items with no impact.

Again, let me emphasize: this code was performing perfectly at 200ms sample rate for months! A few weeks ago this code was working flawlessly. Then the encoder readings started sampling irregularly. And the code which was working now is showing this same error on an entirely new system of parts. Which leads me to believe it's the code... which doesn't make sense. The irregularity is highly apparent, we would have noticed it before.

Sample Output:

Dual MC33926 Motor Shield
RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 90 M1 current: 108 M2 current: 108
RPM1: -63 RPM2: -63 k1 V: 89 k2 V: 90 M1 current: 108 M2 current: 108
RPM1: -63 RPM2: -63 k1 V: 89 k2 V: 90 M1 current: 108 M2 current: 108
RPM1: -63 RPM2: -63 k1 V: 89 k2 V: 90 M1 current: 108 M2 current: 108
RPM1: -63 RPM2: -63 k1 V: 89 k2 V: 90 M1 current: 108 M2 current: 108
RPM1: -59 RPM2: -63 k1 V: 89 k2 V: 90 M1 current: 108 M2 current: 108
RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 88 M1 current: 108 M2 current: 108
RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 88 M1 current: 108 M2 current: 108
RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 88 M1 current: 108 M2 current: 108
RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 88 M1 current: 108 M2 current: 108
RPM1: -63 RPM2: -59 k1 V: 90 k2 V: 88 M1 current: 108 M2 current: 108
RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 89 M1 current: 108 M2 current: 108
RPM1: -59 RPM2: -63 k1 V: 88 k2 V: 89 M1 current: 108 M2 current: 108
16:26:37.849 -> RPM1: 0 RPM2: 0 k1 V: 89 k2 V: 87 M1 current: 144 M2 current: 144
16:26:38.324 -> RPM1: -75 RPM2: -63 k1 V: 126 k2 V: 99 M1 current: 108 M2 current: 108
16:26:38.904 -> RPM1: -67 RPM2: -75 k1 V: 93 k2 V: 103 M1 current: 108 M2 current: 108
16:26:40.196 -> RPM1: -63 RPM2: -63 k1 V: 94 k2 V: 91 M1 current: 108 M2 current: 108
16:26:41.583 -> RPM1: -63 RPM2: -63 k1 V: 92 k2 V: 89 M1 current: 108 M2 current: 108
16:26:42.462 -> RPM1: -67 RPM2: -63 k1 V: 93 k2 V: 89 M1 current: 108 M2 current: 108
16:26:43.238 -> RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 90 M1 current: 108 M2 current: 108
16:26:44.905 -> RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 90 M1 current: 108 M2 current: 108
16:26:48.059 -> RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 90 M1 current: 108 M2 current: 108
16:26:48.703 -> RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 90 M1 current: 108 M2 current: 108
16:26:49.991 -> RPM1: -63 RPM2: -63 k1 V: 90 k2 V: 90 M1 current: 108 M2 current: 108

Code in followup post due to character limit. I am using an Arduino Uno. Links to motors and shield here:

//*************************************************************/
//Numbers in this code are for 28V input to motor controller/shield
#include <Wire.h>
#include <SPI.h>
#include <DualMC33926MotorShield.h>

#define encodPinA1      2                       // encoder A pin motor 1
#define encodPinB1      6                       // encoder B pin motor 1
#define encodPinA2      3                       // encoder A pin motor 2
#define encodPinB2      5                       // encoder A pin motor 2

//These working controller gains are for CounterRotating B150_20RPM
float Kp =  0.2;                                // PID proportional control Gain
float Kd =  0.4;                                // PID Derivitave control gain
float Ki =   0;
int speed_req = 60;
int error_sum = 0;
float pidTerm = 0;
int error = 0;
int last_error = 0;
int k = 50;
int k_new = 0;
int k1 = k;
int k2 = k;
int t1 = 0;
int t2 = 0;
int t3 = 0;
int t4 = 0;
volatile long count1 = 0;                        // rev counter
volatile long count2 = 0;
int LOOPTIME = 50;
int LOOPTIME2 = 200;
int speed_act1 = 0;                              // speed (actual value)
int speed_act2 = 0;
int current = 0;                                // in mA
int dt = 0;

DualMC33926MotorShield md;

void setup () {
  //Set up Motor Controller
  Serial.begin(115200);
  Serial.println("Dual MC33926 Motor Shield");
  md.init();

  t1 = millis();

  //Set up Encoders
  pinMode(encodPinA1, INPUT);
  pinMode(encodPinB1, INPUT);
  digitalWrite(encodPinA1, HIGH);                      // turn on pullup resistor
  digitalWrite(encodPinB1, HIGH);
  attachInterrupt(0, rencoder1, FALLING);

  pinMode(encodPinA2, INPUT);
  pinMode(encodPinB2, INPUT);
  digitalWrite(encodPinA2, HIGH);                      // turn on pullup resistor
  digitalWrite(encodPinB2, HIGH);
  attachInterrupt(1, rencoder2, FALLING);
  initial_k();

}
//Gains for B150 and Counter rotating
void initial_k()
{
  if (speed_req == 20)
  { k1 = 50;
    k2 = 50;
  }

  if (speed_req == 40)
  { k1 = 80;
    k2 = 80;
  }

  if (speed_req == 60)
  { k1 = 90;
    k2 = 90;
  }

  if (speed_req == 80)
  { k1 = 170;
    k2 = 170;
  }

  if (speed_req == 100)
  { k1 = 199;
    k2 = 199;
  }



}

void loop () {
  //Motor A forward
  k1 = constrain(k1, 0, 229);
  k2 = constrain(k2, 0, 229);
  md.setM1Speed(k1);

  // put k1 negative for counter rotating and positive for crabwalking.  This moves to the right for crabwalking if craft is facing forward.

  //Motor B backward
  md.setM2Speed(k2);

  t2 = millis() ;
  t4 = millis();
  if (t2 - t1 >= LOOPTIME) {
    getMotorData1();
    getMotorData2();
    if (current > 100) {
      displayData();
      k1 = updatePid(k1, speed_req, abs(speed_act1));
      k2 = updatePid(k2, speed_req, abs(speed_act2));

    }
    t1 = t2;
  }

}
void rencoder1()  {                                    // pulse and direction, direct port reading to save cycles
  if (digitalRead(encodPinB1) == HIGH)   count1 ++;             // if(digitalRead(encodPinB1)==HIGH)   count ++;
  else                      count1--;                // if (digitalRead(encodPinB1)==LOW)   count --;
}

void rencoder2()  {                                    // pulse and direction, direct port reading to save cycles
  if (digitalRead(encodPinB2) == HIGH)   count2 ++;            // if(digitalRead(encodPinB1)==HIGH)   count ++;
  else                      count2--;                // if (digitalRead(encodPinB1)==LOW)   count --;
}

void getMotorData1()  {                                                        // calculate speed, volts and Amps
  static long countAnt1 = 0;                                                   // last count
  t2 = millis();
  dt = t2 - t1;
  speed_act1 = ((count1 - countAnt1) * (60 * (1000 / dt))) / (48 * 75) * 4.22222; // 48 pulses X 75 gear ratio = 3600 counts per output shaft rev
  current = md.getM1CurrentMilliamps();                                       // motor curren
  count1 = 0;
}
void getMotorData2()  {                                                        // calculate speed, volts and Amps
  static long countAnt2 = 0;                                                   // last count
  t2 = millis();
  dt = t2 - t1;
  speed_act2 = ((count2 - countAnt2) * (60 * (1000 / dt))) / (48 * 75) * 4.22222; // 48 pulses X 75 gear ratio = 3600 counts per output shaft rev
  current = md.getM2CurrentMilliamps();                                       // motor current
  count2 = 0;
}

void displayData() {

  if (t4 - t3 >= LOOPTIME2) {
    Serial.print("RPM1: ");
    Serial.print(speed_act1);
    Serial.print("\t");
    Serial.print("RPM2: ");
    Serial.print(speed_act2);
    //    Serial.print("\t");
    //    Serial.print("k1 V: ");
    //    Serial.print(k1);
    //    Serial.print("\t");
    //    Serial.print("k2 V: ");
    //    Serial.print(k2);
    //    Serial.print("\t");
    //    Serial.print("M1 current: ");
    //    Serial.print(current);
    //    Serial.print("\t");
    //    Serial.print("M2 current: ");
    //    Serial.print(current);
    Serial.println();
    t3 = t4;
  }
}

int updatePid(int number_k, int targetValue, int currentValue)   {             // compute PWM value
  error = abs(targetValue) - abs(currentValue);
  pidTerm = (Kp * error) + (Kd * (error - last_error)) + error_sum * Ki;
  last_error = error;
  error_sum += error;
  return constrain(number_k + int(pidTerm), 0, 399);
}

Disable interrupts on access to count1 and count2, because these variables are updated in the ISRs. Volatile is correct but not sufficient to implement atomic access.

Thank you for your response. Could you explain what you mean a bit more? I'm unfamiliar with the term "atomic access" and can only find a little information on this thread: Demonstration: Atomic access and Interrupt routines - Programming Questions - Arduino Forum

I also don't understand what you mean by "Disable interrupts on access to count1 and count2". The variables are updated in the interrupt service routine and I believe they have the be updated in an ISR. Are you saying use a noInterrupts(), interrupts() block around the rencoder1 and rencoder2 functions? Wouldn't that defeat the whole point of them being ISR?

I'm very confused, sorry.

You must protect count1 from being corrupted by an interrupt, while it is being accessed by the main program.

Instead of this:

 speed_act1 = ((count1 - countAnt1) * (60 * (1000 / dt))) / (48 * 75) * 4.22222; // 48 pulses X 75 gear ratio = 3600 counts per output shaft rev

make a copy of any variable shared with an interrupt, before using it:

noInterrupts();
long count1_copy = count1;
interrupts();
  speed_act1 = ((count1_copy - countAnt1) * (60 * (1000 / dt))) / (48 * 75) * 4.22222; // 48 pulses X 75 gear ratio = 3600 counts per output shaft rev

If the problem appeared suddenly as you describe, it is unlikely that the above will fix it. You should check the encoder outputs for proper function using an oscilloscope.

By the way, you lose a lot of precision by sloppy programming. Quantities like (1000/dt) suffer from severe truncation errors during integer division. Do the multiplication before the division, using long integers if necessary, e.g.

(60L*1000)/dt is much more accurate than

60*(1000/dt) when dt is an int.

Hi,

Can you please post a copy of your circuit, in CAD or a picture of a hand drawn circuit in jpg, png?
A the fritzy that you linked is not a schematic and it does not include your power supplies.

Have you really got the encoder V+ connected to 5V on one motor and 3.3V on the other?
I would think both encoders should be connected to 5V, to ensure a RELIABLE 5V signal to the UNO.

Thanks.. Tom.. :slight_smile:

jremington:
You must protect count1 from being corrupted by an interrupt, while it is being accessed by the main program.

Instead of this:

 speed_act1 = ((count1 - countAnt1) * (60 * (1000 / dt))) / (48 * 75) * 4.22222; // 48 pulses X 75 gear ratio = 3600 counts per output shaft rev

make a copy of any variable shared with an interrupt, before using it:

noInterrupts();

long count1_copy = count1;
interrupts();
  speed_act1 = ((count1_copy - countAnt1) * (60 * (1000 / dt))) / (48 * 75) * 4.22222; // 48 pulses X 75 gear ratio = 3600 counts per output shaft rev




If the problem appeared suddenly as you describe, it is unlikely that the above will fix it. You should check the encoder outputs for proper function using an oscilloscope.

**By the way**, you lose a lot of precision by sloppy programming. Quantities like (1000/dt) suffer from severe truncation errors during integer division. Do the multiplication before the division, using long integers if necessary, e.g.

(60L*1000)/dt is much more accurate than

60*(1000/dt) when dt is an int.

Even if it does not immediately solve the problem thank you for pointing out this best practice/trick. I will definitely be applying it. I believe my lab has an oscilloscope, I will observe the output and see what it looks like. Also, that's a good point about the multiplication. I will reformat the code.

TomGeorge:
Hi,

Can you please post a copy of your circuit, in CAD or a picture of a hand drawn circuit in jpg, png?
A the fritzy that you linked is not a schematic and it does not include your power supplies.

Have you really got the encoder V+ connected to 5V on one motor and 3.3V on the other?
I would think both encoders should be connected to 5V, to ensure a RELIABLE 5V signal to the UNO.

Thanks.. Tom.. :slight_smile:

I'm unsure what else I can display. The only additions I could make are the power supply which supplies 28V to the center of the motor shield and the USB cable which powers the Arduino.

Now that you point out the encoder split... yes. I think this was a temporary fix we employed at one point to get the system to work, it actually worked, and we left it and forgot it. But this could easily be a root cause to the problem. The response on the Pololu forum was to immediately pounce on this as well. Perhaps this is one of those "it shouldn't have worked in the first place" scenarios and my first board was pulling 5V for that encoder through some weird shorting shenanigans all along.

Thanks to everyone. I have some solutions to implement and try. I will report back results tomorrow.

Quick Edit: Wow. I just checked the motor specs and the encoder required voltage is 3.5V-20V. It should not have been working on the 3.3V at all. I'm curious if the 3.3V shorted with the 5V and simply acted as a splicer for a time before wearing out. This definitely seems like a "shouldn't have ever worked in the first place" issue. I'll do a 5V header split and report back. They only draw 10mA max.

Did a split off of the 5V for the encoders; no change in performance. Implemented the Interrupt() block as indicated above around the counts; no change in performance. Seeing a weird pattern: if I remove power to one or both encoders, the sampling performance increases significantly.

With the common 5V power through verified breadboard:

17:22:36.220 -> Dual MC33926 Motor Shield
17:22:38.111 -> RPM1: 0	RPM2: 0
17:22:38.921 -> RPM1: -88	RPM2: -75
17:22:40.034 -> RPM1: -63	RPM2: -71
17:22:40.235 -> RPM1: -71	RPM2: -67
17:22:41.386 -> RPM1: -67	RPM2: -71
17:22:42.770 -> RPM1: -67	RPM2: -67
17:22:43.041 -> RPM1: -67	RPM2: -67
17:22:44.865 -> RPM1: -67	RPM2: -67
17:22:45.676 -> RPM1: -63	RPM2: -63
17:22:46.387 -> RPM1: -59	RPM2: -59
17:22:47.337 -> RPM1: -67	RPM2: -63
17:22:47.743 -> RPM1: -63	RPM2: -67
17:22:51.740 -> RPM1: -63	RPM2: -63

Note there was one timestamp in there, 40.034 to 40.235, which was the correct sample rate. Now look what happens if I disconnect the power to the encoders but continue to read the pins:

17:15:04.657 -> Dual MC33926 MotDual MC33926 Motor Shield
17:15:12.120 -> RPM1: 0	RPM2: 0
17:15:12.422 -> RPM1: 0	RPM2: 0
17:15:12.859 -> RPM1: 0	RPM2: 0
17:15:13.060 -> RPM1: 0	RPM2: 0
17:15:13.299 -> RPM1: 0	RPM2: 0
17:15:13.567 -> RPM1: 0	RPM2: 0
17:15:14.310 -> RPM1: 0	RPM2: 0
17:15:14.511 -> RPM1: 0	RPM2: 0
17:15:14.715 -> RPM1: 0	RPM2: 0
17:15:15.150 -> RPM1: 0	RPM2: 0
17:15:15.554 -> RPM1: 0	RPM2: 0
17:15:15.827 -> RPM1: 0	RPM2: 0
17:15:16.199 -> RPM1: 0	RPM2: 0
17:15:16.704 -> RPM1: 0	RPM2: 0
17:15:17.010 -> RPM1: 0	RPM2: 0
17:15:17.311 -> RPM1: 0	RPM2: 0
17:15:17.613 -> RPM1: 0	RPM2: 0
17:15:17.815 -> RPM1: 0	RPM2: 0
17:15:18.016 -> RPM1: 0	RPM2: 0

I'm at a loss. A fellow labmate suggested I use the Timer library here: Arduino Playground - Timer Library

If anyone has suggestions for using this, or perhaps if I misused millis() and it has started acting up for whatever reason, I'm all ears. I started off thinking this was hardware but stranger things have happened. I'm going to go through the code again, line by line, and see if I can spot a way to improve it and make it more robust today. My labmate pointed out that I AM getting proper values for the RPM and the tick counts, that it's unlikely to be hardware because everything appears working.

I don't know why the bug would only show up now... I'm at my wits end.

int t1 = 0;
int t2 = 0;
int t3 = 0;
int t4 = 0;

These time variables are picking up the value of millis(). They should be typed as
unsigned long.

Did you try:

pinMode(encodPinA1, INPUT_PULLUP);
pinMode(encodPinB1, INPUT_PULLUP);

?

I solved my problem. And this was actually a great lesson to learn. I examined the error from the perspective of two possible causes:

  1. The hardware changed, i.e. a loose wire or faulty motor, encoder, etc.

  2. The code mistakenly changed, or interaction with the computer thereof, i.e. a mistakenly deleted set of brackets, poor data type interactions, computer updates, etc.

Once I ruled out loose wires or any hardware, and replaced the entire physical system only to find the same error, I assumed it was then mistaken software. I revisited old versions of software, only to find the same error. I had not examined a third possible cause of the error:

  1. A significant but unapparent change in environment-hardware interaction causes portions of the code, particularly loops, to behave differently.

A snippet of code is wrapped around my data display and PID feedback to prevent the repeated display of lines before we turn the craft on to run:

if (t2 - t1 >= LOOPTIME) {
    getMotorData1();
    getMotorData2();
    if (current > 100) {
      displayData();
      k1 = updatePid(k1, speed_req, abs(speed_act1));
      k2 = updatePid(k2, speed_req, abs(speed_act2));
    }
    t1 = t2;
  }

This snippet essentially says "don't print the line or engage PID if current is below 100mA." Which was exactly my problem. In retrospect, I'm not sure why such a high level of current was selected. The "never should have worked in the first place" aspect comes from the fact that we continually ran this in a higher load capacity, typically around 200-300mA. That's why the issues never occurred. When we moved to a new mode of testing, we ran the craft in much more favorable conditions, apparently around 100mA load. That's why they mimicked a loose wire. Upon benchtop testing attached to the craft that erratic reading continued (because of wheel load) and thats why during benchtop testing unattached I could barely pick up readings, and the results became even worse when testing on brand new motors: they were now unworn and more efficient.

When output appears to change for no reason, even attached to brand new hardware, the software may still be working as intended... just not in the way you intended.

TLDR: Examine your assumptions and operational loops, dummy.