Arduino Randomly Freezing During Long Jobs

Hi,

I've built a 6-DOF micro pick 'n' place robot which I will use to assist me in the creation of various digital artworks. My problem is that the machine runs beautifully for many hours, but then freezes for some unknown reason. This stalling always happens after the previous command has been carried out cleanly, and can occur anywhere (according to my last 20+ logfiles) between 29 seconds and 15 hours into the job.

I originally blamed the VB serial port for this (I left a huge thread on the interfacing section of this forum), but after successfully running the code for over three days with just my Arduino UNO parsing instructions via USB from the laptop, I decided it must be an electronics problem. I designed my stepper board using LochMaster, which is great for quick prototyping, but doesn't produce a pretty schematic view so forgive this ugly graphic of my setup.

This is my first time building such a contraption, so I'm hoping that the cause of the freezing will be obvious to someone out there. My main X/Y supply is a 20V 4A laptop supply, and the 7V for my tool module comes from a 1.3A switching wallwart. And before you tell me those supplies are squarely to blame, I can still get the machine to stall with both of them unplugged and just logic going into the board via the left-hand connector. Also, I took the precaution of disabling BOD on the ATmega328P, but it made no difference.

Thanks for your time,

Thomas

Spikes from the motors are probably affecting the AVR. Your layout looks very poor - you need to improve the power and ground distribution.

Two obvious possibilities:

  1. Transients (e.g. from the motors) upsetting the Arduino. See previous reply.

  2. A problem in the code. You haven't posted your code so I can only make general suggestions. Are you using the String class, or doing anything else that involves dynamic memory allocation? If so, don't. Is your code robust with respect to unexpected inputs? Is the RAM usage comfortably within the available RAM?
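As an illustration of what "robust with respect to unexpected inputs" can mean for numeric fields, here is a minimal sketch in plain C++. The name parseField and its range arguments are made up for this example. The idea is that atoi() cannot report garbage and must never be handed a null pointer (which strtok() returns once it runs out of tokens), whereas strtol() lets you check each case:

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse one numeric field defensively.
// Returns true and stores the value only if `token` is non-null,
// starts with a number, and the number lies within [lo, hi].
// Characters after the number (e.g. a trailing newline) are ignored.
bool parseField(const char* token, long lo, long hi, long& out) {
    if (token == nullptr || *token == '\0') return false;  // no token at all
    char* end = nullptr;
    errno = 0;
    long v = std::strtol(token, &end, 10);
    if (end == token) return false;       // no digits were consumed
    if (errno == ERANGE) return false;    // value does not fit in a long
    if (v < lo || v > hi) return false;   // outside what the caller can store
    out = v;
    return true;
}
```

A caller would reject the whole command (and perhaps report an error to the PC) as soon as any field fails, rather than moving the machine on half-parsed data.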

I don't see enough bypass capacitors mounted on that board layout drawing. If that is true, your board is truly acting like an innocent young girl walking unknowingly into the red-light district of real-world EMI and digital switching noise. Good engineering practice would have 0.1µF caps mounted very close to the Vcc and ground pins of each and every chip, and one or two larger (20-100µF) caps across the Vcc and ground entry points to the board. Don't let Grumpy Mike see that picture. :wink:

Lefty

Thanks for the suggestions!

I'd suspected that there wasn't enough smoothing going on, so last night I started a long test with a 470µF cap across the 5V/GND on the Arduino, which seemed to work better (i.e. it stalled after nearly 15.5hrs).

@dc42 - I have no String class in my code, it works fine without my driver board attached and it's well within my available RAM. Thanks for asking.

I understand the comments about motor transients, but I'm still a bit worried that a test I did earlier with only the driver board's 5V/GND connected to the Arduino still managed to fail after a few hours, despite the complete absence of switching loads and motor supplies. I just can't understand how that would happen, especially as there was no other power going to the board.

I will add caps to all my ICs today, and see if that helps...

All chips have decoupling caps, and I've put a 22µF across the 5V. Test failed after 3hrs 24m...

So, here's my code, though I'm sure it's okay (I'll put the xyzservos handler in the next post)

//
// MMM_matrix_Plotter3a (pairs with MP3 New VB)
// tjn 20/11/2012
// 
// Status: Interleaves X, Y1, Y2 and Z(B)
// Note: 78 Steps/mm using B servo in Full-wave mode, slack = 36

#include <Wire.h>
#include <iox.h>

// Setup pins (0 and 1 reserved for serial I/O)
byte xctrl1 = 2;                // X-Axis geartrain control 1
byte xctrl2 = 3;                // X-Axis geartrain control 2
byte yctrl1 = 4;                // South-Y geartrain control 1
byte yctrl2 = 5;                // South-Y geartrain control 2
byte y2ctrl1 = 7;               // North-Y geartrain control 1
byte y2ctrl2 = 8;               // North-Y geartrain control 2
byte tpower = 9;                // Trolley stepper supply
byte tctrl1 = 10;               // Trolley control 1
byte tctrl2 = 11;               // Trolley control 2
byte xypower = 12;              // XY stepper supply
byte lpower = 13;               // Laser logic supply
byte xsensors = 15;             // X-buffer switches
byte ysensors = 14;             // Y-buffer switches
byte scaleFactor = 1;
byte offset1, offset2, space, cut, gap;
byte xStepIdx, y1StepIdx, y2StepIdx, zStepIdx;
int xSteps, y1Steps, y2Steps, zSteps, totalSteps, subSteps, bitIndex, laserTime, tMult;
int xStore, yStore, zStore, pixCount, iterations, dotCount, max1, max2, iTimeIdx = 0;
char inStr[40];                 // Hold incoming data
byte index = 0;
boolean stringComplete = false; // Data complete flag
boolean xStop = false;
boolean yStop = false;
boolean xDir = false;           // TRUE = Clockwise
boolean yDir = false;
boolean zDir = false;
boolean oxDir = false;
boolean oyDir = false;
boolean ozDir = false;
boolean laserOn = false;
long MsDelay;
unsigned char twoWire[] = {
  B01,B11,B10,B00};             // 2-Wire sequence for X/Y steppers
word fullWaveB[] = {            // Full-wave Slave stepper motor sequence
  0x6000,0x2010,0x18,0x4008};
word lampState = 0x0000;
double m, mx, my1, my2, mz, x, y1, y2, z;

void setup() {
  Serial.begin(38400);         // was 38400
  pinMode(xctrl1, OUTPUT);    // X stepper pin1
  pinMode(xctrl2, OUTPUT);    // X stepper pin2
  pinMode(yctrl1, OUTPUT);    // Y1 stepper pin1
  pinMode(yctrl2, OUTPUT);    // Y1 stepper pin2
  pinMode(y2ctrl1, OUTPUT);   // Y2 stepper pin1
  pinMode(y2ctrl2, OUTPUT);   // Y2 stepper pin2
  pinMode(tctrl1, OUTPUT);    // Trolley pin1
  pinMode(tctrl2, OUTPUT);    // Trolley pin2
  pinMode(lpower, OUTPUT);    // Laser power
  pinMode(xypower, OUTPUT);   // XY stepper power
  pinMode(tpower, OUTPUT);    // Trolley power
  pinMode(xsensors, INPUT);   // X-buffer switches
  pinMode(ysensors, INPUT);   // Y-buffer switches
  digitalWrite(xypower, LOW); // Power-down XY motors
  digitalWrite(lpower,LOW);   // Power-down laser
  digitalWrite(tpower,LOW);   // Power-down trolley
  Wire.begin();               // Start 2-wire communications (Arduino as master device)
  IOX.device(0x74, 16);       // 0x74 is address for Servo A (Pitch)
  IOX.write(0x0080, CFGPORT); // P07=INPUT Set ports LOW to make them OUTPUTS
  IOX.write(0x0000, INVPORT); // Set slave device invert ports to all NON-INVERT
  IOX.write(0x000, OUTPORT);  // Power-down Lamp/Fan
  Serial.println("OK?");
  delay(100);
}

void loop() {
  if (stringComplete) {
    xSteps = atoi(strtok(inStr, "xy"));   // X Transit
    y1Steps = atoi(strtok(NULL, "z"));    // Y Transit
    zSteps = atoi(strtok(NULL, "l"));     // Z Transit
    laserTime = atoi(strtok(NULL, ",o")); // Laser On/Off
    offset1 = atoi(strtok(NULL, "s"));    // Offset1
    space = atoi(strtok(NULL, "c"));      // Space
    cut = atoi(strtok(NULL, "g"));        // Cut
    gap = atoi(strtok(NULL, "i"));        // Gap
    iterations = atoi(strtok(NULL, "o")); // Iterations
    offset2 = atoi(strtok(NULL, ""));     // Offset2

    if (xSteps < 0) xDir = true;
    if (xSteps > 0) xDir = false;
    if (xSteps == 0) xDir = oxDir;
    if (y1Steps < 0) yDir = true;
    if (y1Steps > 0) yDir = false;
    if (y1Steps == 0) yDir = oyDir;
    if (zSteps < 0) zDir = false;
    if (zSteps > 0) zDir = true;
    if (zSteps == 0) zDir = ozDir;
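    if (xStop == true && xDir != oxDir) xStop = false; // release the X stop only when direction reverses
    if (yStop == true && yDir != oyDir) yStop = false; // release the Y stop only when direction reverses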
    xSteps = abs(xSteps);
    y1Steps = abs(y1Steps);
    zSteps = abs(zSteps);
    y2Steps = y1Steps;


    if (zDir != ozDir) { // Vertical Slack Handler
      zSteps += 37;      // Z-Slack value (from laser deflection test)
    }

    totalSteps = max(xSteps, y1Steps);
    subSteps = min(xSteps, y1Steps);
    m = (double)subSteps/(double)totalSteps;

    if (m > 0.7) { // vector splitter/dog-legger to dodge bad harmonics
      digitalWrite(lpower,LOW);
      if (xSteps > y1Steps) {
        xStore = xSteps;
        yStore = y1Steps;
        xSteps = xStore - y1Steps;
        y1Steps = 0;
        y2Steps = 0;
        digitalWrite(xypower, HIGH);
        xyzServos();
        xSteps = yStore;
        y1Steps = yStore;
        y2Steps = y1Steps;
      }
      if (xSteps < y1Steps) {
        xStore = xSteps;
        yStore = y1Steps;
        y1Steps = yStore - xSteps;
        xSteps = 0;
        digitalWrite(xypower, HIGH);
        xyzServos();
        xSteps = xStore;
        y1Steps = xStore;
        y2Steps = y1Steps;
      }
      // else not used (no adjustment needed when X & Y are equal!)
    }

    if (xSteps != 0 || y1Steps != 0 || zSteps != 0) {
      digitalWrite(xypower, HIGH);
      xyzServos(); // rem-out while testing
    }
    else {
      if (laserTime == 1) {
        digitalWrite(lpower, HIGH); // Laser ON
        IOX.write(0x0200, OUTPORT); // Lamp & Fan ON
      }
      else{
        digitalWrite(lpower,LOW);   // Laser OFF
        IOX.write(0x000, OUTPORT);  // Lamp & Fan OFF
      }
    }

    if (xSteps  == 0 && y1Steps == 0) { // switch OFF motors on end vector
      //digitalWrite(xypower, LOW);     // Machine loses registration on power-down!
      if (laserTime == 0) IOX.write(0x0000, OUTPORT); // Turn lamp & fan OFF
    }

    if (xStop == true) Serial.println("X-buffer Hit");
    if (yStop == true) Serial.println("Y-buffer Hit");
    Serial.println("OK"); // Tell VB Arduino's ready to receive next command from vb    
    stringComplete = false;
    oxDir = xDir;
    oyDir = yDir;
    ozDir = zDir;
  }
}

void serialEvent()
{
  while (Serial.available())
  {
    char inChar = Serial.read(); 
    inStr[index++] = inChar;    // add to the inStr
    inStr[index] = '\0';        // NULL terminate the array
    if (inChar == '\n')
    {                           // Flag if char is vbcrlf
      stringComplete = true;
      index = 0;
    } 
  }
}
void xyzServos() {
  if (zSteps > 0 || (xSteps + y1Steps) < 500) tMult = 2000; // Change acceleration profile to suit Z(B) 
  else tMult = 1000;            // stepper motor which stalls below 3ms
  max1 = max(xSteps, y1Steps);
  max2 = max(y1Steps, zSteps);
  totalSteps = max(max1, max2);
  mx = (double)xSteps/(double)totalSteps;
  my1 = (double)y1Steps/(double)totalSteps;
  my2 = (double)y2Steps/(double)totalSteps;
  mz = (double)zSteps/(double)totalSteps;
  x = mx;
  y1 = my1;
  y2 = my2;
  z = mz;
  laserOn = false;
  if (cut > 0 || laserTime > 0) {
    IOX.write(0x0200, OUTPORT); // Turn lamp & fan ON
    lampState = 0x0200;
 }
  if (cut == 0 && laserTime > 0) digitalWrite(lpower, HIGH);
  for (int i = 0; i < totalSteps; i++) {
    x += mx;
    if (x >= 1 && mx != 0 && xStop == false){ // X-stepper control
      x -= 1.0;
      if (xDir == true) {
        if (xStepIdx == 0) xStepIdx = 4;
        xStepIdx--;
      }
      else {
        xStepIdx++;
        if (xStepIdx > 3) xStepIdx = 0;
      }
      if(twoWire[xStepIdx] & 1<<1){
        digitalWrite(xctrl1,HIGH);
      } 
      else {
        digitalWrite(xctrl1,LOW);
      }
      if(twoWire[xStepIdx] & 1<<0){
        digitalWrite(xctrl2,HIGH);
      } 
      else {
        digitalWrite(xctrl2,LOW);
      }
    }
    y1 += my1;
    if (y1 >= 1 && my1 != 0 && yStop == false){ // Y1(South)-stepper control
      y1 -= 1.0;
      if (yDir == true) {
        if (y1StepIdx == 0) y1StepIdx = 4;
        y1StepIdx--;
      }
      else {
        y1StepIdx++;
        if (y1StepIdx > 3) y1StepIdx = 0;
      }
      if(twoWire[y1StepIdx] & 1<<1){
        digitalWrite(yctrl1,HIGH);
      } 
      else {
        digitalWrite(yctrl1,LOW);
      }
      if(twoWire[y1StepIdx] & 1<<0){
        digitalWrite(yctrl2,HIGH);
      } 
      else {
        digitalWrite(yctrl2,LOW);
      }
    }    
    y2 += my2;
    if (y2 >= 1 && my2 != 0 && yStop == false){ // Y2(North)-stepper control
      y2 -= 1.0;
      if (yDir == true) {
        if (y2StepIdx == 0) y2StepIdx = 4;
        y2StepIdx--;
      }
      else {
        y2StepIdx++;
        if (y2StepIdx > 3) y2StepIdx = 0;
      }
      if(twoWire[y2StepIdx] & 1<<1){
        digitalWrite(y2ctrl1,HIGH);
      } 
      else {
        digitalWrite(y2ctrl1,LOW);
      }
      if(twoWire[y2StepIdx] & 1<<0){
        digitalWrite(y2ctrl2,HIGH);
      } 
      else {
        digitalWrite(y2ctrl2,LOW);
      }
    }   
    z += mz;
    if (z >= 1 && mz != 0){ // Z(B)-stepper control
      z -= 1.0;
      if (zDir == true) {
        if (zStepIdx == 0) zStepIdx = 4;
        zStepIdx--;
      }
      else {
        zStepIdx++;
        if (zStepIdx > 3) zStepIdx = 0;
      }
      IOX.write(fullWaveB[zStepIdx] + lampState, OUTPORT);
    }
    if (cut > 0) { // Put lasing code here..
      bitIndex = (i - (offset1 + space)) % (space + cut + space + gap - 1); // - 1 makes interval agree with vb HorizSteps value!
      if (bitIndex == 0 && i < (totalSteps - offset2)) laserOn = true;
      if (bitIndex == cut) laserOn = false;
      if (laserOn == true) {
        digitalWrite(lpower, HIGH); //rem-out while testing
        delay(laserTime);
      }
      else {
        digitalWrite(lpower, LOW);
        delay(3); // fastest stable transit to next cutting point
      }
    }
    else {
      if (i < totalSteps / 2)
        iTimeIdx = i;
      else
        iTimeIdx = totalSteps - i;
      if (iTimeIdx > 180) iTimeIdx = 180;
      MsDelay = (tMult*(3-(sin((270+iTimeIdx)*PI/180)))) - 1000;
      if (laserTime == 0) delayMicroseconds(MsDelay);
      else delay(laserTime);
    }
    if (digitalRead(xsensors) == HIGH && i > 25) xStop = true;
    if (digitalRead(ysensors) == HIGH && i > 25) yStop = true;
    if (xStop == true || yStop == true) digitalWrite(lpower, LOW); // cut laser power if any limit reached
    if (xStop == true && yStop == true) i = totalSteps;            // bomb-out of loop if both limits reached
  }
  digitalWrite(lpower, LOW);
  //IOX.write(0x0000, OUTPORT); // Turn lamp & fan OFF
}
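
For anyone trying to follow the acceleration profile in the non-cutting branch above, here is the same delay expression pulled out into a stand-alone function in plain C++. The function name stepDelayMicros is made up for this sketch; the arithmetic follows the MsDelay line. With tMult = 1000 the delay starts at roughly 3000 µs and ramps down to about 1000 µs after 180 steps; with tMult = 2000 it runs from roughly 7000 µs down to about 3000 µs, which seems to match the "stalls below 3ms" comment.

```cpp
#include <cmath>

// Per-step delay in microseconds, following the MsDelay expression in xyzServos().
// `i` is the current step and `totalSteps` the length of the move.
long stepDelayMicros(int tMult, int i, int totalSteps) {
    const double kPi = 3.14159265358979323846;
    // Distance (in steps) from the nearest end of the move.
    int rampIdx = (i < totalSteps / 2) ? i : totalSteps - i;
    if (rampIdx > 180) rampIdx = 180;  // the ramp is over after 180 steps
    // sin() sweeps from -1 (270 degrees) up to +1 (450 degrees),
    // so the factor (3 - sin) falls from 4 to 2.
    double factor = 3.0 - std::sin((270 + rampIdx) * kPi / 180.0);
    return static_cast<long>(tMult * factor - 1000);
}
```

Because the ramp index is measured from whichever end of the move is closer, the same curve is used to slow down again over the last 180 steps.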

The most obvious problem is that in serialEvent, you have a classic buffer overflow if the input received is not as expected. You need to prevent 'index' from incrementing past the last element of the buffer.

Okay, despite this code having run for three and a half days without issue (when only connected to the USB, anyway =()...

What would be the best approach? Add a conditional to check each byte is good before adding it to the array?

aibonewt:
Okay, despite this code having run for three and a half days without issue (when only connected to the USB, anyway =()...

What would be the best approach? Add a conditional to check each byte is good before adding it to the array?

Add a check that 'index' has not reached the end of the array. If it has, you'll need to decide what to do, e.g. wait until you receive the terminating character, then ignore that data and start again by resetting index to zero.

aibonewt:
Thanks for the suggestions!

I'd suspected that there wasn't enough smoothing going on, so last night I started a long test with a 470µF cap across the 5V/GND on the Arduino, which seemed to work better (i.e. it stalled after nearly 15.5hrs).

@dc42 - I have no String class in my code, it works fine without my driver board attached and it's well within my available RAM. Thanks for asking.

I understand the comments about motor transients, but I'm still a bit worried that a test I did earlier with only the driver board's 5V/GND connected to the Arduino still managed to fail after a few hours, despite the complete absence of switching loads and motor supplies. I just can't understand how that would happen, especially as there was no other power going to the board.

I will add caps to all my ICs today, and see if that helps...

Firstly, those decoupling capacitors are always needed with every logic chip. Here the ULNs aren't logic chips, they are just amplifiers in effect, so lack of decoupling can't glitch them into the wrong state, but it's important to have them to reduce the switching noise on the supplies, as they are switching large currents.

Secondly, we know nothing about the cabling between boards. Logic signals should not be routed over long wires without taking appropriate steps to prevent crosstalk, reflections, etc., so how is everything connected?

Another thing that might occasionally be needed in very noisy environments is adding an extra pull-up resistor on the reset pin (1k or so).

Also, have you checked the supply voltages are correct when it's operating? Always worth checking, just in case there's an unexpected issue there.

Okay,

I've tweaked the code so index won't overrun, and put a 10k pull-up on the reset. Still not working for more than a few hours.

const unsigned int maxIn = 50;
char inStr[maxIn];              // More than enough for longest command

void serialEvent(){
  while (Serial.available())
  {
    char inByte = Serial.read ();
    switch (inByte)
    {
    case '\n':            // end of command
      inStr [index] = 0;  // terminating null
      stringComplete = true;
      index = 0;  
      break;
    case '\r':            // discard CR
      break;
    default:
      if (index < (maxIn - 1))
        inStr [index++] = inByte;
      break;
    }
  }
}

The thing that still bugs me is that the machine only ever stops AFTER carrying out a command. It gets it from VB, parses it, sends it back for logging, carries it out completely and only THEN freezes. I would've thought that if there were a problem with the wiring it'd just stall at any time, rather than neatly between commands.

I've wasted weeks blaming the serial, and the code, the OS (XPsp3) even the USB drivers themselves. Perhaps I need to put a loop of commands into the Arduino code and run it without using the serial and see if that manages to stay running?

aibonewt:
The thing that still bugs me is that the machine only ever stops AFTER carrying out a command. It gets it from VB, parses it, sends it back for logging, carries it out completely and only THEN freezes. I would've thought that if there were a problem with the wiring it'd just stall at any time, rather than neatly between commands.

Does the PC receive the "OK" after the last command that it carries out completely?

Ah, now THIS is why I went investigating the Serial port in the first place...

No it doesn't!

The Arduino receives the instruction, parses it, bounces it back to the VB's logfile, carries out the command and then just sits there. What happens internally at this point is still a mystery as the port is locked at this point, and the USB has to be physically un/replugged to restore it. On the other thread I even went to the extent of filming the Tx/Rx lights, but this proved fruitless as the Tx will not flash anyway if the port is locked by the laptop. I quit this line of investigation when I found that the code would run continuously without any attachment to the Arduino other than the USB. Here's my other thread, just for a different angle on the problem.

http://arduino.cc/forum/index.php/topic,129286.0.html

This is an interesting bug. It smells like a memory leak or corruption crash to me. From the other thread, it appears you were using Strings - no longer, right?

To detect a possible leak, it might be worth instrumenting how much RAM is free and printing that out every half hour. There is a magic function for calculating free ram if you search the forums.

Memory corruption is more likely than memory exhaustion if you have stopped using Strings. I would Serial.print() the living devil out of the code path between finishing the command successfully and printing OK. You know it falls off the rails in there somewhere. One Serial.print per line if that's what it takes. (Or binary search, if you have a lot of 8-hour test windows…) If you can figure out which line it fails on, it might help.

-br
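
A rough sketch of the checkpoint idea, in plain C++ so the logic can be tried on a PC. The checkpoint names and the runTail function are made up for this example and only stand in for the real tail of loop(). The last checkpoint that shows up in the log tells you the freeze happened just after it:

```cpp
#include <cstdio>

// Hypothetical checkpoint IDs for the stretch between the end of a move and "OK".
enum Checkpoint { CP_NONE, CP_MOVE_DONE, CP_LIMITS_REPORTED, CP_OK_SENT, CP_STATE_SAVED };

static Checkpoint lastSeen = CP_NONE;

// Print and remember a checkpoint. On the Arduino the printf would be a
// Serial.println (or a pin toggle, if serial output itself is the suspect).
void checkpoint(Checkpoint cp) {
    lastSeen = cp;
    std::printf("CP%d\n", static_cast<int>(cp));
}

// Stand-in for the tail of loop(). `hangBefore` simulates the freeze by
// stopping just before that checkpoint; CP_NONE means no freeze.
Checkpoint runTail(Checkpoint hangBefore) {
    lastSeen = CP_NONE;
    const Checkpoint order[] = { CP_MOVE_DONE, CP_LIMITS_REPORTED, CP_OK_SENT, CP_STATE_SAVED };
    for (Checkpoint cp : order) {
        if (cp == hangBefore) break;  // simulated lock-up
        checkpoint(cp);
    }
    return lastSeen;
}
```

If the log ends at CP2, for instance, the freeze is somewhere between reporting the limit switches and sending "OK".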

I've just looked at the HardwareSerial code for writing characters, and I think there may be a race condition bug in it. Here is the code from HardwareSerial.cpp (Arduino 1.0.2):

size_t HardwareSerial::write(uint8_t c)
{
  int i = (_tx_buffer->head + 1) % SERIAL_BUFFER_SIZE;
	
  // If the output buffer is full, there's nothing for it other than to 
  // wait for the interrupt handler to empty it a bit
  // ???: return 0 here instead?
  while (i == _tx_buffer->tail)
    ;
	
  _tx_buffer->buffer[_tx_buffer->head] = c;
  _tx_buffer->head = i;
	
  sbi(*_ucsrb, _udrie);
  // clear the TXC bit -- "can be cleared by writing a one to its bit location"
  transmitting = true;
  sbi(*_ucsra, TXC0);
  
  return 1;
}

The first problem is that the 2-byte volatile variables _tx_buffer->head and _tx_buffer->tail are written and read without disabling interrupts. However, as the size of the transmit buffer is less than 256 bytes, the upper byte will always be zero, so the situation of inconsistent upper/lower bytes will not arise.

The second possible problem is the instruction to clear the TXC (transmit complete) bit. I can't see what this is for and I think it is harmful. Suppose this line is executed around the time that the UART has just finished sending a character. Then the TXC interrupt may never occur, and the data in the ring buffer will never be sent. So the ring buffer will become full and Serial.print calls will block.

If I'm right, then your code is blocking while trying to write "OK" to the serial port. You might like to try lighting a LED just before making the call and turning it off immediately after it returns. That will tell you whether the program is locking up during that call or somewhere else.

Are you definitely receiving "OK" at the PC for the penultimate command that gets executed?

EDIT: after looking at the code some more, I see that it is the Data Register Empty interrupt that is being used, not the Transmit Complete Interrupt. So my analysis above is not correct. However, I still don't see the point of clearing the TXC bit, and I still think it may be worth your while using an LED to see whether the lockup is inside Serial.print.
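
To make the blocking condition concrete, here is a simplified model of a head/tail transmit ring in plain C++. It is not the real HardwareSerial class: TxRing, tryWrite and drainOne are names made up for this sketch, and drainOne only stands in for the interrupt handler. It shows that the ring holds at most Size - 1 bytes, and that a write can only proceed once something has advanced the tail:

```cpp
#include <cstddef>
#include <cstdint>

// Simplified host-side model of a head/tail transmit ring.
template <std::size_t Size>
struct TxRing {
    uint8_t buffer[Size] = {};
    std::size_t head = 0;  // next slot the writer fills
    std::size_t tail = 0;  // next slot the "interrupt" sends

    // Same test as the while loop in write(): the slot after head is tail.
    bool full() const { return (head + 1) % Size == tail; }

    // Non-blocking variant: returns false where the real write() would spin.
    bool tryWrite(uint8_t c) {
        if (full()) return false;
        buffer[head] = c;
        head = (head + 1) % Size;
        return true;
    }

    // Stand-in for the Data Register Empty interrupt sending one byte.
    bool drainOne(uint8_t& out) {
        if (head == tail) return false;  // nothing queued
        out = buffer[tail];
        tail = (tail + 1) % Size;
        return true;
    }
};
```

If the draining side ever stops running, every later write finds the ring full, and the real write() would sit in its while loop indefinitely.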

Interesting is one word! :0

Serial.print() everywhere sounds like a good plan right now, and it allows me to procrastinate further in the knowledge that I really need to rebuild my electronics from scratch. I'm a neat, careful worker, but the stripboard has been modified a few times since the original design, and I must've made a mistake somewhere. While building the machine I learned how to make custom boards anyway, so hopefully something will show up before I get to that stage. Next time I fancy either using custom drivers (Allegro A4988) or Darlington arrays fed from a pair of 74HC595s to keep my pin-count low and allow me half-wave control instead of the klutzy two-wire config I'm using now. I'm also tempted to use opto-isolators between the Arduino and the control board, though I'm sure that's just overkill, and no-one would ever bother?

Without the servo control board connected to the Arduino the code runs fine, which suggests it's a hardware problem.

When everything's connected it works for anything up to 15 hours, then stalls between commands, which suggests it's a software problem.

:roll_eyes:

@dc42

Your analysis of the HardwareSerial code is a little beyond my comprehension I'm afraid, but I have a pin free so I'll try your suggestion in a day or so. Thank you.

Okay, here's a game-changer...

Out of sheer desperation I hard-coded all my serial commands into the Arduino sketch itself and removed all references to the serial bus. Still had the laptop connected, but only as a power source. The commands are quite repetitive for the first job I want to do, so it just took a couple of nested loops to replicate them exactly. Started it off yesterday lunchtime with everything connected as normal, and a few minutes ago it finished, parked-up and powered-down.

Wow

So, can we assume it is the serial after all? I might try another run now with just Serial.begin() in the startup, to see if that works too.

Note: Just tried to upload new code, and the Arduino IDE halted with Serial Port 'COM7' already in use. Could this be a clue?

aibonewt:
Out of sheer desperation I hard-coded all my serial commands into the Arduino sketch itself and removed all references to the serial bus. Still had the laptop connected, but only as a power source. The commands are quite repetitive for the first job I want to do, so it just took a couple of nested loops to replicate them exactly. Started it off yesterday lunchtime with everything connected as normal, and a few minutes ago it finished, parked-up and powered-down.

Try re-adding the printing of "OK" to the serial port after each command. That will tell you whether it is the serial receive or transmit that is causing the problem (or it could be an interaction between the two).

Puzzled here. Doesn't this data support another hypothesis: maybe the Arduino is working perfectly, and the bug is in the Windows USB serial driver somehow, or the connection is going dead?

Didn't you say the symptom was that it finished the job just fine but the "OK" never gets to the PC?

I'm sure I'm missing something...

-br