[SOLVED] USB Serial communication fails after repeated queries

UPDATE:
It turns out the problem is with the usbser.sys driver used by most Arduinos. The solution for me is to use an Arduino with an FTDI chip. Alternatively, if that isn't an option, I have used a hack to automatically reset the USB connection. See http://forum.arduino.cc/index.php?topic=238351.msg1730079#msg1730079 for my instructions.

Original:
I cannot determine why I lose the ability to receive serial data from my Arduino UNO after an unpredictable number of serial requests. This sounds similar to the unresolved questions here: Serial monitor stops working - Project Guidance - Arduino Forum and here: Serial.write stops working if not used? - Programming Questions - Arduino Forum

Here's what I do know.

  • The error depends on free memory: the less free SRAM I have, the more likely it is to happen. See the availableMemory() function in the code below for how I'm checking free SRAM.
  • I thought I was crashing the stack, but restarting the computer (and not resetting or unplugging the Arduino/USB cable) corrects the problem, i.e. a PC USB reset seems to fix the issue [the int variable "machineState" counter doesn't reset].
  • Disabling the UART buffer in Windows 8.1 has little or no effect.
  • I cannot recreate the problem by continually polling the Arduino, e.g. I had > 200,000 requests last night with no problem. It only occurs when used in combination with the other hardware, i.e. with a variable delay between requests.
  • The previous point led me to believe that the serial interrupt was overflowing or something to that effect, so I added Serial.flush() to reduce this. This makes the error less frequent.
  • When it fails, I can receive part of the response, e.g. I send a "S" command and should receive a string number like ":1700\r". Instead I may get ":17", with no return char.
  • I'm using the SoftwareSerial library to communicate with a serial LCD screen. I have used it to display the free memory, but my current code doesn't appear to have memory leaks and I have > 1000 bytes free when things crash.
  • I can still send serial commands even when it stops transmitting responses, e.g. I can turn the LED on and off.
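To tell a complete status reply from a truncated one like ":17" on the PC side, the check can look for the terminator explicitly instead of only for a number. The sketch below is a host-side C++ illustration, not the MATLAB code from the post; the function name parseStatusReply is hypothetical, and it assumes a reply is ':' followed by digits and '\r', with the '\n' that Serial.println() also sends treated as optional.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: parses a status reply of the form ":<digits>\r",
// optionally followed by a single '\n'. Returns std::nullopt for anything
// else, including truncated replies such as ":17" with no terminator.
std::optional<long> parseStatusReply(const std::string& reply) {
  if (reply.size() < 3 || reply[0] != ':') return std::nullopt;

  std::size_t end = reply.find('\r');
  if (end == std::string::npos) return std::nullopt;  // terminator never arrived

  // Anything after '\r' may only be the '\n' from println().
  std::string tail = reply.substr(end + 1);
  if (!(tail.empty() || tail == "\n")) return std::nullopt;

  long value = 0;
  for (std::size_t i = 1; i < end; ++i) {
    unsigned char c = static_cast<unsigned char>(reply[i]);
    if (!std::isdigit(c)) return std::nullopt;  // non-numeric payload
    value = value * 10 + (c - '0');
  }
  return value;
}
```

Logging every reply that fails this check (rather than only zero or non-numeric values) would show exactly how much of the reply arrived before the link stopped.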

I have tried rewriting my code to remove any Strings, as suggested in the other threads. To recreate the problem, I send a status request ("S") from MATLAB using the serial command and check for a numeric response. I do this repeatedly to ensure the hardware I'm communicating with hasn't been power cycled. It works fine until it doesn't. The only solution is to remove power and the USB cable from the Arduino OR restart the computer.

Because the problem goes away with a computer restart, I think there's an issue with the UART driver on the machine. I'm using the latest signed Arduino driver for Windows 8.

Do you have any suggestions on what I can try next or what could be going on?

The following code is representative of my program (~1000 lines total). Other details: I use the tone library to play sounds, and I hack the PWM timers to speed them up per this link: http://arduino-info.wikispaces.com/Arduino-PWM-Frequency. I don't see why either of these would cause the issue I'm experiencing.

#include <avr/pgmspace.h>
#include <SoftwareSerial.h>

// PIN CONSTANTS
#define LED1Pin 3
#define txPin 13

// pins for the LEDs:
const int led1 = LED1Pin;

int machineState = 0;
char inputString[16] = {'\0'};         // a string to hold incoming data
boolean stringComplete = false;  // whether the string is complete
int inputStringLen = 0;          // number of chars currently in inputString

int ledAmp;          // stored 8-bit val from dutyCycleVal

// Note: pin 0 is also the hardware Serial RX pin on the UNO.
SoftwareSerial LCD = SoftwareSerial(0, txPin);

void setup() {
  // initialize serial:
  Serial.begin(9600);
  
  inputString[0] = '\0';  // empty the buffer ('\0' is a char, not a string, so strcpy() can't be used here)
  inputStringLen = 0;
  ledAmp = 255;
  analogWrite(led1, 0);
  pinMode(led1, OUTPUT);
}

void loop() {
  // handle the command once a carriage return has arrived:
  if (stringComplete) {
     doSomething();
    // clear the string:
    inputString[0] = '\0';
    inputStringLen = 0;
    stringComplete = false;
  }    
}

int availableMemory() {
  // free SRAM = gap between the current stack position and the top of the heap
  extern int __heap_start, *__brkval; 
  int v; 
  return (int) &v - (__brkval == 0 ? (int) &__heap_start : (int) __brkval); 
}

void doSomething() {
  int Error = 0; //set to 0, and set if appropriate function is NOT found
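  // should respond with ":" ... to indicate OK, otherwise "!Invalid"
  // strcmp() compares the characters; inputString == "L1" would compare addresses and never match
  if (strcmp(inputString, "L1") == 0) { // LED 1
    ledSet();
    ledResponse();   
  } else if (strcmp(inputString, "S") == 0) { // STATUS OF MACHINE, count to one after first check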
    Serial.print(F(":")); 
    Serial.println(machineState);  
    machineState = 1;  
  } else {
    Error = 1;
  }
  if (Error > 0) {
    Serial.println("!Invalid");
  }
}

void ledSet() {
  analogWrite(led1, ledAmp); 
}

void ledResponse() {
  Serial.print(F(":"));
  Serial.println(ledAmp);
}  

void serialEvent() {
  while (Serial.available()) {
    // get the new byte:
    char inChar = (char)Serial.read(); 
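    if (inChar == '\n') {  
      // -> IGNORE
    } else if (inChar == '\r') {  
      stringComplete = true;
    } else if (inputStringLen < (int)sizeof(inputString) - 1) {
      // add it to the inputString, keeping room for the terminating '\0':
      inputString[inputStringLen] = inChar;
      inputStringLen = inputStringLen + 1;
      inputString[inputStringLen] = '\0';      
    }
    // characters beyond the buffer size are dropped instead of overrunning inputString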
  }
}
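Two C-string pitfalls are easy to miss in a sketch like this. First, `strcpy(inputString, '\0')` passes a char where a source string is expected; depending on the compiler it either fails to build or hands strcpy() a null pointer, which is undefined behaviour. Second, `inputString == "L1"` compares the address of the array with the address of the literal, so it is never true. The helpers below are hypothetical names in a host-side C++ sketch showing the intended operations.

```cpp
#include <cstring>

// Empties a fixed-size C string buffer: writing '\0' at index 0 is enough,
// because every C string function stops at the first terminator.
void clearBuffer(char* buffer, int& length) {
  buffer[0] = '\0';
  length = 0;
}

// True when the received command matches `expected` character for character.
// (Comparing buffer == "L1" would compare two addresses instead.)
bool commandIs(const char* buffer, const char* expected) {
  return std::strcmp(buffer, expected) == 0;
}
```

On the Arduino the same calls work unchanged, since strcmp() is available there as well.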

I notice that you are using serialEvent(). I only came across this recently and I admit I haven't tried it. That's because I have no idea how it is supposed to work as there is no documentation. If it is repeatedly triggering interrupts that may be your problem. I suggest dropping it in favour of a simple if (Serial.available() > X).

Serial.flush() just waits until the Arduino output buffer empties so it may accidentally be creating a useful delay. Otherwise I doubt if it has any value.

...R
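The delay Serial.flush() introduces can be estimated. With 8N1 framing each byte takes 10 bit times (start, 8 data, stop), so a reply such as ":1700" plus the "\r\n" from Serial.println() (7 bytes) takes roughly 7.3 ms to shift out at 9600 baud. The helper below is a hypothetical PC-side calculation of that figure, assuming 8N1 and that none of the bytes has started transmitting yet; it is not part of the Arduino API.

```cpp
#include <cstdint>

// Approximate time in microseconds to transmit `bytes` at `baud`,
// assuming 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per byte.
// This is an upper bound on how long Serial.flush() blocks for that data.
std::uint32_t txTimeMicros(std::uint32_t bytes, std::uint32_t baud) {
  const std::uint64_t bitsPerByte = 10;
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(bytes) * bitsPerByte * 1000000u) / baud);
}
```

For example, txTimeMicros(7, 9600) gives 7291 µs, so each reply followed by Serial.flush() holds up loop() for about 7 ms at this baud rate.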

Robin2:
I notice that you are using serialEvent(). I only came across this recently and I admit I haven't tried it. That's because I have no idea how it is supposed to work as there is no documentation. If it is repeatedly triggering interrupts that may be your problem. I suggest dropping it in favour of a simple if (Serial.available() > X).

Thanks for the suggestion. I thought serialEvent() was the standard way to do this, per the tutorial on Arduino's site.

I did try this by moving the code to a separate function that is called repeatedly from loop(). It was a good suggestion, but it didn't change the behavior. In fact, it seemed to fail quicker (< 165 serial requests) this time.
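For reference, in the Arduino AVR core serialEvent() is not an interrupt handler: the core's main() calls loop(), then serialEventRun(), which calls serialEvent() only when Serial.available() is non-zero. That would be consistent with moving the code into a function polled from loop() not changing the behaviour. The host-side C++ model below reproduces that call pattern; the byte queue and counters are hypothetical stand-ins for the UART receive buffer.

```cpp
#include <deque>

// Hypothetical stand-ins: a queue for the UART receive buffer and counters
// so the call pattern can be observed on a PC.
std::deque<char> rxBuffer;
int loopCalls = 0;
int serialEventCalls = 0;

int serialAvailable() { return static_cast<int>(rxBuffer.size()); }

void loop() { ++loopCalls; }  // the sketch's loop(); no serial work here

void serialEvent() {
  ++serialEventCalls;
  while (serialAvailable()) rxBuffer.pop_front();  // consume every waiting byte
}

// Same shape as the core's main(): after each loop() returns, serialEvent()
// is called only if bytes are waiting. Nothing here runs from an interrupt.
void runMainIterations(int iterations) {
  for (int i = 0; i < iterations; ++i) {
    loop();
    if (serialAvailable() > 0) serialEvent();
  }
}
```

A consequence of this design is that a slow loop() delays serialEvent() just as it would delay any polling code inside loop().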

If it was my code I would not go back to serialEvent() and I would consider the faster failure rate a step towards finding the problem. Please post your revised program.

Before I saw the serialEvent() I was going to comment on this

I cannot recreate the problem by continually polling the Arduino, e.g. I had > 200,000 requests last night with no problem. It only occurs when used in combination with the other hardware, i.e. with a variable delay between requests.
The previous point led me to believe that the serial interrupt was overflowing or something to that effect, so I added Serial.flush() to reduce this. This makes the error less frequent.

My next line of enquiry would be to try to isolate the problem by sequentially disabling some of the hardware that you mention. I am assuming (in line with my earlier comment) that the delay caused by Serial.flush() helps, but is not the solution.

What is the absolutely simplest sketch you can make that exhibits your problem behaviour? Something, perhaps, that I could compile on my Arduino.

...R

OK, thank you for the help. Here's a minimal set of code to reproduce the problem.

There's really not that much here, so there must be something else going on, unless I'm missing something painfully obvious.

I still think the most valuable clue to the problem is the fact that I can restart the computer (without removing power or disconnecting the USB cable) to recover. Some other points that might be relevant, which I forgot to mention earlier:

  • The USB connection goes through a USB 3.0 hub to a USB 3.0 port (the only USB port on the Microsoft Surface Pro 2).
  • There are two (custom) shields included: one does 5 V / 5 A power regulation for the other hardware, the other is an LED driver.

// PIN CONSTANTS
#define LED1Pin 3

char const verInfo[] = "NGA-LED-INT:1";
// pins for the LEDs:
const int led1 = LED1Pin;
char inputString[16] = {'\0'};         // a string to hold incoming data
boolean stringComplete = false;  // whether the string is complete
int inputStringLen = 0;          // number of chars currently in inputString
int machineState = 0;
int ledAmp;       
int led1Amp = 0;

void setup() {
  // initialize serial:
  Serial.begin(9600);

  inputString[0] = '\0';  // empty the buffer ('\0' is a char, not a string, so strcpy() can't be used here)
  inputStringLen = 0;
  ledAmp = 255;
  analogWrite(led1, 0);
  pinMode(led1, OUTPUT);
}

void loop() {
  serialEvent222();
  if (stringComplete) {
    doSomething();
    // clear the string:
    inputString[0] = '\0';
    inputStringLen = 0;
    stringComplete = false;
  }    
}

void doSomething() {
  int Error = 0; //set to 0, and set if appropriate function is NOT found
  // should respond with ":" ... to indicate OK, otherwise... 
  if (strcmp(inputString,"L1") == 0) { // LED 1
    led1Amp = 255;
    analogWrite(led1, led1Amp);
    Serial.print(F(":"));
    Serial.println(led1Amp);
  } else if (strcmp(inputString,"O") == 0) { // TURN OFF LEDS
    int tmpAmp = 0;
    led1Amp = tmpAmp;
    analogWrite(led1, led1Amp);
    Serial.print(F(":")); 
    Serial.println(tmpAmp);
  } else if (strcmp(inputString,"S") == 0) { // STATUS OF MACHINE, count to one after first check
    Serial.print(F(":")); 
    Serial.println(machineState);  
    machineState = machineState + 1;  
  } else if (strcmp(inputString,"VER") == 0)  { // Respond to same Ver command as MMC-100
    Serial.print(F(":")); 
    Serial.println(verInfo); 
  } else {
    Error = 1;
  }
 
  if (Error > 0) {
    Serial.println("!Invalid");
  }
  Serial.flush();
}

void serialEvent222() {
  
  while (Serial.available()) {
    
    // get the new byte:
    char inChar = (char)Serial.read(); 

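    if (inChar == '\n') {  
      // ignore line feeds
    } else if (inChar == '\r') {  
      stringComplete = true;
    } else if (inputStringLen < (int)sizeof(inputString) - 1) {
      // add it to the inputString, keeping room for the terminating '\0':
      inputString[inputStringLen] = inChar;
      inputStringLen = inputStringLen + 1;
      inputString[inputStringLen] = '\0';      
    }
    // characters beyond the buffer size are dropped instead of overrunning inputString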
  }
}
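As posted, serialEvent222() appends every byte without checking inputStringLen, so a line of 16 or more characters would write past the end of inputString[16] and corrupt whatever follows it in SRAM. A bounds-checked version of the same accumulation logic can be tested on a PC; the LineBuffer struct and its overflowed flag below are hypothetical additions for a host-side sketch, not part of the original program.

```cpp
#include <cstddef>

// Hypothetical host-side model of serialEvent222(): collects bytes into a
// fixed 16-byte buffer, ignores '\n', and marks the line complete on '\r'.
// Unlike the original, it never writes past the end of the buffer.
struct LineBuffer {
  static constexpr std::size_t kCapacity = 16;  // same size as inputString[16]
  char data[kCapacity] = {'\0'};
  std::size_t len = 0;
  bool complete = false;
  bool overflowed = false;  // set when bytes had to be dropped

  void feed(char c) {
    if (c == '\n') return;                  // ignore line feeds
    if (c == '\r') { complete = true; return; }
    if (len < kCapacity - 1) {              // keep one byte for the terminator
      data[len++] = c;
      data[len] = '\0';
    } else {
      overflowed = true;                    // drop the byte instead of overrunning
    }
  }

  void reset() {
    data[0] = '\0';
    len = 0;
    complete = false;
    overflowed = false;
  }
};
```

Reporting an overflow (for example by replying "!Invalid") is arguably better than silently truncating, since a truncated command could match a shorter valid one.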

OK, I have that. But I don't know what's supposed to happen on the PC side so I can try it out.

...R

On the PC side, I have a MATLAB application that controls three pieces of hardware: Arduino, Stages, Camera. The software sequentially accesses the Arduino, Stages, and then Camera.

On the Arduino side, I issue the "VER" and "O" commands on startup. Then in the Arduino/Stages/Camera loop, I send the "S" command and check that the response is not zero.
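The command sequence just described can be exercised off-target against a model of the minimal sketch. Below, DeviceModel is a hypothetical C++ stand-in for doSomething() that returns the reply text instead of printing it (":" plus the value, then the "\r\n" from Serial.println()), and runSession() is a hypothetical stand-in for the MATLAB loop; the stages and camera are not modelled. A ":0" reply is what a freshly reset board returns, which is what the non-zero check is meant to catch.

```cpp
#include <cstring>
#include <string>

// Hypothetical model of doSomething() from the minimal sketch.
struct DeviceModel {
  int machineState = 0;
  int led1Amp = 0;

  std::string handle(const char* cmd) {
    if (std::strcmp(cmd, "L1") == 0) {
      led1Amp = 255;
      return ":" + std::to_string(led1Amp) + "\r\n";
    }
    if (std::strcmp(cmd, "O") == 0) {
      led1Amp = 0;
      return ":0\r\n";
    }
    if (std::strcmp(cmd, "S") == 0) {
      // prints the current value, then increments, like the sketch
      return ":" + std::to_string(machineState++) + "\r\n";
    }
    if (std::strcmp(cmd, "VER") == 0) return ":NGA-LED-INT:1\r\n";
    return "!Invalid\r\n";
  }
};

// "VER" and "O" once at startup, then "S" on every pass of the
// Arduino/Stages/Camera loop. Returns how many status replies were non-zero.
int runSession(DeviceModel& dev, int passes) {
  dev.handle("VER");
  dev.handle("O");
  int nonZero = 0;
  for (int i = 0; i < passes; ++i) {
    if (dev.handle("S") != ":0\r\n") ++nonZero;
    // the stages and camera would be serviced here
  }
  return nonZero;
}
```

Because the first "S" after a reset returns ":0", only the first pass of a fresh session fails the check in this model; a ":0" later in a session would indicate the board had been reset.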

Well in my quest to figure out what's wrong, I managed to damage the stages. So I won't be able to test this for a week or so. I think the latest version of the code I posted can't be causing any kind of stack crash or something else.

My current thought is that the common ground must be glitching at some point under the varying load, similar to this thread: Serial communication stops after long periods. - Project Guidance - Arduino Forum. A fridge was the supposed culprit in that thread.

Again, when I get the stages back, I will monitor the 12 V supply relative to ground under load, as well as the Arduino's regulated 5 V supply (which powers nothing but the Arduino board).

My main reason for posting was the bolded text in my first post: the Arduino continues to work fine, and the USB responds again once the computer has been restarted. This makes no sense to me, and I'm looking for an expert opinion on how I can check this.

To quote sbright33,

A power surge from the refrigerator could affect the USB chip without resetting the 328?

I think another thing to consider is the USB driver for 64-bit machines.
Vivio said

The same thing happens to me: "TX" LED no longer lights up, but this happens in less than a minute if I send through com port almost 40 characters at 100ms.

Probably it has something to do with the x64 OS; mine is Win 8.1 x64. After updating the FTDI drivers I was able to get around 2 minutes of consistent output.

On a Win XP machine, with different code, I haven't had issues.

But I'm using the UNO, which doesn't use the FTDI drivers. But the symptoms imply a similar problem to me...

My main reason to post to the thread was regarding my bolded text in the first post. The fact the Arduino continues to work fine and that the USB responds once the computer has been restarted.

I would suspect that the computer reboot causes the arduino to reset due to the serial port reset.

My main reason to post to the thread was regarding my bolded text in the first post. The fact the Arduino continues to work fine and that the USB responds once the computer has been restarted.

• I thought I was crashing the stack, but restarting the computer (and not resetting or unplugging the Arduino/USB cable) corrects the problem

• USB connection is going through a USB 3.0 Hub to a USB 3.0 port (the only USB port on the Microsoft Surface Pro 2)

This sounds very, very similar to a serial problem I was having on a totally different system, using FTDI's FTD2XX.dll with LabVIEW and Windows 7. Actually, it's uncanny how similar the problems were. The solution was to remove the USB hub and use direct cable connections to the PC. All timing / buffer / periodic freezing problems then disappeared. It took more than a year for their driver to be updated so that peripherals connected through a hub worked reliably.

zoomkat:
I would suspect that the computer reboot causes the arduino to reset due to the serial port reset.

Good point. As you point out, this would be the normal behavior.

I forgot to mention that I've cut the reset trace. I don't want the board to reset whenever the port is opened/closed on the PC (or when the PC restarts). This wasn't a problem until now, as I short the points when I need to reprogram, which was much less frequent until I began investigating this issue.

I spent a week trying to debug by looking through the code. I had assumed that the ATmega328 was crashing (stack overflow) when I ran into the serial problem. But that's why I posted for help: my status counter WASN'T resetting when I restarted the computer. I know, WTF?

The good news is my code is much much cleaner now.

dlloyd:
This sounds very, very similar to a serial problem I was having on a totally different system, using FTDI's FTD2XX.dll with LabVIEW and Windows 7. Actually, it's uncanny how similar the problems were. The solution was to remove the USB hub and use direct cable connections to the PC. All timing / buffer / periodic freezing problems then disappeared. It took more than a year for their driver to be updated so that peripherals connected through a hub worked reliably.

Great suggestion. When I get my hardware back, I will try this.

A part of me hopes this isn't the case as the box I built only has one USB connection... And the computer only has the one USB port. But I definitely want to get to the bottom of this.

dlinear:
On the PC side, I have a MATLAB application that controls three pieces of hardware: Arduino, Stages, Camera. The software sequentially accesses the Arduino, Stages, and then Camera.

On the Arduino side, I issue the "VER" and "O" commands on startup. Then in the Arduino/Stages/Camera loop, I send the "S" command and check that the response is not zero.

I don't have enough information here to enable me to simulate your system. I have no intention of using Matlab but I could knock up a PC program that would send typical data at the appropriate rate.

...R

The only solution is to remove power and the USB cable from the Arduino OR restart the computer.

You missed a test. When the problem occurs and, without doing anything else, you click the reset button on the Arduino does communications resume?

I can see I keep leaving out a lot of critical details. Yes, I did try this early on, and it doesn't solve the problem; only resetting the USB does (restarting the computer or unplugging and replugging the USB cable).

dlinear:
I can see I keep leaving out a lot of critical details.

Uh huh.

When the problem occurs and, without doing anything else, you shutdown and restart MATLAB does communications resume?

Once the problem occurs, MATLAB, Python, RealTerm, basically any terminal program, cannot receive data back. I can continue to send data, which I can verify by LEDs turning on/off, LCD strings, etc.

What is physically connected to pins 0 and 1?

Nothing is connected to them. By default these pins are connected to the ATmega8U2 for USB-serial communication.

More possibilities...

There could be problems if a hub is not powered by its external adapter.
Without the adapter, the ports would limit max current to around 100 mA per port; with the adapter plugged in, the limit increases to 500 mA per port.

http://forum.arduino.cc/index.php?topic=160912.msg1206326#msg1206326
http://forum.arduino.cc/index.php?topic=179985.msg1341610#msg1341610
http://forum.arduino.cc/index.php?topic=194287.msg1435029#msg1435029

Another thing to try is to lower or turn off the FIFO settings in Windows to "correct connection problems".