[SOLVED] USB Serial communication fails after repeated queries

dlloyd:
Perhaps the latest version of the IDE has had some changes (fixes) made to the USB code (your current version hasn't been mentioned).

I was originally using version 1.0.4, but since last week, I've been using 1.0.5r2.

dlinear:
I understand better now. Typical sequence is

Thanks. I will try to look at this later today.

...R

I have started looking at this more carefully (I haven't tested anything yet).

A first thought ...

You have "Serial.flush()" at the end of the function "doSomething()". Serial.flush() is a blocking function that waits until the outgoing Arduino buffer is empty. Perhaps you thought it empties the incoming buffer - a reasonable mistake as it is very poorly named.

Later ...

I have now had it working with the PC sending "S Cr Lf" and the Arduino responding with ":nnnnn" for over 30,000 iterations with no sign of a problem. It was running at nearly 20 iterations per second.

...R

Robin2:
You have "Serial.flush()" at the end of the function "doSomething()". Serial.flush() is a blocking function that waits until the outgoing Arduino buffer is empty. Perhaps you thought it empties the incoming buffer - a reasonable mistake as it is very poorly named.

Adding Serial.flush() was one of the last things I tried before posting on the forum. I didn't realize that Serial.print was non-blocking and had assumed that everything was sequential. My thought was that an interrupt could have been the source of the error, and since I want everything to run sequentially, this seemed like a good way to ensure my program runs as expected. Basically, I would expect Serial.flush() to eliminate this problem as opposed to making it worse.

As a sidenote, the reason .flush() is misunderstood is that its function has changed over time. I guess the original Arduino software had it flush the incoming buffer, but it's been changed (while the name stayed the same) to flush the outgoing buffer.
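
To make the two meanings of "flush" concrete, here is a toy model in Python. It is not Arduino code and not the real Serial implementation: the class ToySerialPort and its drain_input() helper are made up for illustration. It only shows that waiting for the outgoing buffer to empty leaves unread incoming bytes untouched, and that discarding input is a separate operation.

```python
from collections import deque


class ToySerialPort:
    """Hypothetical model of a port with separate outgoing and incoming buffers."""

    def __init__(self):
        self.tx = deque()  # bytes queued by write(), not yet sent
        self.rx = deque()  # bytes received, not yet read by the program
        self.wire = []     # bytes that have actually gone out

    def write(self, data: bytes) -> None:
        # Non-blocking in this model: queue the bytes and return at once.
        self.tx.extend(data)

    def flush(self) -> None:
        # Newer meaning described in the thread: wait until the
        # outgoing buffer is empty. The incoming buffer is not touched.
        while self.tx:
            self.wire.append(self.tx.popleft())

    def drain_input(self) -> int:
        # Older meaning described in the thread: discard unread input.
        dropped = len(self.rx)
        self.rx.clear()
        return dropped


port = ToySerialPort()
port.rx.extend(b"S\r\n")  # a command that arrived but has not been read
port.write(b":1700\r")    # a reply queued for sending
port.flush()              # sends the reply, leaves the command in rx
```

After flush() the reply is on the wire and the three unread command bytes are still waiting; only drain_input() removes them.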

Robin2:
I have now had it working with the PC sending "S Cr Lf" and the Arduino responding with ":nnnnn" for over 30,000 iterations with no sign of a problem. It was running at nearly 20 iterations per second.

I also don't run into the problem unless I have additional hardware being accessed.

dlinear:
I also don't run into the problem unless I have additional hardware being accessed.

You have mentioned this before but I find it confusing. I had assumed you meant additional hardware connected to the Arduino but since you say that the earlier simple sketch will demonstrate the problem, perhaps you mean other hardware attached to the PC.

If so, that's just a teeny bit beyond the scope of this Forum.

A couple of thoughts come to mind.

The arrangement you have for sending data to and from the Arduino is not very robust. It has no means to know if only a part of a message is received, or even which part. If there is an unexpected delay anywhere else in the USB system it might get out of step.
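
One way to picture the partial-message concern is a PC-side parser that only accepts complete frames. The sketch below is Python and assumes the reply format mentioned in this thread (a ":" start marker, digits, then a carriage return). The function name extract_frames is made up, and this is not the code used by either side here.

```python
def extract_frames(buffer: bytearray) -> list:
    """Remove complete ':<digits>\\r' frames from buffer and return their values.

    Bytes of an unfinished frame stay in buffer for the next call.
    Anything that ends in CR but is not ':<digits>' is dropped as corrupt.
    """
    frames = []
    while True:
        end = buffer.find(b"\r")
        if end < 0:
            return frames  # no terminator yet: keep what we have and wait
        # Use the last ':' before the CR so an earlier truncated
        # frame cannot swallow the good one that follows it.
        start = buffer.rfind(b":", 0, end)
        body = bytes(buffer[start + 1:end]) if start >= 0 else b""
        del buffer[:end + 1]
        if start >= 0 and body.isdigit():
            frames.append(int(body))


buf = bytearray(b":17")           # truncated reply, no CR
first = extract_frames(buf)       # nothing complete yet
buf += b":1700\r:42\r:9"          # more bytes arrive
second = extract_frames(buf)      # two complete frames, ":9" left over
```

With a start marker and a terminator, the receiver can tell a truncated ":17" from a full ":1700" and resynchronise on the next frame instead of getting out of step.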

I don't know if it has changed with USB3 (though I guess the Arduino is still on 2) but the USB1 and 2 systems have a significant 1msec delay between messages. It means that it is a very inefficient way to send data in very small packets. If your combined PC and Arduino system is repeatedly sending small packets at very frequent intervals it may have an impact on the overall USB throughput when combined with the other hardware.

...R

•The error is dependent on free memory, i.e. the less free SRAM I have, the more likely it is to happen. See code below for how I'm checking free SRAM.

•When it fails, I can receive part of the response, e.g. I send a "S" command and should receive a string number like ":1700\r". Instead I may get ":17", with no return char.

Take a look at reply#3 here.

Robin2:
If so, that's just a teeny bit beyond the scope of this Forum.

I disagree. Good advice has been offered, it's just not a software problem...

Robin2:
The arrangement you have for sending data to and from the Arduino is not very robust. It has no means to know if only a part of a message is received, or even which part. If there is an unexpected delay anywhere else in the USB system it might get out of step.

I disagree. Both sides wait for a carriage return as the signal that the message has been received. A delay will not cause a problem here, only the lack of the transmission of the carriage return at the end.

Robin2:
I don't know if it has changed with USB3 (though I guess the Arduino is still on 2) but the USB1 and 2 systems have a significant 1msec delay between messages. It means that it is a very inefficient way to send data in very small packets. If your combined PC and Arduino system is repeatedly sending small packets at very frequent intervals it may have an impact on the overall USB throughput when combined with the other hardware.

Delay is not a problem. The other hardware takes a much longer time to communicate, so a 1 ms delay is perfectly acceptable.

dlloyd:
Take a look at reply#3 here.

I think you're referring to the fact that you shouldn't use Strings. I have mentioned that I have removed this from my source code and thus, unless it's being used in a library, I'm no longer using strings. Please see the source code in reply #4 from this thread.

I don't understand why this is happening to a small number of users.... But it's certainly reproducible by these users.

dlinear:

Robin2:
If so, that's just a teeny bit beyond the scope of this Forum.

I disagree. Good advice has been offered, it's just not a software problem...

My comment was only relevant in the case where the problem is caused by hardware connected to your PC.
You haven't clarified whether this is the case.
You haven't said what other hardware might be connected to the Arduino.
You have said that the Arduino sketch that you posted is sufficient to demonstrate the problem.

Robin2:
The arrangement you have for sending data to and from the Arduino is not very robust. It has no means to know if only a part of a message is received, or even which part. If there is an unexpected delay anywhere else in the USB system it might get out of step.

I disagree. Both sides wait for a carriage return as the signal that the message has been received. A delay will not cause a problem here, only the lack of the transmission of the carriage return at the end.

What happens if they don't get the start correctly?
What happens if the CR doesn't arrive, perhaps because of data corruption?
With unpredictable problems, data corruption must be a serious suspect.

Robin2:
I don't know if it has changed with USB3 (though I guess the Arduino is still on 2) but the USB1 and 2 systems have a significant 1msec delay between messages. It means that it is a very inefficient way to send data in very small packets. If your combined PC and Arduino system is repeatedly sending small packets at very frequent intervals it may have an impact on the overall USB throughput when combined with the other hardware.

Delay is not a problem. The other hardware takes a much longer time to communicate, so a 1 ms delay is perfectly acceptable.

Sorry if I wasn't clear. I wasn't suggesting that the 1ms delay was a problem for you. I was suggesting that your code might be a problem for the 1ms delay - if you are sending small packets too often. You haven't said how often your Arduino/PC comms repeats.

...R

I think you're referring to the fact that you shouldn't use Strings. I have mentioned that I have removed this from my source code and thus, unless it's being used in a library, I'm no longer using strings. Please see the source code in reply #4 from this thread.

Yeah, just random suggestions that seem to fit. Don't worry, I've noted the details and seen the source code.

I would take any suggestions just as that - suggestions. I really wouldn't form too many solid opinions at this point - let your hardware/software configuration and test results do that for you.

If time is of the essence, you could just make a list of all possible things to test. If there are 10 things to try and 7 are simple, I would do all the simple things at once just leaving 3 to spend more time on.

Another way is to keep stripping things down (commenting out sections of code, minimizing hardware, etc) until the problem goes away ... and therein lies the problem (the last thing that was eliminated).

Anyhow, been there, done that (I absolutely love troubleshooting - but finding the answer is even better).

Robin2:
My comment was only relevant in the case where the problem is caused by hardware connected to your PC.
You haven't clarified whether this is the case.
You haven't said what other hardware might be connected to the Arduino.
You have said that the Arduino sketch that you posted is sufficient to demonstrate the problem.

Because I have been unable to reproduce the problem without the hardware, I can only assume the hardware is playing a role here. Unfortunately, the problem only occurs with other hardware connected to my PC and when I access all the hardware from my PC application. For example, the 200,000 overnight test was with all the same hardware and software, but the software was only testing serial communication with the Arduino...

I have previously posted what's connected to the Arduino: two custom shields, one providing 5V - 5A regulation and the other an LED driver. Likely some combination of the hardware is the problem. As you stated, I can reproduce the problem with the simple program I've posted. As stated previously in this thread, the USB hub could be part of the problem...

Robin2:
What happens if they don't get the start correctly?
What happens if the CR doesn't arrive, perhaps because of data corruption?
With unpredictable problems, data corruption must be a serious suspect.

I doubt this. What will happen is that the PC will error out if the data supplied back is incorrect. If a CR is missed, a timeout will occur on the PC and the next command will error out, but communication will continue.
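
The timeout behaviour described above can be sketched roughly as follows. This is Python, not the actual PC application: read_reply and make_source are made-up names, and read_byte stands in for whatever serial library call the real program uses (assumed here to return one byte, or b"" when nothing is available).

```python
import time


def read_reply(read_byte, timeout_s=0.5, terminator=b"\r"):
    """Collect bytes until the terminator arrives or the timeout expires.

    Returns (reply_bytes, complete). complete is False on timeout, in
    which case reply_bytes holds whatever partial data was received.
    """
    deadline = time.monotonic() + timeout_s
    reply = bytearray()
    while time.monotonic() < deadline:
        chunk = read_byte()
        if not chunk:
            time.sleep(0.001)  # nothing available yet, try again shortly
            continue
        reply += chunk
        if reply.endswith(terminator):
            return bytes(reply), True
    return bytes(reply), False


def make_source(data: bytes):
    """Test helper: yields the given bytes one at a time, then nothing."""
    it = iter(data)

    def read_byte():
        value = next(it, None)
        return b"" if value is None else bytes([value])

    return read_byte


good = read_reply(make_source(b":1700\r"), timeout_s=0.05)
truncated = read_reply(make_source(b":17"), timeout_s=0.05)
```

A truncated reply such as ":17" comes back flagged as incomplete, so the caller can report an error for that command and carry on with the next one. It does not help if the port stops delivering data altogether, which is the failure reported in this thread.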

Robin2:
Sorry if I wasn't clear. I wasn't suggesting that the 1ms delay was a problem for you. I was suggesting that your code might be a problem for the 1ms delay - if you are sending small packets too often. You haven't said how often your Arduino/PC comms repeats.

I understand now and agree. This is not the case for me. A 10 ms software pause is part of the PC program so this won't happen.
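
As a rough sanity check of the numbers in this exchange, the arithmetic can be written out in Python. The 1 ms interval and the 10 ms pause come from the posts above. The assumption that each query/response cycle costs one 1 ms interval per direction is mine, so treat the result as an order-of-magnitude bound only.

```python
USB_FRAME_S = 0.001  # 1 ms interval between messages, as stated in the thread
PC_PAUSE_S = 0.010   # software pause in the PC program, as stated in the thread


def max_cycles_per_second(frames_per_cycle: int, pause_s: float = PC_PAUSE_S) -> float:
    """Upper bound on query/response cycles per second.

    frames_per_cycle is an assumption: how many 1 ms intervals one
    complete exchange consumes (for example one out, one back).
    """
    return 1.0 / (pause_s + frames_per_cycle * USB_FRAME_S)


with_pause = max_cycles_per_second(2)                  # about 83 per second
without_pause = max_cycles_per_second(2, pause_s=0.0)  # 500 per second
```

Under these assumptions the 10 ms pause, not the 1 ms interval, dominates the cycle time, which fits the statement that the comms are not hammering the bus with back-to-back small packets.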

dlloyd:
Yeah, just random suggestions that seem to fit. Don't worry, I've noted the details and seen the source code.

I would take any suggestions just as that - suggestions. I really wouldn't form too many solid opinions at this point - let your hardware/software configuration and test results do that for you.

If time is of the essence, you could just make a list of all possible things to test. If there are 10 things to try and 7 are simple, I would do all the simple things at once just leaving 3 to spend more time on.

Another way is to keep stripping things down (commenting out sections of code, minimizing hardware, etc) until the problem goes away ... and therein lies the problem (the last thing that was eliminated).

Anyhow, been there, done that (I absolutely love troubleshooting - but finding the answer is even better).

These are good suggestions, but I'm out of time.

At this point my plan is to test the hardware side. Unfortunately it's very difficult, as the bug is difficult to reproduce. Regardless, when I get my hardware back tomorrow, I will:

  1. Monitor the power supply and ground lines. This will be difficult to track as the bug occurs randomly...
  2. Once the bug occurs, check the lines on the Arduino board to see if the serial transmit is sending data to the USB ATMEL and see if the data is actually being transmitted beyond the ATMEL.
  3. I bought a Surface docking station yesterday, so I can plug in the Arduino without using the USB 3.0 hub.

Because I have been unable to reproduce the problem without the hardware, I can only assume the hardware is playing a role here. Unfortunately, the problem only occurs with other hardware connected to my PC and when I access all the hardware from my PC application. For example, the 200,000 overnight test was with all the same hardware and software, but the software was only testing serial communication with the Arduino...

I have previously posted what's connected to the Arduino: two custom shields, one providing 5V - 5A regulation and the other an LED driver. Likely some combination of the hardware is the problem. As you stated, I can reproduce the problem with the simple program I've posted. As stated previously in this thread, the USB hub could be part of the problem...

I can't help feeling there may be some fuzzy thinking going on if the above represents what you believe to be the entire picture.

The phrase "but the software was only testing serial communication with the Arduino" suggests to me there may have been other factors involved when the failures occurred.

And the phrase "I can reproduce the problem with the simple program I've posted" seems to be a very clear indication that the problem is not within the Arduino. And it seems to make the comment about what's connected to the Arduino irrelevant.

...R

I think we may have a winner! I hooked up the hardware (arrived this morning) and ran a test. After 30 minutes the board crashed.

I took the other shields off and then probed all the serial points. Pin 1 & 2 (TX & RX) show nothing. I then (very carefully) probed the USB SEND/RECEIVE data points before the ATMEL 16U2. Nothing... I restarted the PC and probed these same points and data is clearly being transmitted.

This means that the failure is occurring upstream (USB hub or PC). I plugged the Arduino into its own USB port and not through the USB 3.0 hub and now I'm running an extended test to see if it happens again. Fingers crossed.

I wouldn't say that the problem is solved if it ends up being the USB hub, but at least the issue has been located.

EDIT: Well, it crashed again, this time plugged into the USB 2.0 port on the back of the Surface Pro docking station. The bad news is that it also appears to be a hub and the computer doesn't have any more USB ports. I'm going to see if I still see the problem when I remove the custom 5V shield I made...

Why not plug the Arduino directly into the PC and plug the other stuff into the hubs?

...R

Robin2:
Why not plug the Arduino directly into the PC and plug the other stuff into the hubs?

Computer only has ONE USB port. The docking station adds more, but brings the one USB 3.0 port into the docking station. Check out this video for more info: https://www.youtube.com/watch?v=cpkAQmoyOGI

Power looks fine, plus I tried hooking up power externally (so that it's not coming from the 5V custom shield). I also used the Surface docking station (hub) instead of the industrial USB 3.0 hub.

I think the problem is with the driver on the PC. I'm attaching waveforms I collected from the Arduino (pins 1 and 2) showing the data being sent/received and I can see signals on the USB lines (differential and occurring much much faster). But I still can't receive data (can send).

I can use the "devcon" command from Microsoft to reset the Arduino USB port from the command line, and this resets the serial communication. I have to wave the white flag and use this hack as a solution, but this is hardly solved.

The only other thing to try is to use a 32-bit machine (XP) to confirm. Unfortunately, I don't have one readily available or any more time to spend on this. A single-post user claims this fixed his problem: "Serial communication stops after long periods" (#12 by system, Project Guidance, Arduino Forum).

DS0001.BMP (16.1 KB)

dlinear:
Computer only has ONE usb port.

Get a real computer and get rid of the problems ???

...R

dlinear:
I think the problem is with the driver on the PC. I'm attaching waveforms I collected from the Arduino (pins 1 and 2) showing the data being sent/received and I can see signals on the USB lines (differential and occurring much much faster). But I still can't receive data (can send).

I can use the "devcon" command from Microsoft to reset the Arduino USB port from the command line, and this resets the serial communication. I have to wave the white flag and use this hack as a solution, but this is hardly solved.

Could you detail what you're doing with the devcon command? I think I've been having this same issue and haven't found any way to fix it. Is it a disable/enable command on the Arduino? Maybe you can just post the code that you run? Thank you for your help.
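
The thread never shows the exact devcon invocation, so the following is only a minimal sketch of one possibility, written in Python. It assumes devcon.exe is on the PATH, that the script runs from an elevated prompt, and that "devcon restart" is the subcommand wanted (a disable followed by an enable is another option). The hardware ID pattern is a placeholder guess that matches Arduino's USB vendor ID; look up the real ID for your board in Device Manager before using it. The function names are made up for this example.

```python
import subprocess

# Placeholder pattern (an assumption, not taken from the thread).
# Replace it with the hardware ID shown for your board in Device Manager.
ARDUINO_HWID = r"USB\VID_2341*"


def devcon_restart_command(hwid: str = ARDUINO_HWID, devcon: str = "devcon.exe") -> list:
    """Build the argument list for restarting devices that match hwid."""
    return [devcon, "restart", hwid]


def restart_device(hwid: str = ARDUINO_HWID) -> int:
    """Run devcon and return its exit code. Needs administrator rights."""
    result = subprocess.run(
        devcon_restart_command(hwid), capture_output=True, text=True
    )
    return result.returncode
```

Calling restart_device() from the PC application after a receive timeout would automate the manual reset described above. It works around the stall rather than explaining it.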

Robin2:
Get a real computer and get rid of the problems ???

Unfortunately, no. So far, the best success I've had is with a non-Intel USB 3.0 chipset. Using a PCI Express USB 3.0 card (Fresco chipset) on a "real computer", I was able to run for > 1 hour over 3 times. On the same computer (Win7 x64), the Intel USB caused the computer to hard crash....