Troubleshooting a complete Yun lockup

Problem:
My Yun completely locks up after a random period, typically several hours. I mean really locks up: no network (I can't ping it), no internal logging on the Linino side, nothing. Just a full stop.
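Because internal logging dies along with the box, the moment it drops off the network has to be recorded from another machine. Here is a minimal Python 3 sketch for the PC side. Instead of ping it opens a TCP connection, by default to port 22, which assumes SSH is enabled on the Yun (I believe it is by default, but that is an assumption). The host name `arduino.local` in the usage line is hypothetical.

```python
import socket
import time


def is_reachable(host, port=22, timeout=3.0):
    """True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except (socket.error, OSError):
        return False
    conn.close()
    return True


def watch(probe, interval=60.0, log=print, sleep=time.sleep, rounds=None):
    """Call probe() every `interval` seconds and log a timestamped UP/DOWN
    line whenever the result changes. rounds=None means run forever."""
    last = None
    n = 0
    while rounds is None or n < rounds:
        up = probe()
        if up != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log("%s %s" % (stamp, "UP" if up else "DOWN"))
            last = up
        n += 1
        sleep(interval)
```

Usage: `watch(lambda: is_reachable("arduino.local"))`. The DOWN line then gives the lockup time to within one interval, which can be compared against the last entries written to the SD card.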

Setup:
I have 5 sensors on the Yun. The Arduino side reads their values and puts them on the Bridge. The Linino side grabs the values and sends them to Xively via Python scripts. If it can't reach Xively, it logs them locally in SQLite on an SD card.
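The upload-or-store-locally logic described above could look roughly like the sketch below. It is not the original script. The feed URL, `X-ApiKey` header, and JSON layout follow Xively's v2 feed API as I remember it, so treat them as assumptions, and `FEED_ID`/`API_KEY` are placeholders. The one deliberate choice is the `timeout` on the request: without it, `requests` can wait indefinitely on a stalled connection.

```python
import json
import sqlite3
import time

# Hypothetical placeholders: substitute your own feed ID and API key.
FEED_ID = "123456"
API_KEY = "YOUR_API_KEY"
FEED_URL = "https://api.xively.com/v2/feeds/%s.json" % FEED_ID


def xively_send(readings):
    """Upload a dict of readings with requests. The URL, header, and JSON
    layout are assumptions based on Xively's v2 feed API."""
    import requests  # imported here so the rest of the module works without it
    body = {"version": "1.0.0",
            "datastreams": [{"id": name, "current_value": str(value)}
                            for name, value in sorted(readings.items())]}
    resp = requests.put(FEED_URL, data=json.dumps(body),
                        headers={"X-ApiKey": API_KEY},
                        timeout=10)  # give up on a stalled connect/read after 10 s
    resp.raise_for_status()  # treat HTTP error codes as failures too


def open_backlog(path):
    """Open (or create) the local SQLite backlog, e.g. on the SD card."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS backlog "
               "(ts REAL, sensor TEXT, value TEXT)")
    return db


def send_or_log(readings, db, send=xively_send):
    """Try to upload; on any failure, keep the readings locally.
    Returns True if the upload succeeded, False if they were logged."""
    try:
        send(readings)
        return True
    except Exception:
        now = time.time()
        with db:  # commit the inserts as one transaction
            db.executemany("INSERT INTO backlog VALUES (?, ?, ?)",
                           [(now, name, str(value))
                            for name, value in readings.items()])
        return False
```

Usage: `db = open_backlog("/mnt/sda1/backlog.db")`, then `send_or_log({"sound": level, "temp": temp}, db)` for each batch. `/mnt/sda1` is where the Yun usually mounts the SD card, but check yours. The readings themselves would typically come from the Bridge's Python client (`BridgeClient.get` in `bridgeclient.py`); verify the module path on your image.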

What I have done to troubleshoot:
Set up a cron job that writes the output of logread, mpstat, and df to the SD card every minute. Nothing looked abnormal: no full disk, no pegged CPU, no out-of-memory condition.
Logged the Python scripts' output to the SD card. Again, nothing unusual.
Removed the Xively libraries and wrote the upload code myself using only requests.
Disabled Wi-Fi per: http://forum.arduino.cc/index.php?PHPSESSID=k27olhiht35m42c54uevu74vn0&topic=188821.msg1399080#msg1399080
Added the “-u” flag per: http://forum.arduino.cc/index.php?topic=196091.0
Reset everything to out-of-the-box defaults.
Downloaded a nightly Arduino IDE build and recompiled my sketch with it.
Reflashed with the stock Linino image from the Arduino website.
Reflashed with the latest Linino image from the Linino website. I did not spend long on that, as some other things did not work out of the box. However, if that is the route I should take, I am fine with figuring it out.
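One caveat with per-minute logging to the SD card: if the box freezes, anything still sitting in a write buffer never reaches the card, so the last snapshots before a lockup may be missing. A minimal sketch of a snapshot script that flushes and `fsync`s after each run; the command list mirrors the cron job above, and a missing tool is logged instead of aborting the run.

```python
import datetime
import os
import subprocess

# The same commands the cron job captures. A command that is not installed
# is recorded as a failure instead of stopping the script.
COMMANDS = [["logread"], ["mpstat"], ["df"]]


def snapshot(path, commands=COMMANDS):
    """Append one timestamped section per command to the file at `path`."""
    with open(path, "a") as out:
        out.write("=== %s ===\n" % datetime.datetime.now().isoformat())
        for cmd in commands:
            out.write("--- %s ---\n" % " ".join(cmd))
            try:
                proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                        stderr=subprocess.STDOUT,
                                        universal_newlines=True)
                output, _ = proc.communicate()
                out.write(output)
            except OSError as exc:  # e.g. the command is not installed
                out.write("failed to run: %s\n" % exc)
        out.flush()
        os.fsync(out.fileno())  # force the data onto the card before returning
```

A one-line wrapper calling `snapshot("/mnt/sda1/diag.log")` (hypothetical path) can then be run from cron every minute. The timestamp of the last complete section gives a lower bound on when the lockup happened.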

I am kind of at a loss as to what to do next. I am a Windows server and network guy by day, so I understand the concepts, but I am new to Linux and don't know where else to look.

Any guidance would be greatly appreciated!

Thanks!
-M

Hi, did I post this and forget?

I mean, we are using a Yun with external sensors and have exactly the same problem. In our case it does seem to be network-related, and it crashes anywhere from 2 hours to a week after starting.

Please, let's keep each other updated!

Certainly!

This was not a scientific test, but the problem seemed to start only once I began sending to Xively. I had been querying a website for JSON data via Python for a couple of months and never saw this happen, though I was not paying as close attention then.

Just thought of this, but one difference is that my JSON query script runs once every 5 minutes, while my send-to-Xively script runs in an infinite loop. It needs to: one of my sensors is a sound sensor, and I need to check it as fast as possible to catch quick spikes (a dog barking, for example).
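Fast sampling does not have to mean fast uploading. One option, sketched here under the assumption that only the loudest level per window matters, is to poll quickly but send just the peak once per window. `read_sound` and `send` are hypothetical hooks (for example a Bridge read and the upload call); the clock and sleep are injectable so the logic can be tested without waiting.

```python
import time


def sample_loop(read_sound, send, send_every=60.0, sample_every=0.05,
                clock=time.time, sleep=time.sleep, max_batches=None):
    """Poll the sound level every `sample_every` seconds, but call send()
    only once per `send_every` seconds with the loudest level seen in that
    window. max_batches=None means loop forever."""
    batches = 0
    window_start = clock()
    peak = None
    while max_batches is None or batches < max_batches:
        level = read_sound()
        if peak is None or level > peak:
            peak = level  # remember the loudest sample in this window
        if clock() - window_start >= send_every:
            send(peak)  # note: a slow upload pauses sampling while it runs
            batches += 1
            window_start = clock()
            peak = None
        sleep(sample_every)
```

This cuts network traffic to one request per window while still catching short spikes between uploads. A further step, which I have not tried, would be to compute the peak on the Arduino side and put only the peak on the Bridge, which would also reduce Bridge traffic.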

I also thought about setting up some sort of heartbeat between the Arduino side and the Linino side over the Bridge, and then having the Arduino reset the Linino side after some period of downtime. However, if Linino is completely hung, I'm not even sure how I would reset it.
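The heartbeat idea splits into two parts: the Linux side publishes an incrementing counter, and the AVR side flags the Linux side as hung when that counter stops changing. Here is a sketch of both halves in Python, purely to show the logic. In practice the watch would live in the sketch, and `put` would be the Bridge client's put call. The key name `heartbeat` is hypothetical, and how to actually reset the Linux side is not covered here.

```python
import time


def beat_once(put, counter, key="heartbeat"):
    """Linux side: publish an incrementing counter under a hypothetical key."""
    counter += 1
    put(key, str(counter))
    return counter


class HeartbeatWatch(object):
    """Staleness check (would run on the AVR side): if the counter written
    by Linux has not changed for more than `timeout` seconds, report that
    the Linux side looks hung."""

    def __init__(self, timeout, clock=time.time):
        self.timeout = timeout
        self.clock = clock
        self.last_value = None
        self.last_change = clock()

    def observe(self, value):
        """Feed the latest counter value; returns True if it looks hung."""
        if value != self.last_value:
            self.last_value = value
            self.last_change = self.clock()
        return self.clock() - self.last_change > self.timeout
```

Checking for a *changing* value rather than a present one matters: a key that was set once before the hang would otherwise still look healthy.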

It works great...until it doesn't. Exciting and frustrating at the same time :).

Maybe as a starting point you could confirm that the AVR part is still running, for example by blinking an LED at regular intervals. Once you know that something's alive, you can use that to pursue the problem.

Good advice, and done. Well, I did it remotely, so I will see whether it is actually blinking when I get home.

Thanks, PeterH. The results were not what I expected.

It ran for about 15 hours before it dropped off the network.

Here is the odd thing: I set the LED to blink at 1-second intervals. When it locked up, the LED was blinking at about 28-second intervals.

I need to give this some more thought but am open to any advice.

Thanks -M

Update:

My project has two Python scripts that run simultaneously and access the Bridge. As a troubleshooting step, I stopped one of them.

It has not locked up since. My uptime is currently 2 days 22:10. Obviously not a long-term test, but far longer than it ever ran before I made that change.
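If two scripts talking to the Bridge at the same time really is the trigger (one test suggests it but does not prove it), a workaround that keeps both scripts is to serialize their Bridge access with an exclusive file lock. Here is a sketch using `fcntl.flock`. The lock path is hypothetical, and merging both jobs into a single script would be the simpler alternative.

```python
import contextlib
import fcntl

# Hypothetical lock path; any file both scripts can open will do.
LOCK_PATH = "/tmp/bridge.lock"


@contextlib.contextmanager
def bridge_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock for the duration of a Bridge call,
    so two scripts never talk to the Bridge at the same time."""
    with open(path, "a") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the other script releases it
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)

# Usage in each script (the client object is whatever you already use):
#     with bridge_lock():
#         value = client.get("sound")
```

The lock is advisory, so it only helps if every script that uses the Bridge wraps its calls this way. Holding it only around individual calls keeps the fast sound-sampling loop from starving the other script.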

Any news about this problem? I have the same problem with an application that sends a data/put command to the Bridge every 7 seconds. After 1 hour and 20 minutes the system slows down: the delay between sending a command and the Arduino's reaction becomes 6-8 seconds longer than normal. If a command is sent every 10 seconds instead, the system runs for about 3 hours before this happens. Any pointers? Thank you and best regards
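To pin down when the slowdown starts, it can help to time every Bridge command and keep the history, so a gradual increase shows up as a trend. Here is a minimal sketch that wraps any call. `client.put` in the usage line stands for whatever issues your data/put command and is hypothetical.

```python
import time


class LatencyLog(object):
    """Record how long each wrapped call takes, as (start time, seconds)."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.samples = []

    def call(self, fn, *args):
        """Run fn(*args), record its duration, and return its result."""
        start = self.clock()
        result = fn(*args)
        self.samples.append((start, self.clock() - start))
        return result

    def slow_calls(self, threshold):
        """Samples whose duration exceeded `threshold` seconds."""
        return [s for s in self.samples if s[1] > threshold]
```

Usage: `log.call(client.put, "cmd", "on")` in place of the direct call. Writing `log.samples` to a file now and then shows whether the delay creeps up steadily or jumps all at once, which narrows down where to look.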

I had a similar problem and have been troubleshooting it for about 6 months. My setup is different, but my Arduino would lock up within a few minutes. My application was a data logger on an electric go-kart, measuring volts, amps, motor temperature, and speed, then recording that to an SD card and sending it to a 2004 (20x4 character) display.

I've had several issues, and it took connecting one sensor at a time to find the problem(s). I still have issues with my updated OLED display, though.

Not sure this helps, but perhaps strip the system down and add parts back one at a time until the lockup happens.