Everything freezes when Ethernet cable unplugged

I am trying to use the P1AM-100 Arduino, a commercial board that uses mature PLC-type I/O modules, in an industrial control environment. This system has an ARM-based CPU, and looks like a SAMD21. It has 256KB of flash, and 32KB of RAM. (Not an overabundance of either, but it should be sufficient.) The system also has a P1AM-ETH Ethernet module, which is W5500-based. (I believe.) The I/O ports are all driven through SPI, including (of course) Ethernet. There are no direct-connect I/O pins except for one switch and one LED on the CPU card. (This might be a factor in the problems I'm seeing.) This is essentially required, in an environment using 480V 600A three-phase power. (Maximally noisy, in other words.) The I/O modules all have very good input signal conditioning, and are proven. (Not new for Arduino, in other words.)

Up to this point, so far, so good. I've written a number of test programs that prove that we can talk to the I/O ports, and through Modbus-TCP to other PLC's for remote I/O, which we will be using in the final product. I am using FreeRTOS in order to be able to use traditional (pre-Arduino) software isolation techniques. (Threads, basically, in order to avoid the wretched all-in-one-loop Arduino native bias.) The RTOS, at least, seems to behave well. Threads are completely isolated from each other, in that any one thread has exclusive access to a given I/O channel. (All Ethernet on one thread, all I/O to the relay board on another. They must share the underlying SPI, though. RTOS queues are used to communicate to these I/O threads. Adding an RTOS mutex for SPI in these I/O threads made no difference, so I suspect that the SPI layer is probably not at fault.)

The problem occurs when I have a multithreaded test program that is independently flipping bits up and down on both the local and the remote I/O pins. If I unplug the Ethernet cable, all activity on the processor comes to a screeching halt. This is absolutely 100% unacceptable, death to the entire project really, because the local I/O tasks must continue operating even in the face of (we hope, transient) network problems. Mechanical damage, or even injury, might occur if the control program ever gets blocked at an unexpected point. It must continue running, no exceptions. Ever.

I even went so far as to eliminate all the 'local' (through SPI) I/O and have it only blink the one direct-connect LED, with no difference in behavior. (Thus eliminating all potentially-conflicting SPI activity.) Unplugging the Ethernet cable is as dramatic as unplugging the power, so far as continued operation is concerned.

If I plug the Ethernet cable back in, everything resumes normal operation after a delay. Somewhere in the driver software things are getting stuck, and in such a way that the RTOS is unable to do its thing. I get that Ethernet is off the table with the cable unplugged, but nothing else should even burp, much less stick permanently.

I have been looking at the drivers, trying to find unexpected delay() calls, while loops, etc. Anything that can get it stuck in an un-dispatchable state, but so far no luck. I found a few delay calls that I replaced with the equivalent RTOS-friendly calls, with no improvement. Being sure you're even looking at the right source code is not easy in Arduino. Debugging is not easy in Arduino.
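For anyone doing the same hunt, the pattern being searched for is a polling loop with no exit condition, for example a driver spinning on a status flag that never changes while the link is down. Below is a minimal sketch of the bounded alternative, written in plain C++ so it can be run off-target. The name wait_until and all of its parameters are hypothetical and are not taken from the Ethernet or P1AM libraries; on the board, now_ms would presumably wrap millis() and yield_fn an RTOS-friendly call such as vTaskDelay().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical helper (not part of any Arduino, FreeRTOS, or W5500 library).
// Polls `ready` until it returns true or `timeout_ms` has elapsed.
// `now_ms` stands in for millis(); `yield_fn` stands in for an RTOS-friendly
// delay so that other tasks can still be dispatched while we wait.
bool wait_until(const std::function<bool()>& ready,
                const std::function<uint32_t()>& now_ms,
                const std::function<void()>& yield_fn,
                uint32_t timeout_ms) {
    // All three callbacks must be set; calling an empty std::function throws.
    assert(ready && now_ms && yield_fn);

    const uint32_t start = now_ms();
    while (!ready()) {
        // Unsigned subtraction stays correct when the counter wraps around.
        if (static_cast<uint32_t>(now_ms() - start) >= timeout_ms) {
            return false;  // give up instead of spinning forever
        }
        yield_fn();  // let the scheduler run something else
    }
    return true;
}
```

A caller would treat a false return as "link down, try again later" and carry on, rather than leaving the task blocked inside the driver.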

Is this sort of thing ringing any bells? Does anyone use Arduino for anything but toybox-level software? I am dismayed at how hostile the Arduino environment seems to be to software techniques that have been in common use since at least the 70's when I began doing this stuff. I believe the hardware to be more than capable of doing the jobs we need, it's the 'ease-of-use' Arduino platform that seems to be the problem. Rolling my own bare-metal program (sans Arduino) for this hardware, as we would have done in the 70's, is probably not going to fly; the fallback position would probably be to revert to a wretched ladder-logic PLC controller, which means it wouldn't be me doing it. Not Attractive.

Welcome to the forum

I think the following page describes nicely what Arduino is for.

I do not believe you will find much help for your issue here. Most of us are here for fun, to help and educate beginners. It sounds like with your experience you could help others with their toy projects.

Regarding your issue, it sounds like you need some professional debugging tools and libraries with support. The SAMD21 has a Micro Trace Buffer and breakpoints. So, with a good debugger you should be able to find where your code gets stuck in no time.

Have you tried running the simple Ethernet examples such as the Webserver (https://www.arduino.cc/en/Tutorial/WebServer) on the P1AM with P1AM-ETH just to see if the same problem can be reproduced with no bespoke software present?
If the problem exists with just the webserver, then it would point to the implementation of the P1AM-ETH rather than any software issue on your side. Have you got multiple sets of hardware, by any chance, so that you can confirm the problem occurs on other hardware? You make no mention of whether you have tested the Ethernet separately from everything else, though you have obviously done a lot of testing, which is why I ask.

I have had problems on various machines over the years which appeared as software issues, but turned out to be hardware that would not show a fault with test programs. Ethernet on a workstation was one: it would pass all the tests, but under live conditions it would just freeze the whole system, requiring a hard reboot. It was only traced because we had two of the machines and could show the problem only happened on one. Another case was database corruption with no apparent hardware faults; eventually the hardware for the fibre-optic link to the storage system was changed, as several others had gone wrong leading to corruption, and the problem vanished.

This platform was not chosen by me, but up until now it looked like a very good choice. Proven industrial I/O hardware but usable by regular programmers, rather than designed for non-programmers like the usual PLC. I was excited to get back into low-level programming, on a platform that didn't require gigabytes of who-knows-what code running and startup times measured in minutes. The fast turn of the development crank is invigorating, as at my last embedded job software could probably be built and tested twice per day. (C++ builds took hours. And repaired bugs could take weeks to propagate through the system, so you got daily bug reports on crap you'd even forgotten how you'd already fixed. A nasty mix of tool providers who did NOT eat their own dog food, and politics by the ignorant but powerful. Good riddance, I hope they die in the market.)

Anyway, Arduino looks pretty nice, for the most part. The only problem I see is that the over-emphasis on the toybox architecture has made any packaged library software highly suspect for any serious use. Anything I write is as good (or as bad :-) ) as necessary; anything I didn't write could potentially be borderline useless, for my purposes. Nobody, myself included, wants to have to write every line of code used in a project. The hard part of software has always been error-handling, making sure it works right even in the face of the unexpected. Arduino has, if you'll pardon the expression, somewhat deliberately painted over all the fire escape windows. Not a problem, until it is.

Getting back to my problem, the Ethernet hardware appears to be working perfectly well. Running a single-threaded webserver demo won't prove anything, because there is nothing else going on in parallel. The apparent inability to do this is the problem. The current test program does almost nothing except talk Modbus through Ethernet, only blinking the single LED as a second task. That has proved out normal Ethernet functionality to my satisfaction.

There is a second hardware system, but someone else is using it at the moment. This problem does not seem like the kind of thing that's a hardware problem, unless it's systemic, in which case they'd all act the same.

I like the sound of the SAMD21 micro-trace buffer system, but I'm a bit reluctant to pursue a different development/debug environment, as using a new flow would cause a substantial delay in getting back to even where I currently am. As I understand it, now is when the decision to commit to the Arduino or not needs to be made. I'm just really hoping that somebody has BTDT and knows why this problem might be occurring, and maybe what to do about it.

Upper management picked this Arduino, but for certainty's sake against our hard deadline the project lead would prefer something else, something considerably more expensive. UM will listen to arguments against UM's choices, but woe be unto him whose ducks are not sufficiently lined up. I am acting as junior duck-wrangler here. I'd like to use the Arduino, it looks like big fun, but it must work properly in the end. If we can't be sure of that, it's out.

I obviously have not been clear enough about why I was suggesting the Webserver example as a test. The main test I had in mind is to see what happens if you unplug the Ethernet cable while the webserver (or another simple Ethernet-based example) is running. I wouldn't necessarily call unplugging and replugging Ethernet normal functionality, but it certainly should not cause the whole processor to lock up as you describe. If it does hang (with the simple example), that will give a clear indication that the driver most likely has a problem; if not, then interaction between the driver and the RTOS seems most likely.

Are you aware that you can set the Arduino IDE so that it tells you all the files it is using during a build? I ask as I only discovered this today whilst reading about something else. If you have multiple copies of a library (and possibly don't even know), you could find out that the one you are looking at is not the one used for the build.

Have you tried building with optimisation turned off? The driver may assume it is never interrupted and, never having been tested in a multithreaded environment like yours, may be doing something like disabling interrupts while waiting for one itself.

I know there are multiple Ethernet libraries available for the W5500. Which one do you have, and which version, from which source?
Is there more than one accessible to your IDE?

I know some of these questions will seem like obvious things you've already done, but sometimes we can become blind to checking the obvious because we dismiss things as too unlikely, too difficult to check, or because we cannot see why they would make a difference.

I have been in the situation where builds took overnight and the idea of splitting things into shared images/DLLs/shared libraries was viewed by senior management as too risky for the product. The development team did it anyway for their own use, which meant they could change a low-level routine and test it themselves without having to wait for the overnight build. Too often senior management included people with no idea of the software engineering process, who were therefore too scared to allow any change to how things were done.

jcathey:
There are no direct-connect I/O pins except for one switch and one LED on the CPU card. (This might be a factor in the problems I'm seeing.)

Assuming we are talking about the same product, why not use the P1AM-GPIO plugin module to get access to the SAMD GPIO pins?
https://www.automationdirect.com/adc/shopping/catalog/programmable_controllers/open_source_controllers_(arduino-compatible)/productivityopen_(arduino-compatible)/controllers_-a-_shields/p1am-gpio

Alternatively you could pull the CPU board out of its case and access the MKR style connectors directly as shown in this video.

mikb55:
...why not use the P1AM-GPIO plugin module to get access to the SAMD GPIO pins? ...
Alternatively you could pull the CPU board out of its case and access the MKR style connectors directly...

In fact, I have already done so in order to measure some I/O timing delays using an oscilloscope to monitor pin 6, the easiest one to grab onto once the plastic CPU shell is removed. (The GPIO board is on order, but is not here yet.) The GPIO board is of zero use in production, which is what I am worried about. For us it's only for monitoring debug signals during development. (The system's 250HP electric motor makes for a pretty noisy environment, bare(-ish) GPIO pins will NOT fly!) If hammering the bugs out of the existing driver is what turns out to be necessary, this board will probably be a necessary tool, just for visibility. (Instrumenting various points in the drivers).

countrypaul:
I obviously have not been clear enough about why I was suggesting the Webserver example as a test. The main test I had in mind is to see what happens if you unplug the Ethernet cable while the webserver (or another simple Ethernet-based example) is running...

But this example would have to be modified in order to actually do something in parallel with Ethernet, in order to see if the problem was still there. It would no longer be trusted code. And, without RTOS tasks, such surgery might end up being somewhat invasive.

countrypaul:
I wouldn't necessarily call unplugging and replugging Ethernet normal functionality, but it certainly should not cause the whole processor to lock up as you describe. If it does hang (with the simple example), that will give a clear indication that the driver most likely has a problem; if not, then interaction between the driver and the RTOS seems most likely.

Unplugging Ethernet is a normal fault condition. Someone could trip over a cable, the Ethernet switch (or its power supply) could fail, someone could unplug the wrong cable by mistake, a broadcast storm could swamp the switch fabric, etc. Shit happens. If the entire system freezes, it could get somebody killed. There is considerable defense in depth being applied to this project, and layer one is that the control system not go comatose for any reason.

Ethernet is working normally with the RTOS, so it's not overtly incompatible. But there's something seriously wrong somewhere when the link goes down.

countrypaul:
Are you aware that you can set the Arduino IDE so that it tells you all the files it is using during a build? I ask as I only discovered this today whilst reading about something else. If you have multiple copies of a library (and possibly don't even know), you could find out that the one you are looking at is not the one used for the build.

Have you tried building with optimisation turned off? The driver may assume it is never interrupted and, never having been tested in a multithreaded environment like yours, may be doing something like disabling interrupts while waiting for one itself.

These are good things to know, and will be of use if I continue with this platform in future. However, the decision has already been made to abandon the Arduino as being too immature for use in this project at this time. We simply don't have time or staff to debug the basic tools (like Ethernet communication) that we're expecting to be able to just use, on top of all the stuff we know we have to write.

The P1AM-100 is an interesting product; however, it looks like it is designed as a bridge to allow someone to transplant an already-working Arduino project into something that would be mechanically and electrically acceptable in an industrial environment.
This promotional video happily points out the major differences between the Arduino and PLC development processes: PLC vs Industrial Open-Source Controller (Arduino-Compatible): What to Know for the PLC Guru - YouTube

That's a pretty good general overview. Yes, with the industrial Arduino you have to provide more code, but you have better control over what happens, and when. Writing your algorithms in C rather than ladder code means you can use higher levels of abstraction (if you want to) to clarify your solution. You can write simulation harnesses and provide unit tests. Also, you can use lint-pickers, code coverage tools, revision control systems, etc. The current negative is that the Arduino has less robust and well-tested support routines for I/O, especially (at the moment) Ethernet.

PLCs remind me of early BASIC systems. They give you a head start for simpler projects, and make certain kinds of debugging easier, but get in your way mightily at the high end of things. PLCs were designed, from the very beginning, to make NON-programmers comfortable.

jcathey:
These are good things to know, and will be of use if I continue with this platform in future. However, the decision has already been made to abandon the Arduino as being too immature for use in this project at this time. We simply don't have time or staff to debug the basic tools (like Ethernet communication) that we're expecting to be able to just use, on top of all the stuff we know we have to write.

I can fully understand that position; for all you know, Ethernet could just be the first of many problems that you may encounter if you continue with Arduino. The debugging facility on Arduino is almost nonexistent, and having to write everything out to a serial port can not only slow development down but also hide problems, as the code changes every time another serial write is added or removed. Many of the libraries work fine but are Good News code, and as you pointed out early on in this thread, the hardest part is handling the bad news.
Good luck on your new platform.
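On the point about serial writes changing the behaviour being observed: one common alternative is to record events into a small RAM buffer and dump it afterwards, which costs a few stores per event instead of a UART transfer. What follows is only a minimal sketch, assuming a single writer. TraceRing and TraceEvent are hypothetical names, not part of Arduino, FreeRTOS, or the P1AM libraries, and on a real RTOS record() would need a critical section if several tasks call it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical in-RAM trace log (illustrative names only).
struct TraceEvent {
    uint32_t timestamp;  // e.g. the value of micros() on a real board
    uint16_t id;         // caller-defined marker for a point in the code
};

template <std::size_t N>
class TraceRing {
public:
    // Record one event, overwriting the oldest once the buffer is full.
    void record(uint32_t timestamp, uint16_t id) {
        events_[head_] = TraceEvent{timestamp, id};
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;
    }

    // Number of events currently held (at most N).
    std::size_t size() const { return count_; }

    // i = 0 is the oldest surviving event, size() - 1 the newest.
    const TraceEvent& at(std::size_t i) const {
        const std::size_t oldest = (head_ + N - count_) % N;
        return events_[(oldest + i) % N];
    }

private:
    std::array<TraceEvent, N> events_{};
    std::size_t head_ = 0;   // next slot to write
    std::size_t count_ = 0;  // valid entries
};
```

The buffer can then be printed over serial once the interesting moment has passed (for example, after the cable is plugged back in), so the printing itself no longer sits in the timing-sensitive path.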