Simple server-client connection help

Hello. Please bear with me, as I’m new to the forum and to Arduino in general. I hope this is in the right place, and please excuse any mistakes I make along the way. LONG STORY ! :o

TL;DR: I need help with a VERY simple client-server setup to read the state of a switch from one board and control a second board via Ethernet. The problem is the connection drops sometimes and doesn’t resume on its own - THIS is my real problem, as I managed to muddle through most of this mess on my own and when it works…it works, despite being “wrong” on so many levels - not a programmer here, but I love this stuff and want to learn more by talking to actual people rather than by reading “dry” documentation. The fact that I at least tried to do it myself should show I’m not just trying to get others to do it for me. There are some specifics involved too, hence why I’d like “human” interaction please. :slight_smile:

LONG VERSION:
I’m working on a project involving two Nucleo144 boards (F767ZI). I’m using the required libraries, as shown at the top of my code. What I need this setup to do is read the state of a float switch (so a toggle switch) connected to one board and then control some output pins on the second board over Ethernet. The reason I don’t do it directly with one board is because the float switch is sitting in a tank which is filled by a pump that is too far away from the tank. The board at the pump end should do just that: switch this three phase pump on/off depending on the state of the float switch. Sounds simple, right ? In addition, one very important thing, I also need to work out if the two boards get cut off so the pump doesn’t keep filling the tank and cause it to overflow !
On my own, I managed to “concoct” something very crude and rudimentary (and probably very “wrong”) by putting together various pieces of code in my own little way, but it doesn’t work right and I REALLY need to get this thing going and also learn what I’m doing wrong. The code is a mess from a professional’s point of view, but that’s just the point: I never claimed I was a pro, hence why I’d like to learn to do it right. I also added my own ideas and twists here and there which are not “industry standard”, so that’s another thing I need to point out: don’t go too professional on me :smiley: It all seems to work on paper, but the problem is it’s not reliable over time. The thing needs to run constantly and often it locks up. Without further ado, let’s look at the “server side” code (attached). Please feel free to ask questions, as the whole code doesn’t fit.

I chose to make the board near the tank the “server” and the one near the pump the “client”…I think this may not be the best idea, but bear with me.

Over at the “server”, I have this: I start by including the two libraries required for the Nucleo boards:

I then commented out the mac line because I learned from here that the ethernet IC that’s on these Nucleo boards can use its own mac, so it’s not needed. Indeed it works - no problems here.

Now here’s where I took the first “liberty”: instead of attempting to use DHCP every time, regardless of whether it succeeds or not (like in all the example codes I’ve seen), I created my own bool (“doDHCP”) to act as a “checkbox” that sets whether I want to use DHCP or not. I then left the following lines intact to set what IPs should be used if doDHCP is set to false. I also initialize the ethernet server which I call “theServer”…(does this name matter ??) This part works and I DO get an address like I envisioned.

Some pins come next: these correspond to a Nucleo144 board obviously. The colors represent the color of the wires I ran to my float switch, for my own convenience. This is a SPDT switch, so when the float is up, one contact closes and pulls to GND acting as a “stop” and when the float drops, the other contact closes and represents a “run”. The doDHCP thing also comes next and I set it to true - this part works too.

Next, I ran into the problem of not being able to tell when the network cable is unplugged. This is apparently a shortcoming of the library used for the Nucleo boards and there was nothing I could do about it, so I just commented that part out. The idea would’ve been to “halt” here if a cable is not plugged (no link), because there’s no point, but I couldn’t get it to work, despite trying to seek help from the library creator…it’s a little over my head :frowning:

Next comes that doDHCP mess and then the server is also “started”. This all works.

This is how I’m reading the state of the switch: thinking it’d be “safer” for the board, I’m using a set of optocouplers between the contacts and the Nucleo, despite the switch itself not being powered by any dangerous voltages - doesn’t matter. The idea works: I’m first checking to see if the switch has been in that position before ( if (tankStatus != 1) ). At startup, tankStatus hasn’t been set to 1 yet, so we enter the if statement (because it’s NOT equal to 1): I store millis as the time the switch closed. I then read the state of both the “run” AND “stop” pins to ensure they’re not stuck on at the same time (corrosion, etc). The state has to be maintained like that for the time at “debounceTime” in order to set tankStatus to 1, otherwise the while loop breaks and the code moves on to the next part, which is literally the same thing, but “mirrored” and with 0 instead of 1 for tankStatus…whew…please bear with me if you’re still reading :smiley:

There’s also that “ERR” state I mentioned where both contacts either get stuck ON at the same time or OFF at the same time and that sets tankStatus to 2.
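The three states described above (run, stop, error) can be sketched as one small function. This is only an illustration in plain C++ so it can be tried on a desktop compiler; the names classifyContacts, TANK_RUN, TANK_STOP and TANK_ERR are made up here and are not taken from the attached sketch. It assumes each contact reads as "closed" when it pulls its input to GND.

```cpp
#include <cstdint>

// Status codes follow the post: 1 = run, 0 = stop, 2 = error.
// The enum and function names are hypothetical.
enum TankStatus : uint8_t { TANK_STOP = 0, TANK_RUN = 1, TANK_ERR = 2 };

// runClosed / stopClosed: true when that contact of the SPDT float switch
// is closed. Exactly one of the two should be closed at any time.
TankStatus classifyContacts(bool runClosed, bool stopClosed) {
    if (runClosed == stopClosed) {
        return TANK_ERR;   // both closed or both open: stuck contact or broken wire
    }
    return runClosed ? TANK_RUN : TANK_STOP;
}
```

On the board this might be called as classifyContacts(digitalRead(runPin) == LOW, digitalRead(stopPin) == LOW), where runPin and stopPin stand in for whatever pin names the real sketch uses.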

Everything so far works and it’s just fine and dandy: my crude “debounce” mechanism ensures the state doesn’t toggle immediately if there’s like a “wave” in the water tank and the float bounces up and down several times. True, it does “block” the code for 10 seconds every time the state changes, but hey: that’s why I’m discussing it.

Now comes the fun part which drove me insane: The first line seems to be standard on every “server”: it creates a client called “theClient” from whatever client is connected to my server here. This confused me to no end at first because I couldn’t understand it: if this is a “server”, why am I suddenly creating a client ? I think I managed to understand why: the “server” class has no “read” function like the client class does, so to actually “READ” anything from a client that’s connected, I have to create a client on the server…if that makes any sense. Again, I find the info rather insufficient sometimes…
After that, I initialize a blank string called commandString.
While the client sends out data, I stay in that loop and keep on receiving characters and adding them to commandString until it spits out the end of line character \n, at which point I read my message: I set my client at the pump side to spell “POLL” when it wants to get the state of the switch, so when POLL “forms”, I call the function “doReply()” which is in the next part.
I’m not sure if I should be calling “theClient.stop()” after returning from the function…not enough info on this, but I tried it with and without it and it doesn’t seem to make a difference. I read somewhere that it’s required to call it to “flush the buffer”…which makes sense, though I don’t know IF this happens as it should or if it should be there.

EthernetClient theClient = theServer.available();   // Get available client

String commandString;
while (theClient.available()) {
  char c = theClient.read();
  if (c != '\n') {
    commandString += c;   // Not a linefeed yet: append the received char to commandString
  }
  else {
    Serial.println("POLLED");
    if (commandString == "POLL") {
      doReply();
      theClient.stop();
    }
  }
}
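One detail worth checking in a loop like this: Arduino's println() ends each line with "\r\n", so if the pump side sends its request with println("POLL"), the string collected up to '\n' is "POLL\r", which does not compare equal to "POLL". Below is a minimal sketch of a line reader that drops the '\r'. It is plain C++ (std::string instead of the Arduino String class) so it can be tried off the board, and LineReader, feed and the 32-character cap are assumptions made for this example.

```cpp
#include <string>

// Hypothetical helper: feed it one received byte at a time.
// feed() returns true when a complete line has been stored in `line`.
// '\r' is dropped, so "POLL\r\n" and "POLL\n" both yield "POLL".
struct LineReader {
    std::string buffer;

    bool feed(char c, std::string &line) {
        if (c == '\r') {
            return false;                 // ignore carriage returns
        }
        if (c != '\n') {
            if (buffer.size() < 32) {     // cap the length so garbage cannot grow forever
                buffer += c;
            }
            return false;
        }
        line = buffer;                    // linefeed: hand over the finished line
        buffer.clear();                   // start fresh for the next command
        return true;
    }
};
```

In the server loop, each byte from theClient.read() would go through feed(), and the comparison against "POLL" would happen only when it returns true.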

Lastly, there’s the doReply() function which I created…functions by themselves also took me a while to comprehend and I couldn’t figure out what goes on, but then I learned that when called upon, the code jumps to the function, RUNS the function, then goes back to where the function was originally called. So in my case: I call the doReply() function, where I reply to my pump client who is still waiting for the state of the switch (0, 1 or 2), then I snap back to the “main” code and do “theClient.stop()”…then repeat the loop.

void doReply() {
  if (tankStatus == 1)
  {
    theServer.print("ON\n");
  }

  else if (tankStatus == 0)
  {
    theServer.print("OFF\n");
  }

  else if (tankStatus == 2)
  {
    theServer.print("ERR\n");
  }
}

The client code is a lot more messy due to the numerous pins and stuff I want it to do…pilot lamps and whatnot, so I shall do a second post for that. Thank you for your patience thus far.

TankSideServer.ino (5.92 KB)

This is the code for the “client”. There’s a lot of stuff going on: the idea, as suggested by someone on another forum is that I should be “polling” the switch at regular intervals. I tried doing this with my own knowledge. I also must absolutely have some sort of fail-safe to stop the pump if the connection gets cut off, so I thought of that too and it seems to work.
What DOESN’T work however is everything else: the client does manage to connect to the server, sends a few polls, gets the replies ON or OFF back, but then suddenly stops: it doesn’t send out anything anymore until I reset the client board, and even then it’s not guaranteed to connect and continue this back and forth…I’m not sure if the problem is mainly at the server side or the client side. When the cable gets unplugged, it’s even worse: sometimes it works, other times it doesn’t…it’s WAY too unreliable ! The concept is there I think, but some of the functions are not clear to me, like client.connected()…does this evaluate to false after calling client.stop() ? I’m using the built-in LED of the Nucleo as a sort of pilot lamp to follow this, that’s why you see LED_BUILTIN there.

I hope you have the patience to look over my code and offer some pointers. Thank you.

PumpSideClient.ino (8.8 KB)

How far apart are the tank sensor and pump?

I know you said "too far" to run everything on the same board, but if it were possible it would save you a lot of grief here.

In terms of code I think your main issue here is probably understanding the behaviour of the STMEthernet API. It says it is based on the Arduino Ethernet API, so it may be worth reviewing the examples there for sending UDP packets back and forth?

I had a chance to look at the Arduino Ethernet library.

So far as I can make out, EthernetServer and EthernetClient are “probably” using TCP under the hood (guessing a bit since there are specific classes for UDP).

Looking at your tank-side code it seems broadly reasonable, although note:

  • As you mentioned you’re blocking execution for switch debouncing. This is generally not a great idea. There are multiple techniques you can use to avoid this – I recommend reading Nick Gammon’s primer (Gammon Forum : Electronics : Microprocessors : How to do multiple things at once ... like cook bacon and eggs) for a jumping off point. You could also use Finite State Machines (Nick references these briefly), or software timers: there are various ways to skin the cat.

  • Personally I’d probably just send single-character commands e.g. “P” instead of “POLL” – less faffing about to do in the server code.
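As a rough illustration of the non-blocking approach mentioned in the first bullet: the debounce can be done by remembering when the reading last changed and only accepting it once it has held for long enough, without ever waiting inside a loop. The sketch below is plain C++ for easy testing; Debouncer, update and the field names are invented for this example, and on the board nowMs would come from millis().

```cpp
#include <cstdint>

// Hypothetical non-blocking debouncer: call update() on every pass of loop().
struct Debouncer {
    uint32_t holdMs;          // how long a new reading must persist
    int      stable;          // last accepted state
    int      candidate;       // reading currently being timed
    uint32_t candidateSince;  // when the candidate was first seen

    Debouncer(uint32_t holdMs_, int initial)
        : holdMs(holdMs_), stable(initial), candidate(initial), candidateSince(0) {}

    // raw: current reading; nowMs: current time in milliseconds.
    // Returns the debounced state immediately, it never blocks.
    int update(int raw, uint32_t nowMs) {
        if (raw != candidate) {
            candidate = raw;             // reading changed: restart the timer
            candidateSince = nowMs;
        } else if (candidate != stable && nowMs - candidateSince >= holdMs) {
            stable = candidate;          // held long enough: accept it
        }
        return stable;
    }
};
```

With something like this the server can keep answering polls during the 10-second confirmation window, because each pass through loop() only compares timestamps.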

On the client side:

  • It’s quite hard to follow the code. You have a lot of logic going on to keep track of state. I would try to remove almost all of this until you have the very basics working.

  • pollServer() contains a number of while() loops which make my spidey senses tingle. Generally looping forever on something that isn’t clearly going to change state is a concern.

  • In pollServer() I don’t think you should need to send the POLL request more than once (TCP is a reliable transport).

  • Again in pollServer(), a bool invalidResponse is declared without giving it an initial value. So it’ll be either true or false depending on what’s on the stack (caveat: this is definitely true in C, and I’m fairly sure is true in C++; I’d need to check the specification to be sure). Worse, the variable is only set (and set true, which causes the whole of pollServer to retry) if the response from the server is not recognised. So I suspect that’s at least one “freezes randomly” bug – if invalidResponse is true at the end of pollServer it’ll get stuck since the server closes the client after sending the POLL response.
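On the last point: in C++ a local bool declared without an initialiser does have an indeterminate value, so reading it before it is assigned is undefined behaviour. A minimal sketch of the usual fix follows, using a hypothetical helper isInvalidResponse() in place of the real logic inside pollServer(), and the ON/OFF/ERR replies from the server code.

```cpp
#include <string>

// Hypothetical stand-in for the response check inside pollServer().
// The flag gets a definite value on every call, so a stale "true"
// can never be picked up from whatever happened to be on the stack.
bool isInvalidResponse(const std::string &rsp) {
    bool invalidResponse = false;          // always initialised
    if (rsp != "ON" && rsp != "OFF" && rsp != "ERR") {
        invalidResponse = true;            // unrecognised reply
    }
    return invalidResponse;
}
```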

If I were you I would try something like this as a starter for ten on the client:

unsigned long lastPoll = 0;
#define POLL_INTERVAL 15000
#define SERVER_WAIT_INTERVAL 500

void loop() {
    
    // This check will skip the rest of loop() unless
    // lastPoll is unset (i.e. we didn't poll yet) or
    // we last polled at least POLL_INTERVAL milliseconds ago.
    if (lastPoll && millis() - lastPoll < POLL_INTERVAL)
        continue;
    
    // Connect to the server
    if (!theClient.connect(testServer, 23)) {
        Serial.print("connection failed");
        continue;
    }
    
    // Send poll request.  Shouldn't need to resend this
    // since TCP is reliable.
    theClient.println("POLL");
    
    // Block for the server's response.  Give it a bounded
    // amount of time to get back to us.
    String rsp;
    unsigned long start = millis();
    
    while (millis() - start < SERVER_WAIT_INTERVAL) {
        if (theClient.available()) {
            char c = theClient.read();
            if (c == '\n')
                break;
            rsp += c;
        }
    }
    
    // Close the client
    theClient.stop();
    
    // Decide what to do with the response
    if (rsp == "ON") {
    
    } else if (rsp == "OFF") {
    
    } else if (rsp == "ERR") {
    
    } else {
        Serial.print("Unexpected response: ");
        Serial.println(rsp);
        continue;
    }
    
    lastPoll = millis();
}

Once something dead simple along those lines works reliably, then start adding in the rest of the functionality.
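A note on the timing checks in the loop above: comparing millis() - lastPoll against an interval (rather than comparing millis() against lastPoll + interval) keeps working when the millisecond counter wraps around, which for a 32-bit counter happens roughly every 49.7 days. The helper below is a small sketch of that idea in plain C++; intervalElapsed is a name made up for this example.

```cpp
#include <cstdint>

// True once at least intervalMs has passed since startMs.
// Unsigned subtraction wraps modulo 2^32, so the result is still the
// true elapsed time even if the counter rolled over in between.
bool intervalElapsed(uint32_t nowMs, uint32_t startMs, uint32_t intervalMs) {
    return static_cast<uint32_t>(nowMs - startMs) >= intervalMs;
}
```

For a setup that is meant to run unattended for months, this is worth getting right from the start.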

Ok, let's review a few things.
First off, thank you for your reply - really helps to know at least SOMEONE had a look and offered to help :smiley:
As an update, I enlisted the help of a colleague of mine who IS a programmer and can "see" code far better than I can and he managed to get it off the ground so to say - we poked and prodded at it and found some "interesting" stuff which you happened to mention too. As we speak, the boards were left to "chat" to each other since yesterday with the mods my buddy did and I'll confirm they're still in sync tomorrow, since they're at the office. Right now, I'm sure they are just fine - it's what happens when the connection drops that concerns me, but now that seems to work as well, thanks to my buddy.

tomparkin:
How far apart are the tank sensor and pump?

I know you said "too far" to run everything on the same board, but if it were possible it would save you a lot of grief here.

In terms of code I think your main issue here is probably understanding the behaviour of the STMEthernet API. It says it is based on the Arduino Ethernet API, so it may be worth reviewing the examples there for sending UDP packets back and forth?

They ARE far - like 4km apart and that is in a straight line - without counting the hills and valleys and houses in between ! If I could, I would've done it directly in a jiffy, of course, but this was the only way, since there's a fiber GPON in each of these two locations.

You mentioned UDP - I'm not using UDP here, but I DID consider it...you seem to mention this in your next reply here anyway.

tomparkin:
So far as I can make out, EthernetServer and EthernetClient are "probably" using TCP under the hood (guessing a bit since there are specific classes for UDP).

So, yes, I'd say it's TCP.

Next: reading the tank-side float switch. When I first made that part, it was even worse: I was blocking execution each time through the loop, because I was checking the switch even if it hadn't changed states. I then added if (tankStatus != 1/0/2) to skip the check entirely if it hadn't changed, for the sake of speed. True, it means that every once in a while, when the state eventually DOES change, I WILL go into the loop and sit there for 10 seconds to "confirm" - that's true, but at the time I didn't think it would impact functionality, and I think it still doesn't. It's also probably a bit too high to be reasonable - 5s would probably work just as well.

Next, THIS is a good idea and I will apply it to both the "POLL" and the ON/OFF replies, in the form of maybe 1/2/3 - the simpler the better.

tomparkin:

  • Personally I'd probably just send single-character commands e.g. "P" instead of "POLL" -- less faffing about to do in the server code.

Next, this:

tomparkin:

  • It's quite hard to follow the code. You have a lot of logic going on to keep track of state. I would try to remove almost all of this until you have the very basics working.

Yes, I started off simple and kept expanding it more and more, trying to add more and more features which seemed to work at the time, but proved to break down over time, also turning the code into a mess. My buddy was polite enough to NOT point this out to me and just rolled with it, polishing my stuff :smiley:

Then,

tomparkin:

  • In pollServer() I don't think you should need to send the POLL request more than once (TCP is a reliable transport).

  • Again in pollServer(), a bool invalidResponse is declared without giving it an initial value. So it'll be either true or false depending on what's on the stack (caveat: this is definitely true in C, and I'm fairly sure is true in C++; I'd need to check the specification to be sure). Worse, the variable is only set (and set true, which causes the whole of pollServer to retry) if the response from the server is not recognised. So I suspect that's at least one "freezes randomly" bug -- if invalidResponse is true at the end of pollServer it'll get stuck since the server closes the client after sending the POLL response.

The idea behind sending a "poll" more than once came after I noticed that sometimes it doesn't go through, but a second one does, so I didn't want to "terminate" and re-attempt to connect if just ONE poll failed - maybe the second one got through. This was illustrated in my serial monitor which sometimes produced a garbage message, like instead of POLL, something like PO?LL? would show up (or other random characters in there). If the server received this, it wouldn't know what to do with it and wouldn't reply back. The "client" would then still wait for an answer and eventually time out and retry the connection...just a thought...
The invalidResponse boolean WAS indeed spotted by my colleague and corrected, by initialising it to false each time, instead of just letting it "float" like that. I will post the "updated" versions of the codes soon - they're on his laptop.
Thank you.

First off, thank you for your reply - really helps to know at least SOMEONE had a look and offered to help :D

No worries :slight_smile:

As an update, I enlisted the help of a colleague of mine who IS a programmer and can "see" code far better than I can and he managed to get it off the ground so to say

Fantastic! I'm glad you got it up and running.

Yes, I started off simple and kept expanding it more and more, trying to add more and more features which seemed to work at the time, but proved to break down over time, also turning the code into a mess. My buddy was polite enough to NOT point this out to me and just rolled with it

I meant no offense -- I have indeed been there and done that many times.
The reason I mentioned it really is that I've found that if it's hard to follow what's going on in the code, it's often likely that the code hides a bug or three, especially if I'm chopping and changing things. That's really why I suggested trying to reduce the complexity of the client side loop. The more obvious and clear it is, the more likely it is to be correct.

They ARE far - like 4km apart and that is in a straight line - without counting the hills and valleys and houses in between

Wow, you weren't kidding :slight_smile:
Sounds like an interesting project. I'm glad you're up and running now -- good luck with it.

We won't know for sure until we do some more stress-testing, because I thought it was functional the first time around, only to realize it would stall sooner or later. The stall seemed to occur mostly when the connection broke, as it works fine when "waltzing" back and forth the way they should. Fiber cuts and power outages DO occur, so I tried to have some sort of simple recovery procedure, which is to wait and see if "ON/OFF" messages still pour back to the client and assume the connection's dead once they stop.

I wasn't sure where I should drop my client.stop(); function on both the server and the client. By looking at other codes as examples, I figured it's best to drop it after the client has processed the ON/OFF reply (on the client side) and after the server has sent out its "ON/OFF" message (on the server side). I find it interesting there's no server.read(); function, the same way there is a client.read(); function, so I have to create a client....on the server to get access to this function, which WAS confusing at first, but now I see it's rather standard practice...

I appreciate you've got a local programmer helping you out now, and that's many times more useful than some random chap on the internet :slight_smile:

But...

We won't know for sure until we do some more stress-testing, because I thought it was functional the first time around, only to realize it would stall sooner or later. The stall seemed to occur mostly when the connection broke, as it works fine when "waltzing" back and forth the way they should. Fiber cuts and power outages DO occur, so I tried to have some sort of simple recovery procedure, which is to wait and see if "ON/OFF" messages still pour back to the client and assume the connection's dead once they stop.

Thinking about what might go wrong is always a good idea. A few observations, though.

Firstly, it's worthwhile being aware of what TCP already provides for you, and what you therefore don't need to worry about in your application code.

In order to establish a TCP connection, a client initiates a three-way handshake with the server. As such, if the client connect() method fails, it strongly implies the server isn't there (or possibly is blocking in a 10-second tank sensor check loop).

Furthermore, TCP, although a reliable transport, doesn't keep retrying forever and will eventually time out if one end of the connection goes away. It also incorporates sequence numbers to ensure packets arrive in order. The upshot of this is that it's unlikely that you'd receive some kind of backed-up flood of server responses to the client if there is an outage.

Secondly, error recovery is by its nature a rarely-traversed code path, so it's typically very easy to make mistakes there and not notice, unless you're rigorous in your testing.

So my preference would always be to try to incorporate the error recovery into the "normal behavior" algorithm as much as possible.

In the client side .ino posted previously, there is a lot of special-case handling/looping which probably mostly won't run. In the loop I posted, there isn't really any special-case handling at all: if anything unexpected happens we bail out and try again later. This is a very simple (but probably effective) approach which you can almost verify is correct just by looking at the code.

Finally, it's often difficult to predict what given real-world failures will look like from the perspective of the code.

By writing lots of error-handling up front you're (a) doing a lot of work that might need redoing if your expectations of how errors manifest turn out to be wrong, and (b) potentially obscuring the error when it does occur, which may hamper your ability to observe it and figure out how best to handle it.

There's a software-engineering principle called "YAGNI", which stands for "You Ain't Gonna Need It", and which advises against over-engineering up-front. I've found it to be a good principle to adhere to. Add complexity when it is needed, but not before.

I wasn't sure where I should drop my client.stop(); function on both the server and the client. By looking at other codes as examples, I figured it's best to drop it after the client has processed the ON/OFF reply (on the client side) and after the server has sent out its "ON/OFF" message (on the server side)

The stop method appears to trigger graceful termination of the connection using a TCP FIN packet, so the right time to call it would be after you've finished doing any I/O on the connection. So on the server side, it should be called after the server sends the response; and on the client side it should be called after the client receives the response.

tomparkin:
So my preference would always be to try to incorporate the error recovery into the "normal behavior" algorithm as much as possible.

I believe I/we did just this: every time through the loop, a "POLL" request gets sent out by the client - if there's no answer with an ON/OFF after X attempts, we "time out" and assume the connection is dead and we go into a loop where the client keeps trying to connect. This all seems to work. My buddy also shortened the messages to single characters (P for POLL, N for ON, F for OFF and E for ERR) on the client and server respectively as suggested.
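The fail-safe idea described here (stop the pump when valid replies stop arriving) can be folded into the normal path by timestamping the last valid reply. The following is only a sketch under assumptions: PumpFailSafe and its members are invented names, the reply letters N/F/E are the single-character replies mentioned above, and nowMs would be millis() on the board.

```cpp
#include <cstdint>

// Hypothetical fail-safe for the pump side.
// Reply letters follow the thread: 'N' = ON, 'F' = OFF, 'E' = ERR.
struct PumpFailSafe {
    uint32_t timeoutMs;      // maximum age of the last valid reply
    uint32_t lastValidMs;    // when the last valid reply arrived
    bool     haveReply;      // false until the first valid reply
    bool     wantRun;        // what the last valid reply asked for

    explicit PumpFailSafe(uint32_t timeoutMs_)
        : timeoutMs(timeoutMs_), lastValidMs(0), haveReply(false), wantRun(false) {}

    // Record a reply; anything other than N/F/E is ignored.
    void onReply(char reply, uint32_t nowMs) {
        if (reply != 'N' && reply != 'F' && reply != 'E') {
            return;
        }
        wantRun = (reply == 'N');
        lastValidMs = nowMs;
        haveReply = true;
    }

    // The pump may run only if the tank side said ON recently enough.
    bool pumpShouldRun(uint32_t nowMs) const {
        if (!haveReply) {
            return false;                      // nothing heard yet: stay off
        }
        if (nowMs - lastValidMs > timeoutMs) {
            return false;                      // link considered dead: stay off
        }
        return wantRun;
    }
};
```

The design choice is that "off" is the default: the pump only runs while there is fresh evidence that the tank wants water, so a dead link, a garbled reply or a rebooted server all end in the same safe state.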

VERY interesting to note is that there also seem to be some power-related issues going on ! On our workbench, where we have our test setup with the two Nucleo144 boards, we noticed the client likes to lock up in certain scenarios. These happen too consistently to be coincidences and seem unrelated to code bugginess, especially since I can replicate them each time, so this is another path we need to go down. The lock-up was sometimes so bad that the client board couldn't even be pinged from a command prompt at that point ! It's like the Ethernet stack quits or gets stuck somehow. Not even the main loop runs ! NOTHING ! It just sits there ! It's not a hardware fault - I swapped the boards around.

So: to check if the client manages to connect back to the server after a connection drop, I cut the power to the "server" board by disconnecting it from the USB charger we were using to power it. Sure enough, the connection dropped, the "client" (which was still powered from my buddy's laptop's USB port) went into its "comErr" condition, then I plugged the "server" back in, it did its thing, grabbed an IP and the client reconnected - no problems. I did this by unplugging the micro-USB cable from the board, but left the charger itself plugged into the power strip on the desk. However, when I attempted the same power cut by unplugging the whole charger from the strip, the connection didn't resume when the server was back up ! Instead, the "client" once again detected the drop and went into its "comErr" loop, where it's supposed to try to reconnect to the server every x seconds (as it should), but as soon as I plugged the charger back into the mains and powered the "server" back up, the "client" board froze ! This time though Ethernet continued working and I could still ping the board ! Still, the main loop was frozen, as we were no longer seeing the Serial.println("FAIL"); message which was supposed to roll after each failed attempt at connecting to the server (added by my buddy as a "debugger" so we'd have SOME sort of feedback from the board) ! In other words, when if (client.connect(server, port)) is called, we get either a fail or a success, and this is inside a loop, so my buddy added a Serial.println("FAIL"); to the "else" part of client.connect to let us know if the code is at least looping and trying to connect. This time, we were not getting ANYTHING in the serial monitor !
I think what happened this time is that the "client" board (which WAS attempting to connect to the "dead" server as it should) was a little too close to the power strip, and the spark that occurs naturally when the plug of a SMPS is inserted into a mains socket MAY have messed it up just enough to halt the code, so even though my "server" was now up and running, my "CLIENT" was not even attempting to connect to it !

The same thing happened yesterday: we had both boards plugged into the same desktop PC via USB. I'm not sure if some sort of ground-loop occurred or whether there was some software bug related to having two COM ports going on at the same time, but basically it DID - NOT - WORK - AT ALL that time: it'd only get like 1-2 replies from the "server", then the "client" would quit, to the point where not even pinging it would work ! Today, this chap brought in his own laptop to work on, so one of the boards was plugged into his laptop and the problem went away. Still, it could be perfectly replicated by plugging both boards into his laptop !

Long story short: either the client gets affected by nearby EMI very easily, or there's some other power issue going on that we can't catch....will do some more experimenting to try and isolate this. Also, I'll post the edits my pal did to the code so we're "up to date" here...

Here are the two “revisions” of the code, with some moderate changes in key areas.

SingleClientPollsSingleServerSTM.ino (9.71 KB)

ServerRespondsSTM.ino (6.18 KB)

You mentioned trying this:

tomparkin:

unsigned long lastPoll = 0;

#define POLL_INTERVAL 15000
#define SERVER_WAIT_INTERVAL 500

void loop() {
   
   // This check will skip the rest of loop() unless
   // lastPoll is unset (i.e. we didn't poll yet) or
   // we last polled POLL_INTERVAL milliseconds ago.
   if (lastPoll && millis() - lastPoll < POLL_INTERVAL)
       continue;
   
   // Connect to the server
   if (!theClient.connect(testServer, 23)) {
       Serial.print("connection failed");
       continue;
   }
   
   // Send poll request.  Shouldn't need to resend this
   // since TCP is reliable.
   theClient.println("POLL");
   
   // Block for the server's response.  Give it a bounded
   // amount of time to get back to us.
   String rsp;
   unsigned long start = millis();
   
   while (millis() - start < SERVER_WAIT_INTERVAL) {
       if (theClient.available()) {
           char c = theClient.read();
           if (c == '\n')
               break;
           rsp += c;
       }
   }
   
   // Close the client
   theClient.stop();
   
   // Decide what to do with the response
   if (rsp == "ON") {
   
   } else if (rsp == "OFF") {
   
   } else if (rsp == "ERR") {
   
   } else {
       Serial.print("Unexpected response: ");
       Serial.println(rsp);
       continue;
   }
   
   lastPoll = millis();
}

Problem is the code won’t compile this way - it mentions a continue statement not being within a loop…even though the main loop IS technically a loop. My buddy replaced continue with return to get it to compile…

Ah! Silly me :slight_smile:

I should have mentioned it wasn't compile-tested (I don't have the Ethernet libs you're using). I was trying to outline the approach rather than provide working code per-se.

However your friend's fix is the right one: replace all instances of "continue" with "return" to achieve the effect I was aiming for.

Since you're trying that approach, I should reinforce something I think I mentioned previously: this algorithm as written will spam the server with connection attempts if a connection fails, up until it successfully polls. Practically you probably want to have a small stand-off interval, especially if the server is blocking for multiple seconds to debounce the level sensor.

I would probably modify this block accordingly:

    // Connect to the server
    if (!theClient.connect(testServer, 23)) {
        Serial.print("connection failed");
        // don't immediately retry to avoid a storm of connection attempts
        lastPoll += 1000;
        return;
    }

This rate-limits the client to one re-connect attempt a second, which is a bit more reasonable.
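
If I'm reading it right, bumping lastPoll behaves a bit oddly before the first successful poll (assuming lastPoll starts at 0), because lastPoll can end up ahead of millis(). One possible variation is to keep a separate timestamp for failed attempts, so lastPoll always means "last successful poll". This is only a sketch in plain C++: PollTimer, RETRY_INTERVAL and the method names are invented for illustration, the 15000 ms poll interval is an example value, and the time is passed in so the logic can be checked off the board.

```cpp
// Sketch only: PollTimer and its members are hypothetical names, and the
// intervals are example values. On the board, pass millis() as `now`.
struct PollTimer {
    static constexpr unsigned long POLL_INTERVAL  = 15000; // ms between successful polls
    static constexpr unsigned long RETRY_INTERVAL = 1000;  // ms to wait after a failed connect

    unsigned long lastPoll = 0;     // time of the last successful poll
    unsigned long lastFailure = 0;  // time of the last failed connect
    bool havePolled = false;
    bool haveFailed = false;

    // True when loop() should attempt a connection on this pass.
    bool shouldAttempt(unsigned long now) const {
        if (haveFailed && now - lastFailure < RETRY_INTERVAL)
            return false;           // still standing off after a failure
        if (havePolled && now - lastPoll < POLL_INTERVAL)
            return false;           // polled recently enough
        return true;
    }

    void onFailure(unsigned long now) { lastFailure = now; haveFailed = true; }
    void onSuccess(unsigned long now) { lastPoll = now; havePolled = true; haveFailed = false; }
};
```

In loop() this would be used as "if (!timer.shouldAttempt(millis())) return;" at the top, with onFailure(millis()) in the failed-connect branch and onSuccess(millis()) where lastPoll is currently updated.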

We did what you suggested (rather, my buddy did) and we managed to get it working....for now. We've left it running over the weekend and will come back to it on Monday. Hopefully it's still up and running at that point, but suffice it to say it's WAY better now than it was, so there's a glimmer of hope the problem's solved. I'll post the latest code soon enough. Thanks.

Here is the code currently running on each board. Sorry, there’s comments everywhere. One thing I discovered is that the client doesn’t actually “freeze” as I originally thought, but rather “hang”…for a VERY long time. I left it running and it ran for around 36 hours straight, but then all of a sudden this happened. The part it seems to hang at is somewhere after line 122:

digitalWrite(LED_BUILTIN, HIGH);

The LED comes on and STAYS on. We purposely added that as a debug LED. Normally, this function gets called, so the onboard LED comes on, then because further down, in line 143, we call

digitalWrite(LED_BUILTIN, LOW);

the LED should go off immediately, provided line 124

if (!theClient.connect(testServer, 23))

is successful and we “get through” to our server board. This crude debugging idea works, but after running for a long time, at some point this no longer happens for some reason and the LED remains on and the whole thing hangs there. It’s at this point I can no longer ping the IP address of the board either. Strangely, if I let it sit like this for hours on end, it eventually gets out of this state, turns off my output pin (so it calls the stopPump() function) and starts doing errBeep() over and over…it’s like something overflows and it’s not being dealt with properly, or there’s something in the library that’s acting up.

True, we’re not calling client.stop(); anywhere on our client. Through experimenting, we found out this causes it to work even WORSE and it would lock up like this after a few minutes or even less, that’s why you see theClient.stop() commented out all over the place. We DO call it on the SERVER board though. Please have a look through all that mess and feel free to ask questions - every little bit helps. Thank you.

SingleClientPollsSingleServerSTM.ino (13.1 KB)

ServerRespondsSTM.ino (7.23 KB)

I was wondering how you were going on :slight_smile:

The server code (I think, reading between the comments...) looks better without long delay() calls for debouncing switches.

On the client side a couple of thoughts:

  • Each time around loop() you're creating a new connection if !theClient.connected(). But you only poll every pollInterval (15,000 ms). On the server side, you're sitting in a fairly tight loop checking for client connections and closing them. So far as I can make out from a quick look at the STM32Ethernet code, the server-side close will cause the client side TCP socket to close (as you'd expect/imagine to some extent). As such, you'll effectively be creating/destroying sockets AS FAST AS POSSIBLE. I can't help but feel this is probably going to hammer the Ethernet libraries somewhat! I note that the STM32Ethernet example for a periodic connection (STM32Ethernet/WebClientRepeating.ino at master · stm32duino/STM32Ethernet · GitHub) shows the client creating a new connection only when it wants to poll the server. This seems a better idea to be honest.
  • EthernetClient.stop() cleans up the client-side connection state. As I mentioned, this seems to do the same job as the client side of the TCP connection noticing that the server closed the connection, but it doesn't hurt to be explicit IMO. The network stack on the chipset will surely have a limit on connections/sockets, so if you accidentally leak one due to a race condition or some weirdness, it'll cause things to break eventually. So better to clean up the client side with EthernetClient.stop().
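
Putting those two thoughts together, here is a rough sketch of "connect only when polling, and always stop afterwards". It is not tested against STM32Ethernet. pollOnce() is a hypothetical helper, written as a template so it can be exercised off the board with any object that offers the Arduino Client-style calls used here (connect, println, available, read, stop). The clock is passed in as nowMs (on the board that would wrap millis()), and std::string stands in for String. It also drops '\r', on the assumption that the server replies with println(), which sends "\r\n".

```cpp
#include <cstdint>
#include <string>

// Sketch only: pollOnce() is a hypothetical helper, not part of any library.
// Client is any type with Arduino Client-style connect/println/available/
// read/stop; nowMs is a callable returning milliseconds (e.g. wraps millis()).
template <typename Client, typename Clock>
bool pollOnce(Client &client, const char *host, uint16_t port,
              Clock nowMs, unsigned long waitMs, std::string &rsp) {
    rsp.clear();

    // Open a connection only now, when we actually want to poll.
    if (!client.connect(host, port)) {
        client.stop();   // assumed to be harmless if nothing was opened
        return false;
    }

    client.println("POLL");

    // Wait a bounded time for one line of response.
    bool gotLine = false;
    const unsigned long start = nowMs();
    while (!gotLine && nowMs() - start < waitMs) {
        if (client.available()) {
            char c = static_cast<char>(client.read());
            if (c == '\n')
                gotLine = true;
            else if (c != '\r')
                rsp += c;
        }
    }

    client.stop();       // always release the client-side socket
    return gotLine;      // false means the server never finished a line
}
```

The point of the shape is that every path through the function ends with stop(), so a socket can't be left behind whether the connect fails, the server goes quiet, or the poll succeeds.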

tomparkin:
On the server side, you're sitting in a fairly tight loop checking for client connections and closing them [...] As such, you'll effectively be creating/destroying sockets AS FAST AS POSSIBLE.

If I remember the code correctly (it's actually been some time since I had a look at it, plus it was heavily modified by my buddy :D), on the server, the connection is only supposed to close ("destroy" the socket) AFTER the server has replied to the client with one of its 3 states...not sure if this IS happening indeed...
So we should try to re-include client.stop(); on the client board as well. I was thinking of dropping it when "commErr" happens (it's still there, just commented out). Even so, I feel it wouldn't get called anyway, since I fear my code halts at client.connect(server, port), so that part only gets run after like 10+ hours as previously discussed, since it DOES eventually happen, but after a laughable amount of time...
Another idea would be to flip the roles around: have the board reading the float switch be the client and the board running the relay to the pump be the server. The upshot of this (as I imagine it at least) is that there would be only ONE "message": it would go from the client to the server. Currently, there are 2 sides to this "conversation": the client "polls" the server and THEN the server responds with its switch state - so 2 messages. By migrating the client to where the switch is, this would theoretically get simplified: I'd have the client connect at regular intervals to the server, report out its switch state, then disconnect. I'd use these "reports" as "heartbeats" at the server end to determine if the connection is still there.
Now what would happen if the connection drops ? On the server side, if a "report" doesn't arrive in time, I'd call client.stop(); by setting up a countdown timer or something. On the client side it would be less of a problem because I'd be calling client.stop() after each "transmission" anyway, so if the connection drops when my code is just within the "transmission" part, client.stop(); would get called anyway at its end. If it drops BEFORE the code has actually entered the "transmission" part, client.connect(server, port); would not allow it to go any further anyway and I could call client.stop(); in its "else" branch for good measure anyway....just some thoughts :slight_smile:
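
The "heartbeat" part of that idea could be sketched roughly like this for the pump-side board. It is plain C++ so the timing logic can be checked off the board, and it is only an illustration: PumpWatchdog, HEARTBEAT_TIMEOUT (45 seconds here, picked arbitrarily) and the method names are all made up, and the time would come from millis() on the real board.

```cpp
// Sketch only: PumpWatchdog and its members are hypothetical names and the
// timeout is an example value. On the board, pass millis() as `now`.
struct PumpWatchdog {
    static constexpr unsigned long HEARTBEAT_TIMEOUT = 45000; // ms without a report

    unsigned long lastReport = 0;  // when the last report arrived
    bool haveReport = false;       // nothing heard since power-up
    bool wantPump = false;         // what the last report asked for

    // Call whenever a report arrives from the float-switch board.
    void onReport(bool pumpRequested, unsigned long now) {
        wantPump = pumpRequested;
        lastReport = now;
        haveReport = true;
    }

    // Call on every pass of loop(). The pump may run only while reports are fresh.
    bool pumpAllowed(unsigned long now) const {
        if (!haveReport)
            return false;          // nothing heard yet: fail safe, pump off
        if (now - lastReport >= HEARTBEAT_TIMEOUT)
            return false;          // link presumed lost: pump off
        return wantPump;
    }
};
```

Each pass of loop() would then do something like "if (!watchdog.pumpAllowed(millis())) stopPump();", so a dropped link turns the pump off by default instead of leaving it in its last state.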

Side note: has anyone had any luck with using the debug feature of Arduino PRO IDE on a Nucleo board ???

I thought it would be a useful tool to have when dealing with complicated projects like this, to see where my code hangs, but it's either not designed to work with STM32 boards, or I'm doing something wrong. There's very little documentation and instructions for the debug feature anyway and attempting to use it on a board that's not natively supported further complicates matters and it feels like trying to fit a round peg in a square hole...