Problem with TCP client connection with ESP8266 Telnet

Hello.
The problem happens after the following:

  1. Client connects
  2. Client disconnects from WiFi
  3. Client reconnects to WiFi quickly (within about 10 seconds)

When this happens, the ESP8266 does not know the client has disconnected, so it keeps trying to send messages. When it reaches the exact line that sends to this telnet client, the program stalls for almost 30 seconds (I think that is the timeout of the client.print command).

So my question is: is there any way to change this TCP timeout using the ESP8266WiFi.h library?
Or is there some command in this library that reliably detects when the client has disconnected?

UDP is not an option here, and neither is a request-style method where the client disconnects after the message is sent.
I noticed that the WDT does not trigger when this happens, but if the client loses WiFi and does not come back within 10 seconds, the WDT triggers and the ESP resets (that is not a problem for me).

Notes:

  1. I know there is some redundant code (it happens when I am trying to find the right way to do something and the current way works; someday, when I have more time, I will come back to this code and organize it)
  2. I cut some parts of the code because they are not important for understanding the problem, so there are probably some global variables that are not in use

Here is all the code; the problem shows up at lines 448 to 518, exactly at the serverClients.print part.
Edit: Apparently the code can't go in the message because it exceeds the character limit (9000), so the code is attached.

Thanks in advance.

TCP_TELNET.ino (15.1 KB)

Pff, your code is hard to read like this; you really have to structure it more into functions. The server/client method you use with the array of clients is not something I have ever used before, and it does not seem to give a good overview.

As for the WDT, there is no need to disable it (it has a purpose). Just make sure there are no endless loops, and make sure that long loops include either yield() or delay() (which includes yield()), so that scheduled tasks (such as maintaining the WiFi connection) are carried out on a regular basis.

Remove the whole signal strength bit for now. Remove the WiFi.reconnect(); it is meant for reconnecting to a device that connected to the ESP as an AP (which you don't do) after the ESP is connecting as an STA.

I will structure it.

Do you have another method for handling multiple clients instead of an array of clients?

The ESP.wdtDisable() call is there to use the hardware WDT instead of the software WDT, as I found here

I've never used yield(); I'll look into it and how to implement it (at first glance it seems pretty easy).

The RSSI check is there because when the ESP slowly moved away from the AP, it did not realize it had lost WiFi (I don't know why this happened). To get around that, I set an RSSI limit below which the ESP disconnects itself. That handled the problem.

I'll remove the reconnect().

I found another problem that I still haven't found a solution for: I have 6 APs with the same SSID. Can the ESP switch to the best AP (in terms of RSSI) without losing the clients connected to it? I've been looking for a while and haven't found a good solution. They talk about it here: Use ESP8266WifiMulti instead of ESP8266Wifi? · Issue #3173 · arendst/Tasmota · GitHub

I have organized the code a little better

Added yield() wherever I found it necessary

Removed the reconnect

About the RSSI: I also use it in a function that reports signal quality at a distance, so I can't remove it

Daniel_Sampaio:
I found another problem that I still haven't found a solution for: I have 6 APs with the same SSID. Can the ESP switch to the best AP (in terms of RSSI) without losing the clients connected to it? I've been looking for a while and haven't found a good solution. They talk about it here: Use ESP8266WifiMulti instead of ESP8266Wifi? · Issue #3173 · arendst/Tasmota · GitHub

I think the name for this is handoff; can the ESP8266 handle it?
On the network side, other devices like cellphones and tablets handle it very well, so the problem is not in the AP configuration (I think).

I will test the code and report back as soon as I can.

TCP_TELNET.ino (15.4 KB)

I tried a few scenarios for the first issue described here and noticed that it only happens if the client loses WiFi; if the client closes the connection to the ESP without losing WiFi, it works well.

I think it is because a TCP connection only learns that the client wants to disconnect when the client says so. When the client loses WiFi, that message can't be sent, so the ESP cannot know what happened.

I found some information here, but there is no reliable solution:
https://forum.arduino.cc/index.php?topic=535898.0

As a last resort, I could measure the time between issuing client.print and the program resuming (about 25 to 30 s) and then disconnect the client that caused the stall. The problem with this workaround is that whenever this happens I lose about 30 s doing nothing.

I have organized the code a little better

Oh yes, it is a huge improvement!

Added yield() wherever I found it necessary

I would change your loop() like this:

void loop() {
  // ESP.wdtFeed(); // there is no need for this, we just had a yield() at the
  // end of loop(), and that resets the WDT as well
  wifi_connecting();   // handle a lost connection
  yield();
  wifi_newClient();    // check if there is any new client
  yield();
  wifi_anyClient();    // check if there is any client at all
  yield();
  blink_led();         // handle the debug LED
  yield();
  readTelnet();        // read messages from telnet clients and handle them
  yield();
  wifi_uart_telnet();  // check UART for data and send it to all telnet clients
  yield();
  wifi_resend();       // resend stored data to all clients that don't answer (but are still connected)

  // yield(); // at the end of loop() there is no need for yield(), it is automatic
}

The ESP.wdtDisable() call is there to use the hardware WDT instead of the software WDT, as I found here

You may as well just use the software WDT; its timeout is about 2.5 s.

I think the name for this is handoff; can the ESP8266 handle it?

I think it can, though I haven't experimented with it. Once the signal gets too weak for a connection (below -90) it would need to connect to another node. I suspect you would need to tell it explicitly to do so using WiFi.begin(), which does cause a temporary drop-out in the connection. I actually have no such network in place around here (I live on 45 m²).

On the network side, other devices like cellphones and tablets handle it very well

Oh yes, well, but it is not automatic; they have code making sure that happens. They automatically connect to any known network, and if the strongest known network has a different name, they'll connect to that one. You might want to set the boundary for a weak network a little higher (-85) so as to anticipate a changeover. Keep in mind that while scanning you are also disconnected.
All in all, what issues do you have now?
There are a few pointers that sprang to mind while going through your code.

void blink_led() {
  bool already_blinking = false;
  if (flag_hasClient)

I suspect this function is supposed to be a non-blocking LED blink; in that case already_blinking should be

static bool already_blinking = false;

The same would go for a few other global variables that could be static locals, but this one changes functionality: without static, the flag is reset to false on every call.
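As a rough illustration of the static-local point, here is a minimal host-side C++ sketch. It is not the code from the attachment: blinkStep(), its now_ms parameter and the 500 ms half-period are all assumptions made for the example. On the ESP8266 you would pass millis() as now_ms and write the returned state to the LED pin.

```cpp
#include <cstdint>

// Hypothetical non-blocking blink step. Pass the current time in ms
// (on the ESP8266 that would be millis()); returns the LED state.
bool blinkStep(uint32_t now_ms, uint32_t half_period_ms = 500) {
    // static: initialized once, and the values survive between calls.
    // As plain locals they would be reset on every call and the LED
    // would never keep its state.
    static bool led_on = false;
    static uint32_t last_toggle_ms = 0;

    // Unsigned subtraction stays correct when the ms counter wraps.
    if (now_ms - last_toggle_ms >= half_period_ms) {
        led_on = !led_on;
        last_toggle_ms = now_ms;
    }
    return led_on;
}
```

Because the function never waits, loop() can call it on every pass without delaying the telnet handling.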

String read_serial, data[MAX_SRV_CLIENTS], data_ant[MAX_SRV_CLIENTS];

Although on an ESP you do not have to worry about memory nearly as much as on an AVR, and the String class is definitely doing a much better job of managing the memory it uses, it is still a good idea to do some managing yourself if you want a global String that you modify locally (check out the .reserve() function of the String class). What is said about memory fragmentation is true: it can cause your program to fail unexpectedly even after it has been running just fine for a really long time (months, maybe, though in the current setup I don't think it ever really would).

 if (EEPROM.read(EEPROM_SAVE_POSITION) != EEPROM_SAVE_1) {
    DEBUG = DEBUG_DEFAULT;
    if (DEBUG) EEPROM.write(DEBUG_SAVE_POSITION, 1);
    else EEPROM.write(DEBUG_SAVE_POSITION, 0);
    EEPROM.write(EEPROM_SAVE_POSITION, EEPROM_SAVE_1);
    EEPROM.commit();
  } 
  else if (EEPROM.read(DEBUG_SAVE_POSITION) == 1) {
    DEBUG = true;
    Serial.println();
    Serial.println("Debug mode on");
  } 
  else  DEBUG = false;

Though this is a neat solution, keep in mind the flash is guaranteed for only about a thousand writes, and that is not an awful lot, though I guess you would not turn the debug on and off all the time.
And uhmm, who taught you this?

       data[i] = "";
      }

readSerial:

    while (Serial.available()) {
      c = Serial.read();
      if (c != '\n' && c != '\r')
        for (int i = 0; i < MAX_SRV_CLIENTS; i++)
          if (flag_client[i])
            data[i].concat(c);
      yield();
    }

    for (int i = 0; i < MAX_SRV_CLIENTS; i++) {
      if (flag_client[i])
        if (data[i] == "P")
          goto readSerial;
      if (rssi[i])
        hasMsg[i] = false;

There must be a more elegant way of achieving the same loop than using 'goto'.
Personally I would make 'readSerial' a bool function, and where it has 'goto' put

yield();
return false;

and then make a line that does

while (!readSerial());

or something like that.
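A minimal host-side sketch of that idea, under stated assumptions: fakeSerial is a hypothetical stand-in for the UART buffer, and a single line string replaces the per-client data[i] array from the attachment. The function returns false while only a lone "P" has arrived, which is the case the original handled with goto.

```cpp
#include <deque>
#include <string>

// Hypothetical stand-in for the UART receive buffer.
std::deque<char> fakeSerial;
// Accumulated message, playing the role of data[i] in the sketch.
std::string line;

// Drains the pending input into `line`, skipping CR and LF.
// Returns false when only a lone "P" has arrived so far, which tells
// the caller to call again; returns true otherwise.
bool readSerial() {
    while (!fakeSerial.empty()) {
        char c = fakeSerial.front();
        fakeSerial.pop_front();
        if (c != '\n' && c != '\r') line.push_back(c);
    }
    return line != "P";
}
```

On the device the caller would be something like `while (!readSerial()) yield();`, so the watchdog is still serviced while waiting for the next character.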

I think it is because a TCP connection only learns that the client wants to disconnect when the client says so. When the client loses WiFi, that message can't be sent, so the ESP cannot know what happened.

Oh dear, the client may also lose its WiFi connection? Hey, the client needs to monitor its own connection.

First of all, thank you for the reply and the feedback about my improvements

Deva_Rishi:
You may as well just use the software WDT; its timeout is about 2.5 s.

I think it can, though I haven't experimented with it. Once the signal gets too weak for a connection (below -90) it would need to connect to another node. I suspect you would need to tell it explicitly to do so using WiFi.begin(), which does cause a temporary drop-out in the connection. I actually have no such network in place around here (I live on 45 m²).

Oh yes, well, but it is not automatic; they have code making sure that happens. They automatically connect to any known network, and if the strongest known network has a different name, they'll connect to that one. You might want to set the boundary for a weak network a little higher (-85) so as to anticipate a changeover. Keep in mind that while scanning you are also disconnected.

I've changed it a little, but it turns out that the software WDT was not triggering, and the hardware WDT is only fed by ESP.wdtFeed(), which is not called at the end of the loop function.
So I've put it in most of the places that have yield().

I've changed the boundary for a weak network to -85, as you said.

I've added a WiFi scan routine that filters for only the SSID I want and then chooses the strongest RSSI to connect to. This scan runs every time the connection is lost or the signal is weak (< -85 dBm).
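The selection step described above could look roughly like the following host-side C++ sketch. ScanEntry and pickBestAp() are hypothetical names for this example and are not taken from the attachment; the scan results are assumed to have been copied into a vector first.

```cpp
#include <string>
#include <vector>

// Hypothetical record for one scan result.
struct ScanEntry {
    std::string ssid;
    int rssi_dbm;  // less negative is stronger: -60 beats -85
    int channel;
};

// Returns the index of the strongest entry whose SSID equals `wanted`,
// or -1 if no entry matches.
int pickBestAp(const std::vector<ScanEntry>& results,
               const std::string& wanted) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(results.size()); ++i) {
        if (results[i].ssid != wanted) continue;  // filter by SSID
        if (best < 0 || results[i].rssi_dbm > results[best].rssi_dbm) {
            best = i;
        }
    }
    return best;
}
```

On the ESP8266 the entries would presumably be filled from WiFi.scanNetworks() using WiFi.SSID(i), WiFi.RSSI(i) and WiFi.channel(i), and the chosen AP can be pinned by passing its channel and BSSID to WiFi.begin().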

Deva_Rishi:
There are a few pointers that sprang to mind while going through your code.

I'm not really good with pointers; I'll need some time to understand how to implement them efficiently.

Deva_Rishi:
Though this is a neat solution, keep in mind the flash is guaranteed for only about a thousand writes, and that is not an awful lot, though I guess you would not turn the debug on and off all the time.
And uhmm, who taught you this?

I worked on another project months ago where I needed to use the EEPROM, and I implemented it this way.

This debug setting will only be changed once in a while, to correct bugs I find. It will not get a lot of writes (I hope).

There must be a more elegant way of achieving the same loop than using 'goto'.

I agree, but this was just a small change to work around another person's problem, which I hope he will solve on his side in a few months (he has more things to solve). By the way, it just waits for another letter if it receives only a "P".

All in all, what issues do you have now?

I've found a way to avoid the code crash when the first problem described here occurs: I found the command that changes the timeout of client.print and similar calls.
So now the problem is that when this WiFi drop occurs, the code doesn't crash, but the ESP still doesn't know that the client has disconnected (so that slot stays occupied).
I'm figuring out how to handle it.

Deva_Rishi:
Oh dear, the client may also lose its WiFi connection? Hey, the client needs to monitor its own connection.

The problem is that even if clients lose the connection and can come back, the ESP gets into a mess when this happens, so the problem comes back to me.

TCP_TELNET.ino (20.1 KB)

I've added a WiFi scan routine that filters for only the SSID I want and then chooses the strongest RSSI to connect to. This scan runs every time the connection is lost or the signal is weak (< -85 dBm).

Keep in mind that during a scan you are disconnected.

I'm not really good with pointers; I'll need some time to understand how to implement them efficiently.

I was not talking about actual pointers... hehe.

I've changed it a little, but it turns out that the software WDT was not triggering, and the hardware WDT is only fed by ESP.wdtFeed()

Would that have something to do with you disabling it? Here it also states you have to explicitly give a timeout as an argument when re-enabling it (though it somehow doesn't seem to respond to it... well, the core is a work in progress, I suppose).

So now the problem is that when this WiFi drop occurs, the code doesn't crash, but the ESP still doesn't know that the client has disconnected (so that slot stays occupied).
I'm figuring out how to handle it.

You are not the only one. Here there is a massive thread (with loads of cross-references in it), and basically it is considered a bug. If you can force a check on whether a client is connected by sending a request to acknowledge, you could keep the timeout quite short, I suppose. I was also wondering if you could just regularly send the client a notice to disconnect the TCP connection and let the client reconnect, rather than trying to keep the connection alive.

Or even, if a client loses connection, it destroys its previous client object first thing after creating a new one (this would involve every device having some kind of ID).

First I tried to find a way to get the client's MAC in STA mode, but I couldn't find how it could be done.
That way I could check whether the same MAC tries to connect and then disconnect the previous one.
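Since the MAC is not available, the same effect could perhaps be had with an ID that each client sends right after connecting. The following host-side C++ sketch shows only the slot bookkeeping; Slot, registerClient() and the slot count of 4 are assumptions for the example, not names from the attachment.

```cpp
#include <array>
#include <string>

constexpr int kSlotCount = 4;  // assumed number of client slots

struct Slot {
    bool in_use = false;
    std::string id;  // ID the client announces after connecting
};

std::array<Slot, kSlotCount> slots;

// Registers a client that announced `id`. A slot already holding the
// same ID is reused, so a stale connection from that device is replaced.
// Returns the slot index, or -1 if every slot belongs to another ID.
int registerClient(const std::string& id) {
    int free_slot = -1;
    for (int i = 0; i < kSlotCount; ++i) {
        if (slots[i].in_use && slots[i].id == id) return i;  // replace stale
        if (!slots[i].in_use && free_slot < 0) free_slot = i;
    }
    if (free_slot >= 0) {
        slots[free_slot].in_use = true;
        slots[free_slot].id = id;
    }
    return free_slot;
}
```

On the ESP8266, reusing a slot would also mean calling stop() on the old WiFiClient stored in that slot before assigning the new one.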

So I added some code that uses the external acknowledgement it already has to keep the clients alive.
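The final code is not shown in the thread, so the following is only a guess at the general shape of an acknowledgement timeout, written as a host-side C++ sketch. ClientState, onConnect(), onAck(), dropStaleClients() and the 5000 ms limit are all hypothetical; on the device now_ms would come from millis().

```cpp
#include <cstdint>

constexpr int kMaxClients = 4;            // assumed slot count
constexpr uint32_t kAckTimeoutMs = 5000;  // assumed silence limit

struct ClientState {
    bool connected = false;
    uint32_t last_ack_ms = 0;
};

ClientState clients[kMaxClients];

// Call when a client connects: the timeout clock starts now.
void onConnect(int i, uint32_t now_ms) {
    clients[i].connected = true;
    clients[i].last_ack_ms = now_ms;
}

// Call whenever client i sends its acknowledgement.
void onAck(int i, uint32_t now_ms) { clients[i].last_ack_ms = now_ms; }

// Frees every slot whose last acknowledgement is older than the limit.
// Returns how many clients were dropped.
int dropStaleClients(uint32_t now_ms) {
    int dropped = 0;
    for (int i = 0; i < kMaxClients; ++i) {
        if (clients[i].connected &&
            now_ms - clients[i].last_ack_ms > kAckTimeoutMs) {
            clients[i].connected = false;  // on the ESP: also stop() it
            ++dropped;
        }
    }
    return dropped;
}
```

Calling dropStaleClients() once per pass of loop() frees the slot of a client that vanished with its WiFi, without waiting for a blocked write to time out.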

Thank you for all support.