Arduino Nano ESP32 hangs, possibly HTTPClient-related?

I am using an Arduino Nano ESP32 to interrogate Environment Canada's XML feed and act on the result. All is good, except that every now and then (roughly once every day or two) the system reboots due to the watchdog.

I suspect the HTTP client (#include <HTTPClient.h>) might be to blame, because it is the only part of my code that seems likely to be vulnerable to timeouts, and the only part I have not used before. I use millis() instead of delay() everywhere, and there is plenty of free memory (214 kB).

The HTTPClient part of the code is this:

// Get all the data we need from EC Canada (XML payload) and use functions
// to parse it to get the values we need:
void getWebData() {
  esp_task_wdt_reset();          // Reset WDT before HTTP request
  HTTPClient http;
  http.setConnectTimeout(2000);  // Set HTTP connect timeout to 2 seconds
  http.setTimeout(5000);         // Set HTTP timeout to 5 seconds
  http.begin(fullUrl);           // Start connection using the previously created URL
  int httpResponseCode = http.GET();
  webhttpResponseCode = httpResponseCode;
  if (httpResponseCode == 200) {
    digitalWrite(redled, LOW);
    String payload = http.getString();
    // Parse each value we need from the XML data:
    parseTemperature(payload);
    parseHumidity(payload);
    parseWind(payload);
    parseGust(payload);
    parseBearing(payload);
    dewPoint = calculateDewPoint(ECtemp, EChumidity);
    esp_task_wdt_reset();        // Reset WDT
  } else {
    // Leave old data because it may still be OK, but raise the red LED:
    //Serial.print(F("Error on HTTP request: "));
    //Serial.println(httpResponseCode);
    digitalWrite(redled, HIGH);
  }
  http.end();
}

The main difference between this project and others I have written before, which all behave well, is the code above, plus the fact that this unit BOTH fetches XML data AND communicates with the Arduino Cloud. Everything else (the single onboard sensor, connecting to WiFi, parsing, etc.) is code that I have used many times and that ought to work.

So my question: is it possible that HTTPClient sometimes gets into a state where it hangs forever, or at least for longer than the 15-second watchdog timeout, in spite of the timeouts I set? Or should I look elsewhere?

Any ideas welcome!

Michael

PS: Happy to post the entire code, but it's almost 1,000 lines.

You don't need to post 1,000 lines, but please post an MRE (minimal reproducible example) with your function, plus precise links to and versions of any external libraries you use.

Your MRE should be complete and compilable. Building it also helps you find out whether your function is the problem (or something else is).

OK, I'll have a look at that.
(The full code is posted via a link on my YouTube channel: would that help?)

First, can you confirm that you called enableLoopWDT() for the main setup/loop task? (Whether and how it is enabled by default may depend on the board.) The watchdog timeout was a lot less than 15 seconds for me.

I whipped up a deliberately slow HTTP server in Go:

package main

import (
	"log"
	"net/http"
	"strconv"
	"sync/atomic"
	"time"
)

type SlowServer struct {
	reqCount atomic.Int32  // requires Go 1.19 or later
}

func (s *SlowServer) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	counter := s.reqCount.Add(1)
	log.Printf("%d: %s", counter, req.URL.RawQuery)
	init, err := strconv.Atoi(req.URL.Query().Get("init"))
	if err != nil {
		init = 0
	}
	part, err := strconv.Atoi(req.URL.Query().Get("part"))
	if err != nil {
		part = 5
	}
	wait, err := strconv.Atoi(req.URL.Query().Get("wait"))
	if err != nil {
		wait = 1
	}
	time.Sleep(time.Duration(init) * time.Second)
	w.WriteHeader(429)
	for part > 0 {
		if f, ok := w.(http.Flusher); ok {
			f.Flush()
		}
		time.Sleep(time.Duration(wait) * time.Second)
		log.Printf("%d: part %d\n", counter, part)
		w.Write([]byte(strconv.Itoa(part)))
		w.Write([]byte{'\n'})
		part--
	}
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", &SlowServer{}))
}

Put that in, e.g., slowhttp.go and then run:

$ go run slowhttp.go

Test it with curl (in another shell):

$ curl -i 0:8080

Then I tested it with this sketch:

#include <WiFi.h>
#include <HTTPClient.h>

#include "arduino_secrets.h"

HTTPClient http;

void setup() {
  // enableLoopWDT();
  Serial.begin(115200);
  WiFi.begin(SECRET_SSID, SECRET_PASS);
  for (int i = 0; WiFi.status() != WL_CONNECTED; i++) {
    Serial.print(i % 10 ? "." : "\n.");
    delay(100);
  }
  Serial.println();
  Serial.println(WiFi.localIP());

  // http.setConnectTimeout(9876);  // slowhttp.go does not change connect-time
  http.setTimeout(6543);
  // server's local IP    -- here --
  if (!http.begin("http://10.0.0.231:8080/?init=1&part=9&wait=4")) {
    Serial.println("! begin");
    return;
  }

  auto start = millis();
  int status = http.GET();
  Serial.println();
  Serial.println(millis() - start);
  Serial.println(status);
  while (http.connected()) {
    Serial.println("waiting");
    Serial.println(http.getString());
    Serial.println(millis() - start);
  }
  Serial.println("done");
}

void loop() {}

Without enableLoopWDT(), it finished; the total time was 44 seconds:

1055
429
waiting
9
8
7
6
5
4
3
2
1

37101
waiting

44102
done

As long as the wait interval is shorter than the timeout, getString() accumulates the response and returns it as a single String.

With the WDT enabled, the watchdog was triggered. If you're worried about long HTTP requests, you could create a separate task without the WDT and communicate via a queue. Or, more hacky: call disableLoopWDT() to disable it temporarily.


The loop watchdog (enableLoopWDT()) is on by default on the Nano ESP32; the issue is that it resets the board too often! And the reboots are indeed caused by the watchdog timer (reset reason 6).

So my question is: should the HTTPClient timeouts I set not stop it from ever taking longer than seven (2+5) seconds? Am I forgetting something?

(I don't mind if it times out every now and then: I have twelve attempts in an hour and only one has to succeed.)

In my tests, the timeout is reset every time there is another byte. (The timeout is set on the socket itself.) So if it's 5 seconds, and you get a byte every 4 seconds, HTTPClient will continue to read indefinitely -- which can trigger the WDT.
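To make the per-byte behaviour concrete, here is a small host-side C++ simulation (not ESP32 code). It assumes a reader that, like the socket timeout described above, fails only when the gap since the last byte exceeds the timeout; the arrival gaps are made-up inputs. `readWithTotalBudget` is a hypothetical contrast that also enforces an overall deadline; HTTPClient does not do this.

```cpp
#include <cstdint>
#include <vector>

// Outcome of a simulated read. Each element of gapsMs is the delay (ms)
// before the next byte arrives.
struct ReadResult {
  bool timedOut;       // true if the read gave up
  uint32_t elapsedMs;  // total time spent in the read
};

// Idle (per-byte) timeout: the timer restarts after every byte, so only a
// single long gap can end the read. Total elapsed time is never checked.
ReadResult readWithIdleTimeout(const std::vector<uint32_t>& gapsMs,
                               uint32_t idleTimeoutMs) {
  uint32_t elapsed = 0;
  for (uint32_t gap : gapsMs) {
    if (gap > idleTimeoutMs) {
      return {true, elapsed + idleTimeoutMs};  // waited idleTimeoutMs, gave up
    }
    elapsed += gap;  // byte arrived; idle timer restarts
  }
  return {false, elapsed};
}

// Hypothetical contrast: the same idle rule plus an overall time budget.
ReadResult readWithTotalBudget(const std::vector<uint32_t>& gapsMs,
                               uint32_t idleTimeoutMs, uint32_t budgetMs) {
  uint32_t elapsed = 0;
  for (uint32_t gap : gapsMs) {
    uint32_t waitMs = gap > idleTimeoutMs ? idleTimeoutMs : gap;
    if (elapsed + waitMs > budgetMs) {
      return {true, budgetMs};  // overall deadline reached first
    }
    if (gap > idleTimeoutMs) {
      return {true, elapsed + idleTimeoutMs};
    }
    elapsed += gap;
  }
  return {false, elapsed};
}
```

With twenty 4 s gaps and a 5 s idle timeout, `readWithIdleTimeout` "succeeds" after 80 s, far past a 15 s watchdog; only the budgeted variant stops at 15 s.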

Oh… that would totally explain it. But then that’s like not having a timeout at all.

The timeout defaults to five seconds. On a slow-ish connection with steady throughput, a response that takes longer than that to download doesn't have to be a big file, and it would be more surprising if the client simply gave up in that situation. The timeout is meant for total stalls.

If you're fetching twelve times an hour, do that in its own task with no WDT. It might be OK to give that task a lower priority than the main task. You can also do all the parsing there. Track the total time it takes to see whether it really is the problem. Then push a struct with the results onto a queue that the main loop checks, or use some other strategy.
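The task-plus-queue idea can be sketched in portable C++, with std::thread standing in for a FreeRTOS task and a mutex-protected std::queue standing in for a FreeRTOS queue. This is a host-side sketch under assumptions: `WeatherResult`, `fetchAndParse()` and the field values are hypothetical placeholders, not parts of HTTPClient or the original sketch. On the ESP32 itself the rough equivalents would be xTaskCreate(), xQueueCreate(), xQueueSend() in the worker, and a non-blocking xQueueReceive(queue, &item, 0) in loop().

```cpp
#include <mutex>
#include <optional>
#include <queue>
#include <thread>

// Hypothetical result record pushed from the fetch task to the main loop.
struct WeatherResult {
  int httpStatus;
  float temperature;
  float humidity;
};

// Minimal thread-safe queue; the consumer never blocks.
class ResultQueue {
 public:
  void push(const WeatherResult& r) {
    std::lock_guard<std::mutex> lock(mutex_);
    items_.push(r);
  }

  // Returns a waiting result, or std::nullopt if there is none
  // (comparable to a queue receive with zero wait time).
  std::optional<WeatherResult> tryPop() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (items_.empty()) return std::nullopt;
    WeatherResult r = items_.front();
    items_.pop();
    return r;
  }

 private:
  std::mutex mutex_;
  std::queue<WeatherResult> items_;
};

// Hypothetical stand-in for the slow HTTP GET plus XML parsing. However long
// the real version takes, only the worker thread waits on it.
WeatherResult fetchAndParse() {
  return {200, 21.5f, 48.0f};  // placeholder values
}

// Runs one fetch on a worker thread and posts the result to the queue.
std::thread startFetch(ResultQueue& queue) {
  return std::thread([&queue] { queue.push(fetchAndParse()); });
}
```

The main loop would call tryPop() on each pass and keep the old values when nothing is waiting, which fits the "leave old data" policy in getWebData(); the loop itself never waits on the network.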

The problem is that occasionally, the thing stops completely - that's where the watchdog comes in.

I suspect it's the HTTPClient part that hangs, e.g. if there's an issue at the website. I don't mind if it takes a while, but if it hangs forever, that's bad, and that's what the watchdog is needed for.

So I thought that setting these:

http.setConnectTimeout(2000); // Set http connect timeout to 2 seconds
http.setTimeout(5000);        // Set http timeout to 5 seconds

...would stop it from hanging. But perhaps not. Or I am forgetting something.

void HTTPClient::setConnectTimeout(int32_t connectTimeout)
{
    _connectTimeout = connectTimeout;
}

is used only once, when trying to connect:

    if(!_client->connect(_host.c_str(), _port, _connectTimeout)) {
        log_d("failed connect to %s:%u", _host.c_str(), _port);
        return false;
    }

HTTPClient tries to follow HTTP/1.1 and reuse the connection if you make multiple calls (unless you call setReuse(false)). That helps, but the connection is likely not the problem. You might call esp_task_wdt_reset() after begin() succeeds and again after GET() returns (e.g. with 200) to mark progress.

void HTTPClient::setTimeout(uint16_t timeout)
{
    _tcpTimeout = timeout;
    if(connected()) {
        _client->setTimeout((timeout + 500) / 1000);
    }
}

(The underlying WiFiClient has its own timeout, in whole seconds, stored as a uint32_t: a uselessly long maximum.) The other timeout, _tcpTimeout, is used first when reading the status line and headers:

            if((millis() - lastDataTime) > _tcpTimeout) {
                return HTTPC_ERROR_READ_TIMEOUT;
            }

which makes GET() return -11 (HTTPC_ERROR_READ_TIMEOUT) instead of 200 or whatever the status is, but only if too much time has passed since the last bytes arrived, not since the request started:

    while(connected()) {
        size_t len = _client->available();
        if(len > 0) {
            String headerLine = _client->readStringUntil('\n');
            headerLine.trim(); // remove \r

            lastDataTime = millis();

The only other usage offers a clue, though (with the same _client->setTimeout call as before):

    // set Timeout for WiFiClient and for Stream::readBytesUntil() and Stream::readStringUntil()
    _client->setTimeout((_tcpTimeout + 500) / 1000);
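Both call sites convert HTTPClient's millisecond timeout into WiFiClient's whole seconds with `(timeout + 500) / 1000`, i.e. rounding to the nearest second. A quick host-side check of that integer arithmetic (the function name is made up for illustration) shows what values actually reach the socket:

```cpp
#include <cstdint>

// Mirrors the expression used in HTTPClient: milliseconds in, whole seconds
// out, rounded to the nearest second. timeoutMs is uint16_t, matching
// HTTPClient::setTimeout(uint16_t); it is promoted to int before the +500,
// so 65535 does not wrap around.
uint32_t wifiClientSeconds(uint16_t timeoutMs) {
  return (timeoutMs + 500) / 1000;
}
```

So setTimeout(2500), as used later in this thread, gives the socket a 3 s timeout, and anything below 500 ms rounds down to 0 s. HTTPClient's own header-read check still compares against _tcpTimeout in milliseconds.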

readStringUntil is basically the same as readString:

String Stream::readString()
{
    String ret;
    int c = timedRead();
    while(c >= 0) {
        ret += (char) c;
        c = timedRead();
    }
    return ret;
}

String Stream::readStringUntil(char terminator)
{
    String ret;
    int c = timedRead();
    while(c >= 0 && c != terminator) {
        ret += (char) c;
        c = timedRead();
    }
    return ret;
}

They both use timedRead:

// private method to read stream with timeout
int Stream::timedRead()
{
    int c;
    _startMillis = millis();
    do {
        c = read();
        if(c >= 0) {
            return c;
        }
    } while(millis() - _startMillis < _timeout);
    return -1;     // -1 indicates timeout
}
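Here is a host-side sketch of that per-byte timeout and of how readString() silently returns a truncated result. The `ScriptedStream` class, its virtual clock and the arrival schedule are hypothetical test scaffolding, not Arduino APIs; only the loop structure of timedRead() and readString() mirrors the core code quoted above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Bytes that become readable at given (virtual) times, plus a virtual clock
// so the timeout logic can be exercised without real waiting.
class ScriptedStream {
 public:
  ScriptedStream(std::vector<std::pair<uint32_t, char>> schedule,
                 uint32_t timeoutMs)
      : schedule_(std::move(schedule)), timeout_(timeoutMs) {}

  // Same shape as Stream::timedRead(): the timer restarts for every byte.
  int timedRead() {
    uint32_t start = now_;
    do {
      int c = read();
      if (c >= 0) return c;
      ++now_;  // one virtual millisecond passes while nothing is available
    } while (now_ - start < timeout_);
    return -1;  // timeout: indistinguishable from end of data
  }

  // Same shape as Stream::readString(): stops at the first -1.
  std::string readString() {
    std::string ret;
    int c = timedRead();
    while (c >= 0) {
      ret += static_cast<char>(c);
      c = timedRead();
    }
    return ret;
  }

 private:
  // Non-blocking read: returns the next byte only once it has "arrived".
  int read() {
    if (next_ < schedule_.size() && schedule_[next_].first <= now_) {
      return static_cast<unsigned char>(schedule_[next_++].second);
    }
    return -1;
  }

  std::vector<std::pair<uint32_t, char>> schedule_;
  uint32_t timeout_;
  uint32_t now_ = 0;
  std::size_t next_ = 0;
};
```

With a 5 s timeout, bytes at 0, 4 and 8 s are all read ("abc"). If the last byte instead arrives at 10 s (6 s after the previous one), it is dropped and readString() returns "ab" with no error indication, so the caller cannot tell a stall from a complete body.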

And this is another per-byte timeout. When it expires, Stream::readString() just stops and returns what it has so far, with no indication of failure. That applies to httpClient.getStream().readString(). In comparison, httpClient.getString()

String HTTPClient::getString(void)
{
    // _size can be -1 when Server sends no Content-Length header
    if(_size > 0 || _size == -1) {
        StreamString sstring;
        // try to reserve needed memory (noop if _size == -1)
        if(sstring.reserve((_size + 1))) {
            writeToStream(&sstring);
            return sstring;
        } else {
            log_d("not enough memory to reserve a string! need: %d", (_size + 1));
        }
    }

    return "";
}

also returns a String with no indication of error. It calls writeToStream() and ignores the return value:

/**
 * write all  message body / payload to Stream
 * @param stream Stream *
 * @return bytes written ( negative values are error codes )
 */
int HTTPClient::writeToStream(Stream * stream)

One of those error codes is HTTPC_ERROR_READ_TIMEOUT:

            if(chunkHeader.length() <= 0) {
                return returnError(HTTPC_ERROR_READ_TIMEOUT);
            }
// ...
            // read trailing \r\n at the end of the chunk
            char buf[2];
            auto trailing_seq_len = _client->readBytes((uint8_t*)buf, 2);
            if (trailing_seq_len != 2 || buf[0] != '\r' || buf[1] != '\n') {
                return returnError(HTTPC_ERROR_READ_TIMEOUT);
            }

which occur only with "chunked" encoding.

So no, it doesn't look like you're missing anything. Reviewing all this code, though, there appears to be a simple-enough workaround; it just needs the right place to override:

#include <StreamString.h>

class WatchedStreamString : public StreamString {
  size_t write(const uint8_t *buffer, size_t size) override {
    feedLoopWDT();
    Serial.print("writing ");  // or log_d
    Serial.println(size);
    return StreamString::write(buffer, size);
  }
};

Add that subclass to the sketch; the usage then becomes:

  http.setTimeout(2500);  // lower than WDT, so end-of-stream with no Content-Length is handled
  // server's local IP    -- here --
  if (!http.begin("http://10.0.0.231:8080/?init=1&part=9&wait=2")) {
    Serial.println("! begin");
    return;
  }
  feedLoopWDT();  // making progress!
  auto start = millis();
  int status = http.GET();
  feedLoopWDT();  // more progress!
  Serial.println();
  Serial.println(millis() - start);
  Serial.println(status);
  WatchedStreamString wss;
  auto size = http.getSize();
  if (size > 0) {
    if (wss.reserve(size)) {
      Serial.print("reserved ");
      Serial.println(size);
    } else {
      Serial.println("uh oh");
    }
  }
  // http.getString();
  http.writeToStream(&wss);  // progress with each block... not enough if it's a trickle
  Serial.println(millis() - start);
  Serial.println(wss);
  Serial.println("done");

Note that setTimeout must be less than the WDT timeout; both default to five seconds. If the response includes a Content-Length header, uses Transfer-Encoding: chunked, or honors Connection: close, HTTPClient can accurately detect the last byte. Otherwise it will wait for more data before giving up, and you don't want to trigger the WDT right at the very end, after all the data has already arrived.
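A rough timeline model of that last point, as a host-side sketch under simplifying assumptions: the watchdog is fed right after GET() (t = 0) and once per received block, as WatchedStreamString::write does; the reader stops only after `idleTimeoutMs` with no new data (no Content-Length, no chunked encoding, no Connection: close); and the block arrival times are made-up inputs. `watchdogFires` is a hypothetical helper name, and exact ties are treated as failures.

```cpp
#include <cstdint>
#include <vector>

// Returns true if a watchdog with period wdtMs would expire before the read
// finishes. blockTimesMs are ascending arrival times of data blocks. With no
// end-of-body marker, the read ends only after idleTimeoutMs passes with no
// new block.
bool watchdogFires(const std::vector<uint32_t>& blockTimesMs,
                   uint32_t idleTimeoutMs, uint32_t wdtMs) {
  uint32_t lastFeed = 0;  // watchdog fed right after GET()
  for (uint32_t t : blockTimesMs) {
    uint32_t gap = t - lastFeed;
    if (gap >= idleTimeoutMs) break;  // reader gives up before this block
    if (gap >= wdtMs) return true;    // watchdog starved between blocks
    lastFeed = t;                     // block written, watchdog fed
  }
  // Final wait: the reader idles for idleTimeoutMs after the last block.
  return idleTimeoutMs >= wdtMs;
}
```

With blocks every 2 s, equal 5 s defaults make the watchdog fire during the final idle wait, after all the data has arrived; lowering the HTTP timeout to 2.5 s, as in the example above, lets the reader give up first.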

