Arduino Nano ESP32 hangs, possibly HTTPClient related?

michaelwillems · October 20, 2024, 1:52pm

I am using an Arduino Nano ESP32 to interrogate Environment Canada's XML feed and act on that result. All is good, except every now and then - roughly once every day or two - the system reboots due to the watchdog.

I suspect the HTTP client (#include <HTTPClient.h>) might be to blame, because it is the only part of my code that seems likely to be vulnerable to timeouts, and the only part of my code I have not used before. I use millis() instead of delays everywhere, and there is plenty of memory (214kByte).

The HTTPClient part of the code is this:

// Get all the data we need from EC Canada (XML payload) and use functions
// to parse it to get the values we need:
void getWebData() {
 esp_task_wdt_reset();  // Reset WDT before HTTP request
 HTTPClient http;
 http.setConnectTimeout(2000); // Set http connect timeout to 2 seconds   
 http.setTimeout(5000);        // Set http timeout to 5 seconds
 http.begin(fullUrl);          // Start connection using the previously created URL
 int httpResponseCode = http.GET();
 webhttpResponseCode = httpResponseCode;
 if (httpResponseCode == 200) {
   digitalWrite(redled, LOW);
   String payload = http.getString();
   // Call function to parse temperature from the XML data:
   parseTemperature(payload);
   parseHumidity(payload);
   parseWind(payload);
   parseGust(payload);
   parseBearing(payload);
   dewPoint = calculateDewPoint(ECtemp, EChumidity);
   esp_task_wdt_reset();  // Reset WDT
 } else {
   // Leave old data because it may still be OK, but raise the red LED:
   //Serial.print(F("Error on HTTP request: "));
   //Serial.println(httpResponseCode);
   digitalWrite(redled, HIGH);
 }
 http.end();
}

The main difference between this and other code I have written before, all of which behaves well, is the above code, plus the fact that the unit BOTH gets XML data and communicates with the Arduino Cloud. Everything else (the one onboard sensor, connecting to WiFi, parsing, etc.) is code that I have used many times and that ought to work.

So my question: is it possible that HTTPClient somehow sometimes gets into a condition where it hangs forever, or at least for over 15 seconds, the Watchdog timeout? In spite of the setTimeOuts? Or should I look elsewhere?

Any ideas welcome!

Michael

PS happy to post the entire code but it's almost 1,000 lines

noiasca · October 21, 2024, 5:38am

You don't need to post 1000 lines, but an MRE (minimal reproducible example) with your function, plus precise links and versions for any external libraries it needs.

Your MRE should be complete and compilable. It also helps you to identify if your function is your problem (or something else).

michaelwillems · October 21, 2024, 1:54pm

OK I’ll have a look at that.
(The full code is posted via a link on my YouTube channel: would that help?)

kenb4 · October 21, 2024, 9:15pm

First, confirming that you did call enableLoopWDT() for the main setup+loop task? (It may depend on the board for whether/how it is on by default.) It was a lot less than 15 seconds for me.

I whipped up a deliberately slow HTTP server in Go

package main

import (
	"log"
	"net/http"
	"strconv"
	"sync/atomic"
	"time"
)

type SlowServer struct {
	reqCount atomic.Int32  // requires Go 1.19 or later
}

func (s *SlowServer) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	counter := s.reqCount.Add(1)
	log.Printf("%d: %s", counter, req.URL.RawQuery)
	init, err := strconv.Atoi(req.URL.Query().Get("init"))
	if err != nil {
		init = 0
	}
	part, err := strconv.Atoi(req.URL.Query().Get("part"))
	if err != nil {
		part = 5
	}
	wait, err := strconv.Atoi(req.URL.Query().Get("wait"))
	if err != nil {
		wait = 1
	}
	time.Sleep(time.Duration(init) * time.Second)
	w.WriteHeader(429)
	for part > 0 {
		if f, ok := w.(http.Flusher); ok {
			f.Flush()
		}
		time.Sleep(time.Duration(wait) * time.Second)
		log.Printf("%d: part %d\n", counter, part)
		w.Write([]byte(strconv.Itoa(part)))
		w.Write([]byte{'\n'})
		part--
	}
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", &SlowServer{}))
}

Put that in e.g. slowhttp.go and then

$ go run slowhttp.go

Test it with curl (in another shell)

$ curl -i 0:8080

Then tested with this sketch

#include <WiFi.h>
#include <HTTPClient.h>

#include "arduino_secrets.h"

HTTPClient http;

void setup() {
  // enableLoopWDT();
  Serial.begin(115200);
  WiFi.begin(SECRET_SSID, SECRET_PASS);
  for (int i = 0; WiFi.status() != WL_CONNECTED; i++) {
    Serial.print(i % 10 ? "." : "\n.");
    delay(100);
  }
  Serial.println();
  Serial.println(WiFi.localIP());

  // http.setConnectTimeout(9876);  // slowhttp.go does not change connect-time
  http.setTimeout(6543);
  // server's local IP    -- here --
  if (!http.begin("http://10.0.0.231:8080/?init=1&part=9&wait=4")) {
    Serial.println("! begin");
    return;
  }

  auto start = millis();
  int status = http.GET();
  Serial.println();
  Serial.println(millis() - start);
  Serial.println(status);
  while (http.connected()) {
    Serial.println("waiting");
    Serial.println(http.getString());
    Serial.println(millis() - start);
  }
  Serial.println("done");
}

void loop() {}

Without enableLoopWDT, it finished; total time was 44 seconds.

1055
429
waiting
9
8
7
6
5
4
3
2
1

37101
waiting

44102
done

As long as the wait interval is smaller than the timeout, getString accumulates the response and returns a single String.

With the WDT enabled, it was triggered. If you're worried about long HTTP requests, you could create a separate task without the WDT and communicate via a queue. Or more hacky: call disableLoopWDT to disable it temporarily.

michaelwillems · October 21, 2024, 9:36pm

enableLoopWDT() is enabled by default on the nano ESP32: the issue is that it resets too often! And the reboots are indeed due to the watchdog timer (reason = 6).

So my question is: should the HTTPClient timeouts I set not stop it from ever taking longer than seven (2+5) seconds? Am I forgetting something?

(I don't mind if it times out every now and then: I have twelve attempts in an hour and only one has to succeed.)

kenb4 · October 21, 2024, 10:17pm

In my tests, the timeout is reset every time there is another byte. (The timeout is set on the socket itself.) So if it's 5 seconds, and you get a byte every 4 seconds, HTTPClient will continue to read indefinitely -- which can trigger the WDT.
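To make the per-byte behaviour concrete, here is a small desktop simulation (plain C++, not Arduino code). It is only a sketch of the idea: `simulateRead`, `ReadResult` and the `totalBudgetMs` parameter are made-up names for illustration, and the overall budget is not something HTTPClient offers. Each entry in `gapsMs` is the delay before the next byte arrives; the idle clock restarts on every byte, as described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Outcome of one simulated read.
struct ReadResult {
    std::size_t bytes;     // bytes received before the read ended
    uint32_t elapsedMs;    // simulated time spent reading
    bool hitTotalBudget;   // true if the overall deadline ended the read
};

// gapsMs[i] is the delay before byte i arrives.
// idleTimeoutMs restarts on every byte (the behaviour seen in the tests).
// totalBudgetMs is a hypothetical overall deadline; 0 means "no limit".
ReadResult simulateRead(const std::vector<uint32_t>& gapsMs,
                        uint32_t idleTimeoutMs,
                        uint32_t totalBudgetMs) {
    ReadResult r{0, 0, false};
    for (uint32_t gap : gapsMs) {
        // The reader waits for the byte or the idle timeout, whichever is first.
        uint32_t wait = gap < idleTimeoutMs ? gap : idleTimeoutMs;
        if (totalBudgetMs != 0 && r.elapsedMs + wait > totalBudgetMs) {
            r.elapsedMs = totalBudgetMs;  // give up at the overall deadline
            r.hitTotalBudget = true;
            return r;
        }
        r.elapsedMs += wait;
        if (gap >= idleTimeoutMs) {
            return r;                     // a real stall: idle timeout fires
        }
        ++r.bytes;                        // byte arrived, idle clock restarts
    }
    return r;
}
```

With nine bytes arriving 4000 ms apart and a 5000 ms idle timeout, the model never times out and spends 36000 ms reading, which is in line with the roughly 37 seconds measured above for part=9 and wait=4 after the one-second initial delay. Only a gap of 5000 ms or more ends the read early.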

michaelwillems · October 21, 2024, 10:40pm

Oh… that would totally explain it. But then that’s like not having a timeout at all.

kenb4 · October 22, 2024, 6:11am

The timeout defaults to five seconds. On a slow-ish connection, with constant throughput, that's not that big of a file. It might be more surprising if it just gave up in that situation. It's more for total stalls.

If you're doing twelve an hour, do that in its own task with no WDT. It might be OK to have that task have a lower priority than the main task. Can also do all the parsing there. Track the total time that takes to see if it is really the problem. Then push a struct with the results on a queue that is checked by the main loop, or some other strategy.

michaelwillems · October 22, 2024, 2:40pm

The problem is that occasionally, the thing stops completely - that's where the watchdog comes in.

I suspect it's the HTTPClient part that hangs, e.g. if there's an issue at the web site. See, if it takes time I don't mind, but if it hangs forever, that's bad and the watchdog is needed.

So I thought that setting these:

http.setConnectTimeout(2000); // Set http connect timeout to 2 seconds
http.setTimeout(5000);        // Set http timeout to 5 seconds

...would stop it from hanging. But perhaps not. Or I am forgetting something.

kenb4 · October 22, 2024, 10:23pm

void HTTPClient::setConnectTimeout(int32_t connectTimeout)
{
    _connectTimeout = connectTimeout;
}

is only used once, when trying to connect

    if(!_client->connect(_host.c_str(), _port, _connectTimeout)) {
        log_d("failed connect to %s:%u", _host.c_str(), _port);
        return false;
    }

HTTPClient tries to follow HTTP 1.1 and reuse the connection if you do multiple calls (unless you setReuse(false)). So that will help, but the connection is likely not the problem. You might call esp_task_wdt_reset after GET returns 200 to mark progress.

void HTTPClient::setTimeout(uint16_t timeout)
{
    _tcpTimeout = timeout;
    if(connected()) {
        _client->setTimeout((timeout + 500) / 1000);
    }
}

(The underlying WiFiClient has its own timeout in whole seconds and is uint32_t -- a uselessly long time.) The other timeout is used initially to read the response-line and headers

            if((millis() - lastDataTime) > _tcpTimeout) {
                return HTTPC_ERROR_READ_TIMEOUT;
            }

which returns -11 instead of 200 or whatever if it's been too long since the last bunch of bytes -- not total time

    while(connected()) {
        size_t len = _client->available();
        if(len > 0) {
            String headerLine = _client->readStringUntil('\n');
            headerLine.trim(); // remove \r

            lastDataTime = millis();

The only other usage offers a clue though (with the same _client->setTimeout as before)

    // set Timeout for WiFiClient and for Stream::readBytesUntil() and Stream::readStringUntil()
    _client->setTimeout((_tcpTimeout + 500) / 1000);

readStringUntil is basically the same as readString

String Stream::readString()
{
    String ret;
    int c = timedRead();
    while(c >= 0) {
        ret += (char) c;
        c = timedRead();
    }
    return ret;
}

String Stream::readStringUntil(char terminator)
{
    String ret;
    int c = timedRead();
    while(c >= 0 && c != terminator) {
        ret += (char) c;
        c = timedRead();
    }
    return ret;
}

They both use timedRead

// private method to read stream with timeout
int Stream::timedRead()
{
    int c;
    _startMillis = millis();
    do {
        c = read();
        if(c >= 0) {
            return c;
        }
    } while(millis() - _startMillis < _timeout);
    return -1;     // -1 indicates timeout
}

And this is another usage of timeout per byte. When that happens, Stream::readString just stops and returns what it has so far, with no indication of failure. That's with httpClient.getStream().readString(). In comparison, httpClient.getString()

String HTTPClient::getString(void)
{
    // _size can be -1 when Server sends no Content-Length header
    if(_size > 0 || _size == -1) {
        StreamString sstring;
        // try to reserve needed memory (noop if _size == -1)
        if(sstring.reserve((_size + 1))) {
            writeToStream(&sstring);
            return sstring;
        } else {
            log_d("not enough memory to reserve a string! need: %d", (_size + 1));
        }
    }

    return "";
}

also returns a String with no indication of error. It calls writeToStream, ignoring the return value

/**
 * write all  message body / payload to Stream
 * @param stream Stream *
 * @return bytes written ( negative values are error codes )
 */
int HTTPClient::writeToStream(Stream * stream)

One of those errors is

            if(chunkHeader.length() <= 0) {
                return returnError(HTTPC_ERROR_READ_TIMEOUT);
            }
// ...
            // read trailing \r\n at the end of the chunk
            char buf[2];
            auto trailing_seq_len = _client->readBytes((uint8_t*)buf, 2);
            if (trailing_seq_len != 2 || buf[0] != '\r' || buf[1] != '\n') {
                return returnError(HTTPC_ERROR_READ_TIMEOUT);
            }

which occur only with "chunked" encoding.
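Since getString drops that return value, the only way to notice a failed body read is to call writeToStream yourself and look at the sign. A rough sketch of that check, written as a template so it compiles off-device: `fetchChecked`, `FetchResult` and `FakeHttp` are hypothetical names of mine, and `FakeHttp` is only a stand-in that mimics the `GET()` and `writeToStream()` calls shown above. On the board you would pass the real HTTPClient and a StreamString instead.

```cpp
#include <cstddef>
#include <string>

// Outcome of one fetch attempt.
struct FetchResult {
    int status;   // HTTP status, or a negative client error from GET()
    int written;  // bytes written, or a negative error from writeToStream()
    bool ok() const { return status == 200 && written >= 0; }
};

// Http must provide GET() and writeToStream(Sink*), as HTTPClient does.
template <class Http, class Sink>
FetchResult fetchChecked(Http& http, Sink& sink) {
    FetchResult r{http.GET(), 0};
    if (r.status == 200) {
        r.written = http.writeToStream(&sink);  // keep the return value
    }
    return r;
}

// Hypothetical stand-in for HTTPClient, for trying this on a desktop.
struct FakeHttp {
    int status;      // what GET() should report
    int bodyResult;  // what writeToStream() should report
    int GET() { return status; }
    int writeToStream(std::string* out) {
        if (bodyResult >= 0) {
            out->assign(static_cast<std::size_t>(bodyResult), 'x');
        } else {
            out->assign("partial");  // some data arrived before the error
        }
        return bodyResult;
    }
};
```

The point of the second fake case is that the sink can hold partial data while the return value is negative (for example -11, the read timeout), which is exactly the situation getString hides. In that case the old readings should be kept rather than parsing the fragment.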

So no, doesn't look like you're missing anything. Reviewing all this code though, looks like there is a simple-enough workaround; just need the right place to override.

#include <StreamString.h>

class WatchedStreamString : public StreamString {
  size_t write(const uint8_t *buffer, size_t size) override {
    feedLoopWDT();
    Serial.print("writing ");  // or log_d
    Serial.println(size);
    return StreamString::write(buffer, size);
  }
};

Add that subclass to the sketch, then the usage is instead

  http.setTimeout(2500);  // lower than WDT, so end-of-stream with no Content-Length is handled
  // server's local IP    -- here --
  if (!http.begin("http://10.0.0.231:8080/?init=1&part=9&wait=2")) {
    Serial.println("! begin");
    return;
  }
  feedLoopWDT();  // making progress!
  auto start = millis();
  int status = http.GET();
  feedLoopWDT();  // more progress!
  Serial.println();
  Serial.println(millis() - start);
  Serial.println(status);
  WatchedStreamString wss;
  auto size = http.getSize();
  if (size > 0) {
    if (wss.reserve(size)) {
      Serial.print("reserved ");
      Serial.println(size);
    } else {
      Serial.println("uh oh");
    }
  }
  // http.getString();
  http.writeToStream(&wss);  // progress with each block... not enough if it's a trickle
  Serial.println(millis() - start);
  Serial.println(wss);
  Serial.println("done");

Note that setTimeout must be less than the WDT time. They both default to five seconds. If the response payload returns Content-Length, uses Transfer-Encoding: chunked, or honors Connection: close, then the HTTPClient can accurately detect the last byte. Otherwise it will wait for more data before giving up; and you don't want to trigger the WDT right at the very end when you're done.
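The reasoning behind that rule can be checked with a simplified desktop model (plain C++, not Arduino code). Everything here is an assumption for illustration: `watchdogWouldFire` is a made-up helper, the watchdog is fed once per received block as in WatchedStreamString, and reaching the watchdog period counts as a reset. `endDetected` stands for the cases where the client can tell the body has ended.

```cpp
#include <cstdint>
#include <vector>

// blockGapsMs[i] is the delay before block i arrives; each block feeds the WDT.
// If the end of the body cannot be detected, the client waits one more
// idleTimeoutMs after the last block before giving up.
bool watchdogWouldFire(const std::vector<uint32_t>& blockGapsMs,
                       uint32_t idleTimeoutMs,
                       uint32_t wdtMs,
                       bool endDetected) {
    for (uint32_t gap : blockGapsMs) {
        // Time without a feed: until the block arrives or the read times out.
        uint32_t wait = gap < idleTimeoutMs ? gap : idleTimeoutMs;
        if (wait >= wdtMs) {
            return true;   // starved before the next feed or the timeout
        }
        if (gap >= idleTimeoutMs) {
            return false;  // read gave up in time; the caller can feed again
        }
    }
    // Final wait for data that never comes.
    return !endDetected && idleTimeoutMs >= wdtMs;
}
```

With blocks every 2000 ms, a 2500 ms timeout and a 5000 ms watchdog, the model never fires. With both left at 5000 ms and no way to detect the end of the body, it fires during the final wait, which is the "right at the very end" case.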

system · April 20, 2025, 10:24pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
