ESP32, free heap and String

I'm having a hard time debugging an ESP32 application that occasionally shows odd behavior, such as fatal errors that cause resets, often related to networking, e.g.:

assert failed: tcp_update_rcv_ann_wnd IDF/components/lwip/lwip/src/core/tcp.c:951 (new_rcv_ann_wnd <= 0xffff)


Backtrace: 0x40083fd1:0x3ffdd470 0x400960d1:0x3ffdd490 0x4009b9e5:0x3ffdd4b0 0x4012feee:0x3ffdd5e0 0x4012ff9c:0x3ffdd600 0x400f416b:0x3ffdd620 0x4012c890:0x3ffdd640

  #0  0x40083fd1:0x3ffdd470 in panic_abort at /Users/mark/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:408
  #1  0x400960d1:0x3ffdd490 in esp_system_abort at /Users/mark/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:137
  #2  0x4009b9e5:0x3ffdd4b0 in __assert_func at /Users/mark/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/assert.c:85
  #3  0x4012feee:0x3ffdd5e0 in tcp_update_rcv_ann_wnd at /Users/mark/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/core/tcp.c:951
      (inlined by) tcp_update_rcv_ann_wnd at /Users/mark/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/core/tcp.c:931
  #4  0x4012ff9c:0x3ffdd600 in tcp_recved at /Users/mark/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/core/tcp.c:991
  #5  0x400f416b:0x3ffdd620 in _tcp_recved_api(tcpip_api_call_data*) at .pio/libdeps/Aphrodite/Async TCP/src/AsyncTCP.cpp:444
  #6  0x4012c890:0x3ffdd640 in tcpip_thread_handle_msg at /Users/mark/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/api/tcpip.c:172
      (inlined by) tcpip_thread at /Users/mark/Desktop/ESP32/ESP32S2/esp-idf-public/components/lwip/lwip/src/api/tcpip.c:154

It's not easy to replicate the problem, so I'm trying to understand what can cause it.
At first, I tried to inspect the RAM usage. The ESP.getFreeHeap() and ESP.getMaxAllocHeap() return:

Free heap: 76452 bytes
Max alloc: 61428 bytes

I don't think I'm running out of memory, even considering the fragmentation due to the usage of the String class. I'm using it since the ESPAsyncWebServer library uses it for the processor.

For this reason I have a lot of code that uses the String class, for example:

String WebApp::processor(const String &var)
{
  if (var == F("FORM_ALARM_DURATION")) return String(_alarm.UserProgram()->duration / 60);
  if (var == F("FORM_ALARM_FADE")) return String(_alarm.UserProgram()->fadeIn);
  if (var == F("FORM_ALARM_MODE")) return String(static_cast<int>(_alarm.UserProgram()->mode));
  // ...
  if (var == F("SELECT_SESSION_PROGRAMS")) return selectSessionPrograms();
  if (var == F("SELECT_ALARM_PROGRAMS")) return selectAlarmPrograms();
  if (var == F("SELECT_ALARM_MODES")) return selectAlarmModes();
  // ...
  return String();  // unknown placeholder: return an empty String
}

where each function returns a String, like:

String WebApp::selectSessionPrograms()
{
  String html = "";
  for (int i = 0; i < _session.ProgramsSize(); i++)
  {
    SessionProgram *program = _session.GetProgramAtPosition(i);
    if (i > 0 && !_session.HasBouquetAvailable(program)) continue;
    html += addSelectOption(String(i), i == 0 ? _languages.GetText("LANG_WEB_CUSTOM") : program->name);
  }  

  return html;
}

I understand it's hard to know, but as a general rule of thumb, I kindly ask:

  1. is my heap status critical?
  2. should I find a way to NOT use String even with ESPAsyncWebServer?
  3. could a class-level String variable (instead of a function-level one) improve the situation? I mean, creating a String html = ""; as a class member, instead of having one in each function.

no, your function returns a copy of the String anyway. the local one is temporary and deleted.

You would need to measure the heap during intensive use, such as while the server is running the processor. You don't see what's happening in the background there, but since the processor() function is yours, you can at least track the heap every time it is called.
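A minimal sketch of that kind of tracking, assuming you only care about the lowest values seen. HeapWatermark and its members are made-up names for illustration; the struct is plain C++, and the commented lines show how it might be fed from ESP.getFreeHeap() and ESP.getMaxAllocHeap() inside processor().

```cpp
#include <cstdint>

// Tracks the lowest values seen so far. HeapWatermark is an illustrative
// name, not part of any library.
struct HeapWatermark {
  uint32_t minFree = UINT32_MAX;
  uint32_t minMaxAlloc = UINT32_MAX;

  // Returns true when either minimum dropped, so the caller can log only then.
  bool update(uint32_t freeHeap, uint32_t maxAlloc) {
    bool changed = false;
    if (freeHeap < minFree) { minFree = freeHeap; changed = true; }
    if (maxAlloc < minMaxAlloc) { minMaxAlloc = maxAlloc; changed = true; }
    return changed;
  }
};

// On the ESP32 this could sit at the top of WebApp::processor(), e.g.:
//   static HeapWatermark wm;
//   if (wm.update(ESP.getFreeHeap(), ESP.getMaxAllocHeap()))
//     Serial.printf("min free: %u, min max alloc: %u\n",
//                   (unsigned)wm.minFree, (unsigned)wm.minMaxAlloc);
```

Logging only when a minimum drops keeps the serial output short. The Arduino core also offers ESP.getMinFreeHeap(), which reports the lowest free heap since boot, including moments your own code never sees.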

How large is the web page you serve?


About 14 kB, according to the "size" field in the network tab of the browser.
It has roughly 40-50 processor placeholders to substitute.

so it might end up much larger than the initial 14 kB, and that takes a toll on memory.

The error assert failed: tcp_update_rcv_ann_wnd (new_rcv_ann_wnd <= 0xffff) indicates that the TCP receive window size calculated by the system is exceeding the maximum allowable value of 65,535 bytes.

This could be due to an overflow or miscalculation in the receive window size, incorrect handling of TCP segments, or issues related to resource exhaustion such as high data volume or numerous connections.
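To make the assertion more concrete, here is a simplified model of the arithmetic, not the lwIP source (the real function has more branches). The function and parameter names are illustrative. The idea: the remaining announced window is a difference of two 32-bit sequence numbers, and the assert requires that difference to fit in 16 bits. If the connection state is inconsistent for whatever reason, the difference can be far larger.

```cpp
#include <cstdint>

// Simplified model, not the lwIP implementation. Names are illustrative.
// Distance from the next expected sequence number to the right edge of the
// window already announced to the peer.
inline uint32_t remainingAnnouncedWindow(uint32_t annRightEdge, uint32_t rcvNxt)
{
  return annRightEdge - rcvNxt;  // unsigned arithmetic, wraps modulo 2^32
}

// The condition the assert checks: the value must fit in a 16-bit field.
inline bool fitsIn16Bits(uint32_t window)
{
  return window <= 0xffffu;
}
```

With consistent state, e.g. a right edge of 6744 and a next sequence number of 1000, the result is 5744 and the check passes. With a right edge that has nothing to do with the current sequence number, the difference exceeds 0xffff and the check fails, which is the condition reported in the crash.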

Browsers often use asynchronous loading for resources like images, scripts, and stylesheets. This means that while the page is still being loaded, the browser can initiate requests for these additional resources and render them as they arrive. Does your web page include such elements that would trigger more HTTP request back to the ESP?


Firstly, pin your code to core 1. That will hopefully make the crashing behavior more consistent.
Secondly, see if you can make it fail faster by using more strings or bigger strings.
If that doesn't increase the crash rate, then perhaps string usage is only a minor contributor to the problem.
Try stressing some other part of the system by adding a bunch of high-priority tasks that allocate memory, etc. Again, you are looking for changes in crashing behavior that could indicate that you are operating close to some fundamental resource limit.


Yes, I think so:

<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>~NAME~ v~VERSION~</title>
    <link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 100 100%22><text y=%22.9em%22 font-size=%2290%22>&#127744;</text></svg>">
    <link rel="stylesheet" href="css/chota.min.css">
    <link rel="stylesheet" href="css/solid.min.css">
    <link rel="stylesheet" href="css/index.css">
    <script type="text/javascript" src="js/jscolor.min.js"></script>
    <script type="text/javascript" src="js/index.js"></script>
</head>

and, indeed, I see the browser initiates all the requests at once.
Is there anything I can set or limit server side, e.g. using headers?

There isn't a specific HTTP header that directly tells a browser not to fetch everything at once

if the browser is connected to the internet as well as the ESP and you have somewhere where you could keep those resources, you could store them there which would limit the pressure on the ESP.

Unfortunately, @Mark81 didn't supply the complete code (please do). But, if the code uses the standard Arduino setup() / loop() structure (and doesn't spool up other FreeRTOS tasks), then the code is already running on Core 1 (in looptask).

The ESPAsyncWebServer library relies on the lwIP stack, which operates on core 0 by default. Events related to network communication, such as HTTP requests, are recorded and stored in event queues managed by the library.

It is my understanding that the task that processes these events, handling the HTTP requests and invoking the appropriate callback functions, also runs on core 0.

This is why any long or blocking code within the on() callbacks should be avoided, or at least carefully managed.

If a callback takes too long or blocks, it can interfere with the networking stack's processing, potentially causing issues.

In theory, what you can do is queue TCP connection requests until the previous request is finished.

As far as I understand from your post, the ESP32 is a web server and it gets flooded by a number of requests from a web browser. If you handle new clients in a separate task that calls accept() for new connections, then you can do the queueing there.

Could you please post a piece of code which accepts connections from a client?

This is the problem. So in order to make your web server process one request at a time, you would probably need to decrease the TCP window size as well.

are you familiar with the ESPAsyncWebServer library?

Yes my bad, I replied without looking into it first.
This is what I am doing now :(.
Looks like one has to modify the library in order to queue connection requests.

There is already a queue. This actually takes place in AsyncTCP, where buffering directly uses a handle for a queue within FreeRTOS ➜ _async_queue

It is an OS structure used to store messages or pointers to objects waiting to be processed by a specific task. It allows communication between different tasks or between a task and an interrupt. Here, a dedicated task (from ESPAsyncWebServer) monitors this queue, retrieves messages, and executes the appropriate callbacks to handle the event. (hence the "async"(hronous) part of the name)

The size of the queue is limited in the code when it is created

➜ therefore, 32 lwip_event_packet_t structures can be stored, representing 32 low-level LWIP events (LWIP_TCP_SENT, LWIP_TCP_RECV, LWIP_TCP_FIN, LWIP_TCP_ERROR, LWIP_TCP_POLL, LWIP_TCP_CLEAR, LWIP_TCP_ACCEPT, LWIP_TCP_CONNECTED, LWIP_TCP_DNS).

When a traditional HTTP GET request is processed, several events of this level can be generated by the lwIP network stack ➜ typically:

  • LWIP_TCP_ACCEPT when an incoming TCP connection is accepted,
    then
  • several LWIP_TCP_RECV each time data (the HTTP GET request) is received on the TCP connection,
  • and LWIP_TCP_FIN when a TCP connection is properly closed by the other side (the remote side closes the connection).

But during processing, many LWIP_TCP_POLL events can be received, which are sent periodically to check the connection status, check if there is data available to read, or if data needs to be sent.

When the queue is full, the code simply ignores new requests — you will see some

in the event processing functions.

➜ so if we haven't responded quickly enough to a request and — say — 40 LWIP events have arrived in the meantime, we will have lost 8 of them. The server dequeues and receives others, and when it reaches the end of the 32 that were pending, it will be quite troubled about what to do when dequeuing the following ones because it would have lost 8 in the meantime.
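The effect can be reproduced with a small host-side model. This is not AsyncTCP code: BoundedEventQueue and floodWithoutDraining are made-up names, and a std::deque stands in for the FreeRTOS queue. It only illustrates what "full queue, new events are ignored" means in numbers.

```cpp
#include <cstddef>
#include <deque>

// Host-side model of a fixed-capacity event queue; not AsyncTCP code.
class BoundedEventQueue {
public:
  explicit BoundedEventQueue(std::size_t capacity) : capacity_(capacity) {}

  // Mirrors a non-blocking enqueue: returns false and counts a drop when full.
  bool push(int event) {
    if (events_.size() >= capacity_) { ++dropped_; return false; }
    events_.push_back(event);
    return true;
  }

  // Removes the oldest event; returns false when the queue is empty.
  bool pop(int &event) {
    if (events_.empty()) return false;
    event = events_.front();
    events_.pop_front();
    return true;
  }

  std::size_t size() const { return events_.size(); }
  std::size_t dropped() const { return dropped_; }

private:
  std::size_t capacity_;
  std::size_t dropped_ = 0;
  std::deque<int> events_;
};

// Pushes `count` events numbered from 0 while nothing is being dequeued,
// as when a callback blocks the consumer task. Returns the number of drops.
inline std::size_t floodWithoutDraining(BoundedEventQueue &q, int count) {
  for (int i = 0; i < count; ++i) q.push(i);
  return q.dropped();
}
```

With a capacity of 32 and 40 events arriving while the consumer is busy, the model reports 8 drops, and the consumer later sees events 0 to 31 with no trace of the missing ones.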

Increasing the queue size could work until it does not. In short, all this is to say that callbacks need to be short; if they take too long, the work needs to be handled differently.


I figured that to be the case. My point was that the suggestion to pin the code to core 1 is rather unnecessary / impractical. Since most ESP32 programmers in this forum likely use Arduino's setup() / loop() structure (rather than creating FreeRTOS tasks), the user code is already running on Core 1. And moving the ESPAsyncWebServer from its default core is likely beyond OP's ability.

OK.

My point was that the callback (which is user code) runs on core 0.

Thanks for the detailed explanations!

Some progress on my side.
First, I tried the fork of AsyncTCP and ESPAsyncWebServer by mathieucarbou.

I set the following build flags:

-D CONFIG_ASYNC_TCP_MAX_ACK_TIME=3000
-D CONFIG_ASYNC_TCP_PRIORITY=10
-D CONFIG_ASYNC_TCP_QUEUE_SIZE=256
-D CONFIG_ASYNC_TCP_RUNNING_CORE=1
-D CONFIG_ASYNC_TCP_STACK_SIZE=8192

So far I haven't seen a reset. Of course the test time was limited, but with the "official" libraries it would have already happened.

On the other side, I think I still have an issue with heap and/or with multiple requests.
I put a debug printf inside the processor function to print the minimum sizes ever reached. They are:

Free heap: 29760
Max alloc: 10228

quite low compared to the starting values. And of course these are the only ones I can catch, since the files served without the processor are not visible from user code.

Sometimes it still does not load all the required files, surely for the reasons you explained above. By the way, it seems to fail more often on mobile browsers than on desktop ones, where it hardly happens now. Example of loading one of my pages:

[the dropdown stuff is not from my code, it seems an extension of chrome: request URL: chrome-extension://ghmbeldphafepmbegfdlkpapadhbakde/dropdown.html ]

by the way, I don't have callbacks for HTTP GET requests (only for POST). So the time spent serving a file is either in the processor function (like the above example) or for reading from flash if it is served statically (like css, js, etc...)

For complex web pages I usually try to load things the lazy way using AJAX: I just push a template (no processor) and then update it through JavaScript and async web requests from the page. This also makes it look better, as there is no full refresh of the page.
