WiFi Network causing unpredictable stalls

I’m having a problem with the WiFi network causing unpredictable stalls. I have an application that demands the main loop execute every 10ms. So what I have is a main loop that begins with a piece of code something like if(time_since_last_run < 10ms) return; I have timers all over the place so I know my code normally takes between 6-8ms to run and executes pretty close to every 10ms most of the time.
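
For reference, a minimal sketch of the kind of 10 ms gate described above. The names `PeriodicGate` and `due` are made up for illustration and are not from the actual project; the timestamp is assumed to come from a free-running millisecond counter such as Arduino's `millis()`.

```cpp
#include <cstdint>

// Hypothetical helper: decides whether the periodic control step is due.
struct PeriodicGate {
    uint32_t period_ms;    // desired interval between runs, e.g. 10
    uint32_t last_run_ms;  // timestamp of the previous accepted run

    // Returns true at most once per period. `now_ms` is assumed to come
    // from a free-running millisecond counter such as millis().
    bool due(uint32_t now_ms) {
        // Unsigned subtraction stays correct when the counter wraps around.
        if (static_cast<uint32_t>(now_ms - last_run_ms) < period_ms) return false;
        last_run_ms = now_ms;
        return true;
    }
};
```

In an Arduino `loop()` this would be used as `if (!gate.due(millis())) return;`. Resetting `last_run_ms` to the current time means a late run pushes all later runs back; advancing it by `period_ms` instead would keep the average rate steady.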

This code opens a WiFiServer and waits for a client to connect. Once one does, it writes a 100-byte message to the TCP connection on every pass through the loop (100 Hz, or about 10 kB per second), which goes to a telemetry receiver on a laptop.

Most of the time this works well, but sometimes something causes the main loop to stall for as long as 100-1000 ms. No clue what’s causing it, but I see the same problem on both an Adafruit M0 WiFi and an ESP32. Was surprised to see the same behavior on the ESP32 because it’s so much faster and is dual-core.

I’ve wiresharked the transmission and am seeing odd behavior there too. A packet should be sent every .01 seconds which is what it usually does. When there is a stall there might be no packets sent for .13 seconds, then a burst of packets and acks will come very quickly in considerably less than .01 seconds. It looks like there is never more than 600-800 bytes in flight before an ack comes in.

Sometimes Wireshark shows a missing packet, but mostly it is the behavior described above, where there are no errors, dropped packets or missing ACKs.

According to Wireshark, the TCP window is 5744 bytes, which I never get close to. The TCP packets are also not being split or combined; they always contain the same amount of data.

Does this sound like anything you have encountered before? Am I missing some TCP parameters that could be tuned to make this work better?

I can post code and wireshark captures if this would help.

Forgot to mention, I don’t expect the laptop to receive a nice smooth stream of data, I expect it to be a bit bursty and that doesn’t matter, it’s just status/telemetry.

The problem is that on the Arduino end, the network write call or something in the background is stalling the main loop for up to 1 second, and the Arduino main loop is controlling a system that needs 100 Hz updates.
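
To tell whether it is the write call itself or something in the background, one option is to time the write separately from the rest of the loop. A rough sketch, with hypothetical names (`StallTracker`, `record`) and assuming microsecond timestamps such as those returned by Arduino's `micros()`:

```cpp
#include <cstdint>

// Hypothetical helper: tracks how long one section of the loop takes.
struct StallTracker {
    uint32_t threshold_us;     // durations above this count as stalls
    uint32_t worst_us = 0;     // longest duration seen so far
    uint32_t stall_count = 0;  // number of durations above the threshold

    explicit StallTracker(uint32_t threshold) : threshold_us(threshold) {}

    // Record one measurement from timestamps taken before and after
    // the section of interest.
    void record(uint32_t start_us, uint32_t end_us) {
        uint32_t elapsed = end_us - start_us;  // wrap-safe unsigned subtraction
        if (elapsed > worst_us) worst_us = elapsed;
        if (elapsed > threshold_us) ++stall_count;
    }
};
```

With one tracker wrapped around the network write and a second around the whole loop body, a stall that shows up in the loop tracker but not in the write tracker would point at something other than the write call.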

Temporarily switch to UDP and see if that makes a difference.

Are you using a WiFi hub or is it a point-to-point connection? WiFi hubs vary in quality. The cheaper ones run hot because manufacturers use the cheapest consumer components and run them at the limit.

The chipsets in laptops have aggressive power saving modes that try to power down peripheral devices whenever possible. That may be affecting the network interface. Try a desktop PC instead.

Unfortunately, the telemetry receiver I usually use doesn't support UDP. I'll have to find/write something to connect to the ESP32 and display the packets on a PC.
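
A receiver for that kind of test does not need much. The following is a minimal sketch using POSIX sockets (Linux or macOS), assuming each datagram carries one 100-byte telemetry message. The function names are placeholders, and the port is whatever the sender is configured to use.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Opens a UDP socket bound to the given port on all interfaces.
// Passing port 0 lets the OS pick a free port.
// Returns the descriptor, or -1 on failure.
int openUdpListener(uint16_t port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Blocks until one datagram arrives and copies it into buf.
// Returns the number of bytes received, or -1 on error.
long receiveDatagram(int fd, unsigned char* buf, std::size_t cap) {
    return static_cast<long>(recvfrom(fd, buf, cap, 0, nullptr, nullptr));
}
```

A small `main` would call `openUdpListener` once, then loop on `receiveDatagram` and print each packet together with its arrival time, which is enough to see whether the stalls still occur without TCP.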

Right now I am using a WiFi Access point to connect the ESP32 to the laptop.

The laptop is plugged in and in high-power mode, but I'll try a desktop also just to see what happens.

milo_mindbender:
So what I have is a main loop that begins with a piece of code something like if(time_since_last_run < 10ms) return;

When there is a stall there might be no packets sent for .13 seconds, then a [burst] of packets and acks will come very quickly in considerably less than .01 seconds.

Does this sound like anything you have encountered before? Am I missing some TCP parameters that could be tuned to make this work better?

Speaking generally, rather than about Arduino in particular, these are the sort of symptoms I associate with a network I/O function being called re-entrantly.

You can avoid re-entrancy using a static flag.

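// time_since_last_run is in milliseconds
void foo(unsigned long time_since_last_run) {
  static bool busy = false;  // true while an earlier call is still in progress
  // Skip if we are already inside foo(), or if less than 10 ms have elapsed.
  if (busy || (time_since_last_run < 10)) return;
  busy = true;

  // i/o stuff...

  busy = false;
}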