MKR1010 WiFi dysfunctional. Better options?

I have a couple of MKR1010s, both doing wifi. One uses UDP and works quite well. The other uses TCP and opens a connection to another computer in the house, and it's awful. Randomly, the TCP connection fails for no apparent reason. Once it does, it takes a reboot to make connections work again.

I've written a fair amount of code to detect when the connection has gone bad, and reboot the board when it does. But for this application that's not an acceptable workaround. I need TCP to work reliably for days on end. It does for everything else in the house.

One site suggested going to WiFiNINA firmware 1.2.1. I tried this and it seems to have reduced the disconnects, but not eliminated them.

A few things to note: I have decades of experience with TCP, and quite a lot of systems in my house use it successfully for all sorts of things. It's only the 1010 that's dropping connections. Also, the device is about 15' from the wireless access point it uses. I am not having range issues. Finally, the volume of data is low: a few bursts of a few dozen bytes at most per minute, not thousands. It's not a bandwidth issue.

So my questions are:

  1. Is there some small form factor Arduino that does this better?
  2. Has anyone else found a workaround for this board?
  3. Failing all else, can someone point me to source code for the TCP implementation? Maybe I just need to fix someone's library.

scottmayo:
I need TCP to work reliably for days on end. It does for everything else in the house.

Everything else is just reconnecting without you noticing.

scottmayo:
Once it does, it takes a reboot to make connections work again.
...
2) has anyone else found a workaround for this board?

You do not need to reset the board; you just need to restart the WiFi connection. The rest of your application can continue.

Have a look into the example in reply #30 in the following post.

I wrote it for the Arduino Nano 33 IoT which is a smaller version of your board without battery management. I have the code running for many months now. There is an average of 10 reconnects every day without any issues.
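
The idea of restarting only the WiFi connection, and keeping a full reboot as a last resort, can be sketched as a small escalation policy. This is not the code from the linked example; it is a minimal sketch in plain C++ so that it can be checked off the board. The names `Recovery` and `RecoveryPolicy` and the retry thresholds are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical recovery steps, ordered from least to most disruptive.
enum class Recovery : std::uint8_t { ReconnectTcp, RestartWifi, RebootBoard };

// Chooses the next recovery step from the number of consecutive failures.
// The thresholds are illustrative assumptions, not values from WiFiNINA.
class RecoveryPolicy {
public:
    RecoveryPolicy(std::uint16_t tcpRetries, std::uint16_t wifiRestarts)
        : tcpRetries_(tcpRetries), wifiRestarts_(wifiRestarts) {
        assert(tcpRetries > 0); // always try at least one plain TCP reconnect
    }

    // Call whenever a connect or write succeeds.
    void onSuccess() { failures_ = 0; }

    // Call whenever a connect or write fails; returns the step to try next.
    Recovery onFailure() {
        if (failures_ < UINT32_MAX) ++failures_; // saturate instead of wrapping
        if (failures_ <= tcpRetries_) return Recovery::ReconnectTcp;
        const std::uint32_t wifiLimit =
            static_cast<std::uint32_t>(tcpRetries_) + wifiRestarts_;
        if (failures_ <= wifiLimit) return Recovery::RestartWifi;
        return Recovery::RebootBoard;
    }

private:
    std::uint16_t tcpRetries_;
    std::uint16_t wifiRestarts_;
    std::uint32_t failures_ = 0;
};
```

In a sketch, `ReconnectTcp` would presumably map to `client.stop()` followed by `client.connect(...)`, `RestartWifi` to `WiFi.end()` followed by `WiFi.begin(...)`, and `RebootBoard` to `NVIC_SystemReset()`. That mapping is my assumption and is not taken from the linked example.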

Klaus_K:
Everything else is just reconnecting without you noticing.

No, it is not. I wrote all the code involved and I get notifications when any of dozens of connections fail. On a local network, TCP connections close when you tell them to or when hardware fails and writes time out. An implementation of TCP that fails over a working network path with available bandwidth isn't TCP. All the other platforms I use (including cheaper ones) have proper implementations of TCP and no issues with failed connections.

Thanks for the example, and I'm sure it works well for you, but in this case needing to reconnect creates an interval where the device can't send a message within the allotted window. If "oh, well, TCP just doesn't work to spec on this platform" is the answer, I will look for other platforms. Thanks.

This may be obvious, but since you don't mention it: have you tried an alternative MKR1010? If the one you are using is faulty (dry joint, temperature sensitive, whatever), it could show the symptoms you describe.

Likewise the power supply.

countrypaul:
This may be obvious, but since you don't mention it: have you tried an alternative MKR1010? If the one you are using is faulty (dry joint, temperature sensitive, whatever), it could show the symptoms you describe.

Likewise the power supply.

Power supply is good (5.09v at 3A, more than double the peak draw). The fact that other people are freely stating that TCP connections randomly drop, and that the problem gets better or worse under different firmware versions, makes me believe the problem isn't hardware. The device is in a controlled environment anyway.

Ultimately I'm looking for a small form factor processor with working TCP and UDP and an ADC (suitable for touch detection) that doesn't get corrupted by power failures, as a Raspberry Pi can. I just wrote a review on Amazon warning people about the problem; I'll take it down when I see a firmware update that makes TCP work as specified.

So that's it? TCP just doesn't work to spec and no one is on about this? Why is this still a product? False advertising at the very least.

scottmayo:
So that's it? TCP just doesn't work to spec and no one is on about this? Why is this still a product? False advertising at the very least.

TCP is a transport layer protocol. It is much more likely the issue comes from the layers below. There is nothing the TCP stack can do when the physical link is lost. The WiFi module has a small chip level antenna. The 2.4GHz band is full of other devices. It is in the nature of wireless links that they are less reliable than wired links, otherwise we would not place data wire everywhere on the planet.

Maybe the firmware should have an option to reconnect automatically. How much time does your TCP connection allow before a timeout error?

Klaus_K:
TCP is a transport layer protocol. It is much more likely the issue comes from the layers below. There is nothing the TCP stack can do when the physical link is lost. The WiFi module has a small chip level antenna. The 2.4GHz band is full of other devices. It is in the nature of wireless links that they are less reliable than wired links, otherwise we would not place data wire everywhere on the planet.

Maybe the firmware should have an option to reconnect automatically. How much time does your TCP connection allow before a timeout error?

The physical link isn't lost. The radio is on, and if there are problems with interference or any other signal disruption, TCP retransmits. That's the point of the protocol. You're describing UDP, which is allowed to drop packets for any reason.

Moreover, I have a fair number of wireless devices in the house, all of them running TCP connections with code I have written. Those connections do not break. There is exactly one device here that cannot manage to keep a TCP connection open, and that's the only Arduino attempting TCP. The access point is not far away, the network in general is lightly loaded, and I've used other devices with chip antennas, like Pi Zero Ws, that have no problem with TCP, even over much longer distances and with heavier traffic.

If you're making the claim that the wifi chip is so resource constrained that it cannot manage TCP retransmits correctly, great. It shouldn't advertise TCP support. But it did, and for $44 I expect it to work as well or better than a $10 raspberry pi.

I don't know what the TCP implementation is doing under the covers. I establish the connection in setup(), and check isConnected() in loop(). If it returns false, or if the occasional very small write() I do (6 bytes every 5 seconds) fails, I stop() the connection and reboot, because calling connect() again doesn't work.
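
For a device that has to run for days on end, the 5-second write described above is usually scheduled with wrap-safe unsigned arithmetic, because a 32-bit millisecond counter such as Arduino's `millis()` rolls over after about 49.7 days. The following is a minimal sketch in plain C++, assuming a 32-bit millisecond timestamp; the class name `IntervalTimer` is hypothetical and this is not the code from the sketch discussed in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Fires once per interval. Unsigned subtraction is modular, so the elapsed
// time stays correct when the 32-bit millisecond counter wraps around.
class IntervalTimer {
public:
    explicit IntervalTimer(std::uint32_t intervalMs, std::uint32_t startMs = 0)
        : intervalMs_(intervalMs), lastMs_(startMs) {
        assert(intervalMs > 0); // a zero interval would fire on every call
    }

    // Returns true when at least intervalMs has elapsed since the last
    // firing, and restarts the interval from nowMs.
    bool due(std::uint32_t nowMs) {
        const std::uint32_t elapsed =
            static_cast<std::uint32_t>(nowMs - lastMs_);
        if (elapsed < intervalMs_) return false;
        lastMs_ = nowMs;
        return true;
    }

private:
    std::uint32_t intervalMs_;
    std::uint32_t lastMs_;
};
```

In loop(), something like `if (heartbeat.due(millis())) { ... }` would then send the 6-byte message without blocking the rest of the sketch; `heartbeat` is again a hypothetical name.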

Is this at all helpful? It seems others have had similar problems: [SOLVED] Problem with arduino mkr1010 and erratic connection loss - #15 by jotathebest - IoT Devices - Ubidots Community

countrypaul:
Is this at all helpful? It seems others have had similar problems: [SOLVED] Problem with arduino mkr1010 and erratic connection loss - #15 by jotathebest - IoT Devices - Ubidots Community

I saw that. But the code only opens one socket, and when that connection fails I call stop() on it and reboot the device. There is never more than one socket in existence.

You mentioned trying WiFiNINA firmware 1.2.1. I know some forum members had problems with versions before 1.4. I have no idea how easy or difficult it is to upgrade WiFiNINA, but if it is relatively easy, is it worth trying?

Hi, I have the same problem with the WiFi connection. Sometimes the connection runs for over 10 hours and then suddenly stops.
I was using a WiFi booster via powerline (next to the MKR1010), and sometimes after the WiFi connection stopped, I reset the WiFi booster and the connection came back. (Fortunately my sketch loop keeps running.)
After that I connected directly to the WiFi router (around -77 dBm).
Same problem. The connection is sometimes up for more than 10 hours and then stops.
I have two counters built in: one counts the number of loops and one counts the number of resets.

I have tried the Power Down function of the NINA module, but it does not seem to work; I noticed no improvement.
So now I use the De-Init of the NINA module. Unfortunately, the results are still not stable.

Usually it takes one loop to get a connection with the WiFi router.

WiFi firmware is 1.3.0.
If anyone has a solution, it is most welcome.

See my sketch below. (Not all of the sketch is shown.)
Note: the sketch does not wait until the connection with the router is established, because my loop should always run, even if the WiFi connection is not restored after a power failure or a WiFi connection loss. Normally these two conditions are met.
LEDs are used to indicate the status of the connection process, and this normally gives a good and clear indication.

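#include <WiFiNINA.h>         // WiFi and the client classes
#include <utility/wifi_drv.h> // WiFiDrv: RGB LED on the NINA module and De-Init

// Declarations of status, client, server, timeClient, SSID, PASSWORD,
// red and green are part of the sketch that is not shown here.

int LOOPS;  // number of passes through loop()
int RESETS; // number of De-Init and reconnect attempts

void setup()
{
  Serial.begin(9600);
  delay(5000);

  LOOPS = 0;
  RESETS = 0;
}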

void loop()
{
  Serial.println("");
  Serial.println("START LOOP =========================================================== TEST 16 v41 ");
  Serial.println("");

  LOOPS = LOOPS + 1;
  Serial.print("Number of LOOPS is : ");
  Serial.println(LOOPS);
  Serial.print("Number of RESETS is : ");
  Serial.println(RESETS);
  Serial.println("");

  ConnectWiFiRouter();

  ConnectToNTP(); // details in setup and declarations are not shown in this example

  ConnecttoAmazon();
  GordijnenOpen(); // the following functions are specific to my project and are not shown

  ConnecttoAmazon();
  GordijnenDicht();

  ConnecttoAmazon();
  DelayLoop();
}

void ConnectWiFiRouter()
{
  WiFiDrv::analogWrite(red, 1);
  Serial.println("CONNECT TO WiFi ROUTER, send SSID and PASSWORD");
  // One call is enough: WiFi.begin() returns the connection status.
  status = WiFi.begin(SSID, PASSWORD);
  delay(100);
  Serial.print("1. WiFi Status must be 3 and is : ");
  Serial.println(status);

  if (status == WL_CONNECTED) // WL_CONNECTED has the value 3
  {
    Serial.println("Connected to WiFiRouter. No De-Init DONE ");
    Serial.println("");
  }
  else
  {
    Serial.println("No connection with WiFiRouter");
    Serial.println("");
    Serial.println("Disconnect from WiFiRouter via : wifiDriverDeinit ; ");

    WiFiDrv::wifiDriverDeinit();   // De-Init
    WiFiDrv::analogWrite(red, 16); // LED was set to 0 by De-Init

    // Second attempt after the De-Init.
    Serial.print("2. wifiDriverDeInit DONE. New attempt to connect WiFi : Status must be 3 and is : ");
    RESETS = RESETS + 1;
    status = WiFi.begin(SSID, PASSWORD);
    Serial.println(status);
  }

  if (status == WL_CONNECTED)
  {
    Serial.println("STATUS OK");
    Serial.println("");
    WiFiDrv::analogWrite(red, 0);
    WiFiDrv::analogWrite(green, 32); // 1 green flash of 2 seconds
    delay(2000);
    WiFiDrv::analogWrite(green, 0);
  }
  else
  {
    Serial.println("No good result from De-Init");
    Serial.println("");
  }

  PrintWifiStatus();
}

void ConnectToNTP()
{
  timeClient.begin();
  timeClient.setTimeOffset(3600);
  timeClient.update();

  Serial.println("");
  // epochTime is an unsigned long: the number of seconds since 1 January 1970 00:00 UTC
  epochTime = timeClient.getEpochTime();
  Serial.print("Epoch Time: "); // 51 x 365 x 24 x 60 x 60 = 1,608,336,000
  Serial.println(epochTime);

  currentHour = timeClient.getHours();
  Serial.print("Hour: ");
  Serial.println(currentHour);

  currentMinute = timeClient.getMinutes();
  Serial.print("Minutes: ");
  Serial.println(currentMinute);
  Serial.println("");
}

void ConnecttoAmazon()
{
  Serial.println("CONNECT TO AMAZON");

  if (status != WL_CONNECTED)
  {
    Serial.println("No Connection with WiFiRouter. Connection to Amazon Server SKIPPED");
    return;
  }

  // Keep the result of connect() instead of discarding it; 1 means success.
  int connected = client.connect(server, 443);

  Serial.print("Amazon response must be 1 and is : ");
  Serial.println(connected);

  if (connected != 1)
  {
    Serial.println("NOT Connected to Amazon Server");
    Serial.println("");
    return;
  }

  Serial.println("Amazon Server Connected : 2 green flashes of 0,5 sec");
  WiFiDrv::analogWrite(green, 8);
  delay(300);
  WiFiDrv::analogWrite(green, 0);
  delay(100);
  WiFiDrv::analogWrite(green, 8);
  delay(300);
  WiFiDrv::analogWrite(green, 0);
  delay(300);
}

Findings:

TCP connections on a local wifi network, connecting to local network services, with no interference problems fail randomly, over a period of minutes to hours. This happens in very low traffic situations (6 byte messages a few times a minute) with only one connection open.

When a connection collapses, my application retries the connection (stop(), connect()). Sometimes this succeeds, and when it does work it succeeds quickly. When it doesn't, the application keeps trying until a watchdog reboots the board. Once rebooted, the connection works again for a time.

I don't know where to find the sources for the firmware (and if someone knows where they are, I'm willing to look into this). My best guess is that there is a race or memory corruption in the library or firmware. I can rule out network problems as I have a lot of other single board systems on the house network and none of them are having problems. It's only this one arduino (the only one I have using TCP) with issues. UDP seems to work fine so I don't believe it's a hardware or signal strength issue.

I have a suspicion. I always compile with -Wall, which turns up a lot of warnings in the associated libraries I use. Some of them are trivial (people misusing #if), but one is a complaint about some ring buffer function falling off the end without returning a value (which means it is returning a value, just a garbage one). I haven't investigated, but I could easily imagine a TCP implementation using a ring buffer, and if the ring buffer has problems, so would TCP.
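
To illustrate the class of bug that warning points at: in C++, flowing off the end of a non-void function is undefined behavior, so the caller cannot even rely on getting a garbage value. Below is a minimal sketch of a byte ring buffer in which every path returns a value. It is a hypothetical example, not the ring buffer from the Arduino core or from WiFiNINA, and I have not checked whether the function that triggers the warning is actually used by the TCP path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity byte ring buffer, for illustration only.
template <std::size_t N>
class RingBuffer {
    static_assert(N > 0, "capacity must be non-zero");

public:
    // Stores one byte; returns false when the buffer is full.
    bool push(std::uint8_t b) {
        assert(count_ <= N); // internal invariant
        if (count_ == N) return false;
        data_[head_] = b;
        head_ = (head_ + 1) % N;
        ++count_;
        return true;
    }

    // Returns the oldest byte, or -1 when the buffer is empty.
    // Every path ends in a return statement; leaving one out is the
    // mistake that -Wall reports via -Wreturn-type.
    int pop() {
        if (count_ == 0) return -1;
        const std::uint8_t b = data_[tail_];
        tail_ = (tail_ + 1) % N;
        --count_;
        return b;
    }

    std::size_t size() const { return count_; }

private:
    std::uint8_t data_[N] = {};
    std::size_t head_ = 0;
    std::size_t tail_ = 0;
    std::size_t count_ = 0;
};
```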

The source code for the firmware that runs on the ESP32-based u-blox NINA-W102 WiFi module on the MKR WiFi 1010 is here:

Of course there is also the Arduino sketch firmware that runs on the primary ATSAMD21G18 microcontroller on the MKR WiFi 1010, but I'm guessing you already know where that is on your computer. If not, I can help you get at that as well.

So far, my sketch has been running for days without problems.
10631 LOOPS
331 RESETS
The connection comes up correctly after being lost.

PeterKDam:
So far, my sketch has been running for days without problems.
10631 LOOPS
331 RESETS
The connection comes up correctly after being lost.

So what did you change?
