Looks like a critical bug in the Arduino Nano 33 IoT

I encountered a strange hang when streaming data at a relatively high rate from an Arduino Nano 33 IoT. I have now isolated the problem and created a reproducible sketch and server code (in Node.js), shown below. You only need to run it for less than a minute to see the hang.

Server code:

var net = require("net");

const server = net.createServer((socket) => {
    console.log("connected");
    var byteCount = 0; // bytes received on this connection since the last acknowledgment

    socket.on('data', function (data) {
        byteCount += data.length;
        // send a one-byte acknowledgment (value 1) for every complete 1024-byte block
        while (byteCount >= 1024) {
            byteCount -= 1024;
            let reply = new Uint8Array(1);
            reply[0] = 1;
            socket.write(reply);
        }
        console.log(new Date().toISOString() + ": data coming in");
    });
});

var port = 8124;
var address = '192.168.137.1'; // change to your WLAN address

server.listen(port, address);

To run it, install Node.js, save the code to a file (e.g. echo.js), and run node echo.js from the command line.

Arduino sketch:

#include <WiFiNINA.h>

const unsigned long reportInterval = 10000; // in microseconds (10 ms); changing this to 100000 (100 ms) almost eliminates the hangs

int status = WL_IDLE_STATUS;             // the Wi-Fi radio's status
char ssid[] = "Yournetwork";                // your network SSID (name)
char pass[] = "password";                // your network password (use for WPA, or use as key for WEP)

byte buffer[2000];
IPAddress server(192, 168, 137, 1); // change to your server WLAN ip
WiFiClient client;

bool allowSend = true;
unsigned long previousMicrosReport = 0;

void connectToWifi() {
  // attempt to connect to Wi-Fi network:
  while (status != WL_CONNECTED) {
    Serial.print("Attempting to connect to network: ");
    Serial.println(ssid);
    // Connect to WPA/WPA2 network:
    status = WiFi.begin(ssid, pass);
  
    // wait 5 seconds for connection:
    delay(5000);
  }

  // you're connected now, so print out the data:
  Serial.println("You're connected to the network");
  Serial.println("---------------------------------------");
}

void connectToServer() {
  while (true) {
    if (client.connect(server, 8124)) {
      Serial.println("Connected to the server");
      break;
    }
    Serial.println("Can't connect to the server");
    delay(5000);  // retry every 5 seconds
  }
}

void setup() {
  Serial.begin(9600);

  connectToWifi();
  connectToServer();
  Serial.println("Setup completed");
}

void checkOK() {
  //Serial.println("reading reply");
  uint8_t simplifiedReply = client.read(); // will hang here
  //Serial.println("got reply");
  if (simplifiedReply == 1) {
    allowSend = true;  // server acknowledged the last 1024-byte block
  }
}

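void loop() {
  // Poll for the server's acknowledgment while a block is outstanding.
  if (!allowSend) {
    checkOK();
  }
  unsigned long currentMicrosInfo = micros();

  // Send the next 1024-byte block once acknowledged and reportInterval has elapsed.
  if (allowSend) {
    if (currentMicrosInfo - previousMicrosReport >= reportInterval) {
      previousMicrosReport = currentMicrosInfo;

      client.write(buffer, 1024);
      allowSend = false;
    }
  }
}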

Should I file a bug on GitHub? Also, which alternative boards (e.g. ESP32?) could I use to avoid this weird problem for now? I am not a hardware engineer, so I apologize for any naive questions. Thank you!

You're sending the content of an uninitialized variable, so probably only null bytes. Are you sure the server is doing what you expect it to do with this?
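Because buffer is declared at global scope, C++ zero-initializes it, so the sketch sends 1024 null bytes every time. If you want the receiver to be able to spot lost, duplicated, or reordered bytes, one option is to fill the block with a running pattern before each send and verify it on the other end. A minimal sketch under that assumption; fillPattern(), checkPattern(), and the per-block sequence number are hypothetical additions, and the check would have to be ported to the Node.js server (it is shown in C++ here only for illustration):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: write a recognizable pattern into the block.
// Byte 0 holds the low 8 bits of the block sequence number; each following
// byte is one larger than the previous one, wrapping at 256.
void fillPattern(uint8_t* buf, size_t len, uint32_t seq) {
  for (size_t i = 0; i < len; ++i) {
    buf[i] = static_cast<uint8_t>(seq + i);
  }
}

// Hypothetical check for the receiving side: returns the index of the first
// byte that breaks the pattern, or len if the whole block is intact.
size_t checkPattern(const uint8_t* buf, size_t len) {
  if (len == 0) return 0;
  for (size_t i = 1; i < len; ++i) {
    if (buf[i] != static_cast<uint8_t>(buf[0] + i)) return i;
  }
  return len;
}
```

In the sketch this would be called just before sending, e.g. fillPattern(buffer, 1024, blockCount++); followed by client.write(buffer, 1024);, where blockCount is a hypothetical global counter.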

You don't check if the connection to the server is still open. The server might close the connection...
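Checking the connection before each send could look like the sketch below. It relies only on connected(), stop(), and connect(ip, port), which WiFiClient provides; the ensureConnected() name, the retry limit of 3, and the template parameters (which let the same function be exercised with a stand-in class off-board) are assumptions for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: make sure the TCP connection is up before writing.
// ClientT is expected to offer connected(), stop() and connect(ip, port),
// as WiFiClient does. Returns true if the client is connected on return.
template <typename ClientT, typename IpT>
bool ensureConnected(ClientT& client, const IpT& ip, uint16_t port,
                     int maxAttempts = 3) {
  if (client.connected()) {
    return true;              // still connected, nothing to do
  }
  client.stop();              // release the old socket before reconnecting
  for (int attempt = 0; attempt < maxAttempts; ++attempt) {
    if (client.connect(ip, port)) {
      return true;            // reconnected
    }
  }
  return false;               // caller decides whether to wait and retry later
}
```

In loop() it would be called as ensureConnected(client, server, 8124) before client.write(). After a successful reconnect you would probably also want to set allowSend = true, since any outstanding acknowledgment belonged to the old connection.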

Hi pylon,

Thank you so much for the response.

  1. It is just a simplified echo server. As you can see from the code, the server doesn't care what the bytes are; it only counts 1024 bytes and then sends a one-byte acknowledgment.

  2. I don't think the server has a problem. It has been working with other clients at high data rates without any issue. It also logs connection close events, and I haven't seen one in this case. So I think the server didn't close the connection; it is just a hang in the firmware of the little WLAN controller. You can check whether the server closes the connection with the following updated server code:

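var net = require("net");

const server = net.createServer((socket) => {
    console.log("connected");
    var byteCount = 0; // bytes received on this connection since the last acknowledgment

    socket.on('data', function (data) {
        byteCount += data.length;
        // send a one-byte acknowledgment (value 1) for every complete 1024-byte block
        while (byteCount >= 1024) {
            byteCount -= 1024;
            let reply = new Uint8Array(1);
            reply[0] = 1;
            socket.write(reply);
        }
        console.log(new Date().toISOString() + ": data coming in");
    });

    // log when the connection ends, to check whether it was closed
    socket.on('close', () => {
        console.log('socket closed');
    });

    // without an 'error' listener, a connection reset would crash the server
    socket.on('error', (err) => {
        console.log('socket error: ' + err.message);
    });
});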

var port = 8124;
var address = '192.168.137.1'; // change to your WLAN address

server.listen(port, address);

Moreover, I did some soldering yesterday and damaged some pins. Now the situation is worse: it hangs after only 3-4 transmissions (~30-40 ms). So the board might be faulty. I will get a new board and see how it behaves.

An echo server returns the same content it received. Your code waits until 1024 bytes are received and then returns a single byte. If a single byte gets lost, your setup stops working. I know TCP theoretically guarantees that the bytes are delivered, but I would use a more failure-tolerant protocol.
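One way to make the handshake more failure-tolerant is to stop waiting for the acknowledgment after a timeout and report it, instead of waiting forever. A minimal sketch, assuming a hypothetical AckWaiter class and an arbitrarily chosen timeout; the current time is passed in (millis() on the board) so the logic can be tested off-board. It cannot help if a WiFiNINA call itself never returns, but if the timeout fires you at least know the loop is still running and simply received no reply.

```cpp
#include <cstdint>

// Hypothetical helper: tracks one outstanding 1024-byte block and decides
// whether its acknowledgment has timed out. Times are in milliseconds, as
// returned by millis(); unsigned subtraction handles millis() wrap-around.
class AckWaiter {
 public:
  explicit AckWaiter(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

  void blockSent(uint32_t nowMs) {          // call right after client.write()
    waiting_ = true;
    sentAtMs_ = nowMs;
  }
  void ackReceived() { waiting_ = false; }  // call when the ack byte arrives

  bool waiting() const { return waiting_; }

  // True once the ack is overdue; the caller can then log, resend or reconnect.
  bool timedOut(uint32_t nowMs) const {
    return waiting_ && (nowMs - sentAtMs_) >= timeoutMs_;
  }

 private:
  uint32_t timeoutMs_;
  uint32_t sentAtMs_ = 0;
  bool waiting_ = false;
};
```

With a 500 ms timeout, for example, a block sent at t = 1000 ms is reported as overdue from t = 1500 ms on, unless ackReceived() was called first.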

I would add more debugging code: log on the server how many bytes it has received so far, and print on the client which state it is in (for example, waiting for the server's response).

Are you sure data.length counts the bytes received? A JavaScript string contains UTF-16 code units, and its length returns the number of 16-bit values, but I have no clue how the bytes received on the socket are translated into these 16-bit values. What if not only null bytes are transferred?

Hi pylon,

Thanks for continuing to provide advice and suggestions. I appreciate that.

  1. I don't think the byte was lost. TCP should guarantee that: if a byte is lost, it is retransmitted; otherwise it would be a bug in the TCP stack. It is not a retransmission or signal problem; I'll explain below.

  2. Sorry for misusing the term "echo server". Just to clarify, data here is not a JavaScript string but a Node.js Buffer object, so data.length gives the chunk length in bytes.

  3. I have done a lot of debugging; I removed everything except the most important debug output. With the debug prints in checkOK() enabled, when a hang happens I only see "reading reply", and "got reply" never follows. Normally, client.read() returns in about 300 microseconds. If there is nothing to read, it returns -1, which shows up as 255 when stored in a uint8_t as in the sketch.

I know you will say it is bad practice to call client.read() without first checking whether there is anything to read. Well, checking first is exactly what I was doing originally.

In that link, I found the hang was in client.available(). Later I switched to calling client.read() directly, and it has the same problem. So I am 80% confident this is a bug in the WiFi firmware or somewhere in the hardware.
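The 255 seen above comes from the return type: read() returns an int, with -1 meaning "no byte available", and storing that in a uint8_t turns it into 255. A minimal sketch of a poll that keeps the int and only reads when available() reports data; pollAck() and its return codes are hypothetical, and the template parameter lets it be tested with a stand-in class (on the board it would be called as pollAck(client)). It will not help if available() itself never returns.

```cpp
// Hypothetical helper: non-blocking check for the server's 1-byte ack.
// ClientT must provide int available() and int read(), as WiFiClient does.
// Returns 1 if the ack byte (value 1) was consumed, 0 if nothing was
// waiting, and -2 if a byte arrived but was not the expected ack.
template <typename ClientT>
int pollAck(ClientT& client) {
  if (client.available() <= 0) {
    return 0;                   // nothing to read yet; try again next loop()
  }
  int c = client.read();        // keep it an int: -1 means "no data", 0..255 is a byte
  if (c < 0) {
    return 0;
  }
  return (c == 1) ? 1 : -2;
}
```

In checkOK() this would replace the direct read, e.g. if (pollAck(client) == 1) { allowSend = true; }.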

The Arduino communicates with the NINA module over a serial link (SPI), so the data loss may happen on that link. Can you try sending more than one byte back?

Hi pylon,

Thank you for the reply; however, I'm sorry, I don't see your point. This is not the protocol stalling because a byte never arrived. Even if the byte were lost, read() would still have to return. It gets stuck inside a function that should never hang under any condition.

My protocol is a handshake protocol, which means the acknowledgment is predefined. You suggested sending more bytes; do you mean using a longer message as the acknowledgment? That will not help: if a byte were lost, the longer acknowledgment would be incomplete, just as in the one-byte case. Or do you want me to implement an error-correcting code? That's too complicated. In any case, this problem has nothing to do with the protocol layer.

I'm not saying the protocol layer is wrong; I'm trying to debug your code. If you send more than one byte, you can see how many bytes are received. If you send only one byte, you can't tell whether that byte was lost or nothing was sent at all. If you have better ideas for tracking this down, I won't insist on my approach.
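Following that suggestion, if the server replied with several bytes instead of one (say four; the server code would have to be changed to write them), the client could count what actually arrives and print it. A minimal sketch; drainAndCount() and the running total are hypothetical, and ClientT stands for WiFiClient on the board:

```cpp
#include <cstddef>

// Hypothetical helper: read every byte currently available and return how
// many arrived, adding them to a running total so the count can be printed
// and compared with the number of bytes the server reports having sent.
// ClientT needs int available() and int read(), as WiFiClient provides.
template <typename ClientT>
size_t drainAndCount(ClientT& client, size_t& total) {
  size_t got = 0;
  while (client.available() > 0) {
    if (client.read() < 0) {
      break;                    // available() and read() disagree; stop here
    }
    ++got;
  }
  total += got;                 // running total across calls
  return got;
}
```

If the total on the client falls behind the count logged by the server, bytes are going missing between the network and the sketch; if the counts match but the sketch still stalls, the problem is elsewhere.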

Thank you so much for trying to debug this. Please take your time and let me know if you find something. I'll also try your suggestions in a day or two and post here if I find anything else.

I cannot debug it myself; I don't have your hardware.