I forgot to mention "my biggest problem" with the ETH (HTTP) performance; see this link:
Portenta ETH (HTTP) speed
I cannot keep a TCP socket open
Everything works fine as long as I do this: wait for a client to connect, receive the 64 KB request, process it, send back a 64 KB response AND close the connection (via client.stop()).
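For illustration, here is roughly the pattern that works. This is a minimal sketch only, assuming the standard Arduino Ethernet API as available on the Portenta mbed core; the port number and chunk size are my placeholders:

```cpp
#include <Ethernet.h>

EthernetServer server(80);          // placeholder port
static uint8_t buf[1024];           // transfer in chunks

void setup() {
  Ethernet.begin();                 // DHCP; use Ethernet.begin(ip) for a static address
  server.begin();
}

void loop() {
  EthernetClient client = server.available();
  if (!client) return;

  // receive the 64 KB request (no timeout handling, for brevity)
  size_t total = 0;
  while (client.connected() && total < 65536) {
    int n = client.read(buf, sizeof(buf));
    if (n > 0) total += (size_t)n;
  }
  // ... process the request, then send the 64 KB response ...
  for (size_t sent = 0; sent < 65536; sent += sizeof(buf)) {
    client.write(buf, sizeof(buf));
  }
  client.stop();                    // closing here => everything works
}
```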
But if I want to keep the established client connection open:
after the first large request has been processed, the second one starts to come in, but it does not make progress. After some chunks have been received (again, of the 64 KB), nothing more arrives. Everything hangs. My expected 64 KB request is never completed. It looks as if one chunk is smaller than the expected maximum and nothing is received afterwards. The ETH Rx path hangs.
My assumption is this:
When I do not close the connection on the FW (server) side via client.stop(), the buffers are not flushed or reset to their start pointers, even though I try to make sure that everything is drained from the ETH receiver. Another request coming in WITHOUT the connection having been closed does not seem to be received any further. The buffers might hit an "unknown end", the HW might need a proper clean-up after the previously completed response, or the buffer management overflows (even though it should be a ring buffer) and the pointers end up "hanging" outside a valid region...
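To show what I mean by draining, here is a sketch of the idea (not my exact code):

```cpp
// Between two requests on the same, still-open socket: discard
// anything that is still pending in the ETH receiver.
uint8_t scratch[256];
while (client.available() > 0) {
  client.read(scratch, sizeof(scratch));   // throw the leftover bytes away
}
// After this the buffers should be "clean" - and still the next
// request hangs after a few chunks.
```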
No idea! And no idea how to debug!
There is no debugger for the Portenta H7 with which I could see where it hangs, step into the API code (the "open source" code), and check whether and where it is looping.
I am blocked by this issue.
Why not close the connection?
Before "you" or the Arduino team steps in here and asks me: "why you do not close the connection? If it works fine with closing all the time - why 'you' do not use it this way?"
Here are my concerns:
- OK, a web browser closes the connection after every request and response. A new request is a new socket connection (with a new TCP client port number). This works fine.
- But: every new (TCP or even UDP) socket created uses a new client port. The network stacks on Windows, Linux and macOS "assume" that even after a socket has been closed, data can still arrive for it. So they make sure to create every new socket with a new, different client port number (e.g. the port number incremented by one) and let the previous socket (with the previous client port number) sit there as a "zombie" (the TCP TIME_WAIT state): maybe the server was very late, and the client must still be able to tell which socket a delayed packet belongs to.
So they don't reuse the previous client port number; they allocate a new one instead, on every new socket connection, even to the same server. Such a "zombie" socket remains there for a long time; some documentation says up to 15 minutes or even longer. This means that after 64K port numbers have been used (all 65536 port numbers are "zombies" now), the client-side network stack runs out of "free" ports and will refuse to connect a new socket (e.g. in a Python script).
So, even if I do all this "open - do - close", just very fast, e.g. in a loop (my Python script sends 64K of such requests), and everything happens within this "zombie" time window, the client runs out of available ports and everything stops.
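A rough calculation (assuming common defaults, e.g. on Windows: about 16384 dynamic client ports in the range 49152-65535, and a TIME_WAIT period of 120-240 s): at just 200 requests per second, all 16384 ports are used up after 16384 / 200 ≈ 82 s, long before the first "zombie" ports are released again. So the client stalls well before a 64K-request test loop completes.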
So, I have to keep the connection open. Only if one side (mainly the client, not my MCU server) kills the connection do I want to close the socket.
This makes sure that fast, repeated requests never run out of TCP client ports (which would make the client stop working).
So, I need this for my Python script. From a web browser it might never happen, because the user is slow when requesting a new URL (which is a new connection and a new socket).
But when my Python script fires my SPI transactions in a test loop, e.g. doing 64K SPI transactions as fast as possible, the client runs out of ports and everything is dead.
When this situation happens, you have no chance to recover. You have to wait until this "zombie socket" time period has elapsed (up to 30 minutes later!).
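For illustration, here is what I want the client side to do instead of open/close per request: connect once and reuse the socket. This is a rough C++ (POSIX sockets) equivalent of my Python loop's intent; the IP address and port are placeholders:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>

int main() {
  // connect ONCE, then reuse the same socket (one client port) for all requests
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  sockaddr_in srv{};
  srv.sin_family = AF_INET;
  srv.sin_port   = htons(80);                          // placeholder port
  inet_pton(AF_INET, "192.168.0.100", &srv.sin_addr);  // placeholder IP
  if (connect(fd, (sockaddr *)&srv, sizeof(srv)) != 0) return 1;

  static char req[65536], resp[65536];
  memset(req, 'x', sizeof(req));

  for (long i = 0; i < 65536; i++) {                   // 64K transactions over ONE socket
    for (size_t off = 0; off < sizeof(req); ) {        // send the 64 KB request
      ssize_t n = send(fd, req + off, sizeof(req) - off, 0);
      if (n <= 0) return 1;
      off += (size_t)n;
    }
    for (size_t off = 0; off < sizeof(resp); ) {       // receive the 64 KB response
      ssize_t n = recv(fd, resp + off, sizeof(resp) - off, 0);
      if (n <= 0) return 1;
      off += (size_t)n;
    }
    // no artificial delay here - as fast as possible
  }
  close(fd);                                           // only ONE client port was ever used
  return 0;
}
```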
If you do not trust me: use netstat on the client side and fire fast requests. See how many "zombie sockets" remain there, and for how long.
Unfortunately, this "keep the socket open" approach does NOT work with the mbed API.
And I have no idea how to find the root cause, e.g. in order to contribute to Arduino and the community; especially, I have no idea how to debug (due to the missing debugger support).
BTW: this approach works completely fine on the same STM32H745 MCU (NUCLEO-H745ZI-Q board), but without mbed: with the native HAL, CMSIS RTOS, the LwIP stack etc.
Another suspicion
Maybe nobody has complained before (because of "my strange" large requests and the fast reuse of an open socket). But it "tells" me this: I cannot trust the API, the functions, their implementation, or that they are well tested for "all" (usual) use cases.
The board vendor and Arduino claim that the Portenta H7, mbed etc. are intended for industrial use. But if I hit this issue immediately when trying to use it for our industrial purposes, I become a bit skeptical about "all the unknown bumpers ahead of us".
My feeling is: if it is promoted for industrial use and for reliable IoT solutions, but such a simple requirement as "keep the connection open" (for performance and resource reasons) makes it fail... how could I rely on it? Sorry.
Anybody with a test/use case?
So, does anybody have experience and a test case for this scenario? (A server-side sketch of what I mean follows the list.)
- send (and receive) very large HTTP requests (via TCP sockets), 64 KB in both directions
- after the client has connected and the socket is established, do not close the socket: drain all received characters from the ETH receiver, but reuse the existing socket (keep it open on both sides)
- keep the socket open and use it again and again: close it only if the FW realizes that the client has disconnected (client.connected() should notice, I hope, when the client has closed) or if a special request was sent asking to close the connection
- send such requests as fast as possible, e.g. from a looping Python script, sending a request and waiting for the response, but w/o any artificial delay (time.sleep())
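To make the scenario concrete, here is the server-side skeleton I have in mind: the same sketch as near the top of this post, but with client.stop() moved outside the per-request loop (again assuming the standard Arduino Ethernet API; for me, this is the variant that dies):

```cpp
#include <Ethernet.h>

EthernetServer server(80);
static uint8_t buf[1024];

void serveClient(EthernetClient &client) {
  // reuse the SAME socket again and again
  while (client.connected()) {
    size_t total = 0;
    while (client.connected() && total < 65536) {    // 64 KB request
      int n = client.read(buf, sizeof(buf));
      if (n > 0) total += (size_t)n;
    }
    if (!client.connected()) break;                  // client went away
    // ... process, then send the 64 KB response ...
    for (size_t sent = 0; sent < 65536; sent += sizeof(buf)) {
      client.write(buf, sizeof(buf));
    }
    // NO client.stop() here - keep the socket open for the next request
  }
  client.stop();   // close only after the client has disconnected
}

void setup() {
  Ethernet.begin();
  server.begin();
}

void loop() {
  EthernetClient client = server.available();
  if (client) serveClient(client);
}
```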
Does this use case work for half an hour, or even endlessly?
(For me it works for just one and a half requests; dead after 5 seconds.)