Arduino SIM900 hang during an HTTP file download

We've been happily running an Arduino 2560 with a SIM 900 GSM shield to communicate with our server.
So far the data-transfers were very limited, small HTTP POSTs with a very limited data set being sent back and forth.

However, lately we needed to download a file over HTTP using the SIM900 and we ran into some issues.

We're sending the standard CIPSTART/CIPSEND commands to initiate a TCP connection.

AT+CIPSTART="TCP","ourserver.com","80"OKCONNECT OK
AT+CIPSEND=117

We then execute an HTTP GET, send some request headers and wait for the response

GET /endpoint/firmware/download HTTP/1.1
HOST: ourserver.com
User-Agent: Arduino

Once we receive the response, we can see from the content-length how much bytes we need to read.

We're using code like this:

while(_serial->available() > 0){
                char c = _serial->read();
               char hex[2];
               sprintf(hex,"[%02x]", static_cast<unsigned char>(c));
               Serial.print(hex);
               if((bodycounter+1) % 16 == 0 )
                 Serial.println(String("Bytes :  ") + String(bodycounter));
              crc32(oldcrc32,c, len);
              dataFile.print(c);
}

This results in the logging below . As you can see, we simply print out a hex representation of the bytes that we're processing, and also the number of bytes we've processed.

The issue is that somewhere in that processing, the system hangs. It simply stops processing. It stays like that and never continues / recovers.

[bb][81][c9][5c][d1][40][28][17][39][07][4a][07][5b][07][41][f4]Bytes :  37231
[00][e0][21][15][31][05][41][05][51][05][19][f0][01][e0][01][c0]Bytes :  37247
[00][e0][ce][01][86][58][9f][4f][0e][94][4d][94][c3][53][de][4f]Bytes :  37263
[a8][81][b9][81][cd][5c][d1]

The behavior is pretty random :

  • It sometimes hangs very early on (after reading a couple of thousand bytes)
  • it can hangs somewhere in the middle (50%)
  • it can happen at 99%
  • Sometimes it doesn't hang at all and it processed all 79532 bytes without an issue.

Any idea what might be causing this ? We initially thought it might be because the server was serving the file too fast and the SIM900 couldn't keep up, but even by throttling the server (customizing both the packet-size and the delays between packets) we still ran into these hangs.

Any idea what might be causing them ? We're always testing with the same file.

I did monitor the network traffic on the server using tcpmon, and got the following logging when using curl or telnet from a PC to download the file

23:15:13.083324 IP 188.188.111.100.52896 > CentOS-63-64-minimal.http: Flags [.], ack 2, win 33304, options [nop,nop,TS val 739980698 ecr 2331250456], length 0
	0x0000:  4500 0034 3fb9 4000 3106 c7f5 8d86 e4d8  E..4?.@.1.......
	0x0010:  904c 3f6a cea0 0050 113c 992c cd39 9a77  .L?j...P.<.,.9.w
	0x0020:  8010 8218 d5c2 0000 0101 080a 2c1b 359a  ............,.5.
	0x0030:  8af4 0f18

With the Arduino / SIM 900 I noticed there were no TCP options in the TCP packets (no nop, TS val, ecr). But I don't know how important those are.

23:19:43.190626 IP 188.188.111.211.26348 > CentOS-63-64-minimal.http: Flags [.], ack 42900, win 2161, length 0
	0x0000:  4508 0028 0034 0000 3306 8b4e bcbc 6fd3  E..(.4..3..N..o.
	0x0010:  904c 3f6a 66ec 0050 002c f7f6 322e bd67  .L?jf..P.,..2..g
	0x0020:  5010 0871 5c28 0000 0000 0000 0000       P..q\(........

Any ideas why the Arduino / SIM 900 would simply hang during this file download ?

I don't know how the SIM900 works, but this could cause the problem.

while(_serial->available() > 0){

If the SIM900 doesn't get all the response at one time from your provider, it could run out of characters to send to your sketch, and this loop would exit. Normally you would need to wait until the server closes the connection to determine when the server is finished, but I don't know how the SIM900 determines that.

SurferTim:
If the SIM900 doesn't get all the response at one time from your provider, it could run out of characters to send to your sketch, and this loop would exit. Normally you would need to wait until the server closes the connection to determine when the server is finished, but I don't know how the SIM900 determines that.

It never exits the loop (we have logging after the loop that is never shown). So it's stuck somewhere indefinitely inside that loop.

The server is serving out this file really quickly as you can see from the following logs :

2014-01-18 13:37:14,003 INFO  FirmwareService - downloadFirmware request for version 1.1 by 206205007318884
2014-01-18 13:37:14,003 INFO  FirmwareService - downloadFirmware done

The SIM900 on the other hand is configured to run at 4800bps, so he needs a lot more time to process all these bytes.
All these bytes are coming in through the serial line and are getting processed, but it simply hangs after a while.

If you want to see what the server is sending, get rid of the extra stuff and use this:

while(_serial->available() > 0){
   char c = _serial->read();
   Serial.write(c);
 }

edit: Is this a binary file you are downloading to the Arduino? If so, you must get the "Content-Length" parameter from the server response and read that many bytes after the blank line (double cr/lf).

Yes it's a binary file. That's the reason why we're converting the bytes to hex before printing them out as the file contains a lot of characters that don't translate well via a Serial.write.

We are reading the content-length of the server response and taking that into account.
We also specify our own timeout in the while loop.

So it's something like this :

uint32_t oldtime = millis();
do{
while(_serial->available() > 0){
   if(bodycounter < bodylength){
      // process the bytes
   }
 }}while(millis() - oldtime < (4 * 60 * 1000));

We are sometimes able to download the entire file and process it correctly. Like I said there's also a CRC32 check done on the bytes and that also checks out.

That makes me think that code-wise everything is ok.

My idea is that the fact that we have a very fast and powerfull backend combined with a very slow and memory-limited Arduino (in comparison) that's causing the issue. But I don't have a solution for it yet...

I see now. Are you parsing the server response for the "Content-Length" parameter? That will tell you how many bytes the server will send after the blank line. Don't depend on the serial available to determine the server is finished sending. That could be incorrect.

edit: The timeout feature is a good idea. Use that also.

We are parsing the response correctly (best way we can as you can see from our verbose logging .... it's pretty low-level). See our log dump for more details.

We are able to retrieve the Content-Length from the HTTP response header and we know when we should start processing the bytes.

During the actual processing of the bytes (writing them in the serial monitor) I don't see anything that could cause if to crash.
There are also no memory issues at that point (memory usage is very stable during processing).

The problem is that it simply hangs during that processing.... it never exits the while loop and it never times out.

TRACE : 01/01/2000 00:00:48 : MEM / 1601 : Modem AT Response : GET /firmware-1.1.bin HTTP/1.1HOST: ourserver.comUser-Agent: Arduinoiccid: 111111007318884SEND OK
DEBUG : 01/01/2000 00:00:48 : MEM / 3234 :  +++++++++++++++ start readBodyContent +++++++++++++++
[0d] 
[0a] 
HTTP/1.1[space]200[space]OK[0d] 
[0a] 
Date:[space]Sat,[space]18[space]Jan[space]2014[space]13:26:55[space]GMT[0d] 
[0a] 
Server:[space]Apache/2.2.15[space](CentOS)[0d] 
[0a] 
Last-Modified:[space]Sat,[space]18[space]Jan[space]2014[space]13:02:52[space]GMT[0d] 
[0a] 
ETag:[space]"38c0c09-136ac-4f03e46b8d700"[0d] 
[0a] 
Accept-Ranges:[space]bytes[0d] 
[0a] 
INFO  : 01/01/2000 00:00:48 : MEM / 3031 : Bodylength = 79532
Content-Length:[space]79532[0d] 
[0a] 
Connection:[space]close[0d] 
[0a] 
Content-Type:[space]application/octet-stream[0d] 
[0a] 
[0d] 
[0a] 
[0c][94][c8][0d][0c][94][ce][84][0c][94][ff][84][0c][94][30][85]Bytes :  15
[0c][94][61][85][0c][94][92][85][0c][94][c3][85][0c][94][f4][85]Bytes :  31
[0c][94][25][86][0c][94][f5][0d][0c][94][f5][0d][0c][94][f5][0d]Bytes :  47

Also, we noticed that we had to limit the baudrate of the sim900 in order to get correct CRCs.
Setting the baudrate too high resulted in a lot of CRC errors.

With a baudrate of 9600 we've had limited success in the sense that when the download is OK the CRC also checks out fine. However we also get these hangs that are very annoying.

You are overflowing the hex array. It must be large enough to hold the entire string from the sprintf() call plus the terminating zero. Something like this:

while(_serial->available() > 0){
                char c = _serial->read();
               char hex[5];
               sprintf(hex,"[%02x]", c);
               Serial.print(hex);
               if((bodycounter) % 16 == 0 )
                 Serial.println();
}

Even without the hex logging the system hangs from time to time.
Doesn't the [%02x] specify a width of 2 that is to be printed into the hex array ?

With your code changes I get the following output. (notice how it hangs at the end).

And thanks for taking the time to help me out with this ... much appreciated.

0c][ff94][ffc8][0d][0c][ff94][ffce][ff84][0c][ff94][ffff][ff84][0c][ff94][30][ff85]Bytes :  15
[0c][ff94][61][ff85][0c][ff94][ff92][ff85][0c][ff94][ffc3][ff85][0c][ff94][fff4][ff85]Bytes :  31
[0c][ff94][25][ff86][0c][ff94][fff5][0d][0c][ff94][fff5][0d][0c][ff94][fff5][0c]Bytes :  47
[ff94][fff5][0d][0c][ff94][fff5][0d][0c][ff94][fff5][0d][0c][ff94][fff5][0d][0c]Bytes :  63
[00][49][4e][46][4f][20][20][3a][20][00][49][4e][46][4f][20][20]Bytes :  79
[3a][20][00][20][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b]Bytes :  95
[2b][2b][2b][20][65][6e][64][20][63][6c][6f][73][65][20][63][6f]Bytes :  111
[6e][6e][65][63][74][69][6f][6e][20][2b][2b][2b][2b][2b][2b][2b]Bytes :  127
[2b][2b][2b][2b][2b][2b][2b][2b][00][54][43][50][20][43][6c][6f]Bytes :  143
[73][65][20][6e][6f][74][20][4f][4b][2e][2e][2e][2e][00][20][2b]Bytes :  159
[2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][20][65]Bytes :  175
[6e][64][20][63][6c][6f][73][65][20][63][6f][6e][6e][65][63][74]Bytes :  191
[69][6f][6e][20][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b]Bytes :  207
[2b][2b][2b][00][73][68][75][74][64][6f][77][6e][20][4f][4b][2e]Bytes :  223
[2e][2e][2e][00][20][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b]Bytes :  239
[2b][2b][2b][2b][20][65][6e][64][20][63][6c][6f][73][65][20][63]Bytes :  255
[6f][6e][6e][65][63][74][69][6f][6e][20][2b][2b][2b][2b][2b][2b]Bytes :  271
[2b][2b][2b][2b][2b][2b][2b][2b][2b][00][75][6e][64][65][66][69]Bytes :  287
[6e][65][64][20][62][75][74][20][73][68][75][74][64][6f][77][6e]Bytes :  303
[20][4f][4b][2e][2e][2e][2e][00][54][43][50][20][73][74][61][74]Bytes :  319
[75][73][20][77][61][73][20][63][6f][6e][6e][65][63][74][69][6e]Bytes :  335
[67][20][6f][72][20][63][6f][6e][6e][65][63][74][20][6f][6b][20]Bytes :  351
[62][75][74][20][63][6c][6f][73][65][20][66][61][69][6c][65][64]Bytes :  367
[00][20][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b]Bytes :  383
[2b][20][65][6e][64][20][63][6c][6f][73][65][20][63][6f][6e][6e]Bytes :  399
[65][63][74][69][6f][6e][20][2b][2b][2b][2b][2b][2b][2b][2b][2b]Bytes :  415
[2b][2b][2b][2b][2b][2b][00][77][65][20][63][6c][6f][73][65][64]Bytes :  431
[20][54][43][50][20][61][6e][64][20][73][68][75][74][64][6f][77]Bytes :  447
[6e][20][4f][4b][2e][2e][2e][2e][00][20][2b][2b][2b][2b][2b][2b]Bytes :  463
[2b][2b][2b][2b][2b][2b][2b][2b][2b][20][65][6e][64][20][63][6c]Bytes :  479
[6f][73][65][20][63][6f][6e][6e][65][63][74][69][6f][6e][20][2b]Bytes :  495
[2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][00][54]Bytes :  511
[43][50][20][41][6c][72][65][61][64][79][20][63][6c][6f][73][65]Bytes :  527
[64][20][61][6e][64][20][73][68][75][74][64][6f][77][6e][20][4f]Bytes :  543
[4b][2e][2e][2e][2e][00][20][2b][2b][2b][2b][2b][2b][2b][2b][2b]Bytes :  559
[2b][2b][2b][2b][2b][2b][20][73][74][61][72][74][20][63][6c][6f]Bytes :  575
[73][65][20][63][6f][6e][6e][65][63][74][69][6f][6e][20][2b][2b]Bytes :  591
[2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][00][20][2b]Bytes :  607
[2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][20][65]Bytes :  623
[6e][64][20][72][65][61][64][42][6f][64][79][43][6f][6e][74][65]Bytes :  639
[6e][74][20][66][61][69][6c][20][77][68][79][20][2b][2b][2b][2b]Bytes :  655
[2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][00][20][2b][2b][2b]Bytes :  671
[2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][20][65][6e][64]Bytes :  687
[20][72][65][61][64][42][6f][64][79][43][6f][6e][74][65][6e][74]Bytes :  703
[20][66][61][69][6c][20][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b]Bytes :  719
[2b][2b][2b][2b][2b][00][20][2b][2b][2b][2b][2b][2b][2b][2b][2b]Bytes :  735
[2b][2b][2b][2b][2b][2b][20][65][6e][64][20][72][65][61][64][42]Bytes :  751
[6f][64][79][43][6f][6e][74][65][6e][74][20][73][75][63][63][65]Bytes :  767
[73][73][20][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b]Bytes :  783
[2b][2b][00][20][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b][2b]Bytes :  799
[2b][2b][2b]

Doesn't the [%02x] specify a width of 2 that is to be printed into the hex array ?

It specifies that it will print two characters between the braces. You are not allowing for the braces or the terminating zero. That would be a total of 5 characters.

edit: Make that 7 characters for the hex array. Note these in your display:

[ff94]

That is 4 characters between the braces plus the terminating zero would be 7 characters. Still overflowing.

True ... changed the code ... but it still hangs, and for some bytes it prints 4 characters inside the braces.

[ff94][ffce][ff84][0c]

But nice catch !

Did you change the hex array to 7 characters?

char hex[7];

I tried it with 7 character hex array and also ran the test without displaying the hex bytes. ( only displayed a line for every 512 bytes processed).

On both occasions I had a hang.

I have a family commitment now so I can't continue testing but I'll pick it up when I get back and update this post if I find something.

Thanks again for the help so far !