BLE.poll() crash

Hello All,

I am trying to set up a device on a Nano 33 IoT that uses the following libraries:

  • Adafruit NeoPixels / NeoPixelBus
  • ArduinoBLE
  • RTCZero
  • SD
  • Servo
  • WiFiNINA

However, during testing I have been seeing sporadic crashes where the board freezes completely and has to be reset. After painstakingly narrowing down the location by having the code print its position frequently, I pinpointed the crash to the BLE.poll() function. I then looked at HCI.poll() and found that most of the freezes occur after the end of a pass through the while (HCITransport.available()) loop but before the end of the poll function, even though nothing else runs in between. I have included the function below with my debug edits. The board seems to freeze after printing “WE”, without printing either the “A” that I added to the HCITransport.available() function or the “WS” from the start of the while loop.

void HCIClass::poll(unsigned long timeout)
{
  // Edit: trace entry into poll()
  if (_debug) {
    _debug->print("PS ");
  }
#ifdef ARDUINO_AVR_UNO_WIFI_REV2
  digitalWrite(NINA_RTS, LOW);
#endif

  if (timeout) {
    // Edit: trace that a timeout was requested
    if (_debug) {
      _debug->print("T ");
    }
    HCITransport.wait(timeout);
  }

  while (HCITransport.available()) {
    // Edit: trace the start of each pass through the while loop
    if (_debug) {
      _debug->print("WS ");
    }
    byte b = HCITransport.read();
    // Edit (currently commented out): print each byte as it is read
    /*
    if (_debug) {
      _debug->print("R: ");
      _debug->print(b);
      _debug->print(" ");
    }
    */
    _recvBuffer[_recvIndex++] = b;
    // Edit: trace _recvIndex after each byte is stored
    if (_debug) {
      _debug->print("I:");
      _debug->print(_recvIndex);
      _debug->print(" ");
    }
    if (_recvBuffer[0] == HCI_ACLDATA_PKT) {
      if (_recvIndex > 5 && _recvIndex >= (5 + (_recvBuffer[3] + (_recvBuffer[4] << 8)))) {
        if (_debug) {
          dumpPkt("HCI ACLDATA RX <- ", _recvIndex, _recvBuffer);
        }
#ifdef ARDUINO_AVR_UNO_WIFI_REV2
        digitalWrite(NINA_RTS, HIGH);
#endif
        int pktLen = _recvIndex - 1;
        _recvIndex = 0;

        handleAclDataPkt(pktLen, &_recvBuffer[1]);

#ifdef ARDUINO_AVR_UNO_WIFI_REV2
        digitalWrite(NINA_RTS, LOW);  
#endif
      }
    } else if (_recvBuffer[0] == HCI_EVENT_PKT) {
      if (_recvIndex > 3 && _recvIndex >= (3 + _recvBuffer[2])) {
        if (_debug) {
          dumpPkt("HCI EVENT RX <- ", _recvIndex, _recvBuffer);
        }
#ifdef ARDUINO_AVR_UNO_WIFI_REV2
        digitalWrite(NINA_RTS, HIGH);
#endif
        // received full event
        int pktLen = _recvIndex - 1;
        _recvIndex = 0;

        handleEventPkt(pktLen, &_recvBuffer[1]);

#ifdef ARDUINO_AVR_UNO_WIFI_REV2
        digitalWrite(NINA_RTS, LOW);
#endif
      }
    } else {
      _recvIndex = 0;

      if (_debug) {
        _debug->println(b, HEX);
      }
    }
    // Edit: trace the end of each pass through the while loop
    if (_debug) {
      _debug->print("WE ");
    }
  }

#ifdef ARDUINO_AVR_UNO_WIFI_REV2
  digitalWrite(NINA_RTS, HIGH);
#endif
  // Edit: trace the end of poll()
  if (_debug) {
    _debug->print("PE ");
  }
}

My first suspicion was the NeoPixel libraries I am using, as the code “seemed” to fare better when they weren’t included. So I began tracking when the NeoPixelBus library updates, and found that most of the time the pixels aren’t updated while the while loop is running. On occasion, however, the while loop runs and reads data without triggering an HCI event, and a NeoPixel update is triggered. Furthermore, once I started tracking _recvIndex I noticed out-of-bounds/overflow cases where _recvIndex goes beyond the 258-byte length of _recvBuffer. I am not sure whether this is caused by NeoPixelBus or a similar library, but I suspect that it is the cause of the crash. I have no idea how this happens, but I am assuming that the code somehow passes one of the outer if conditions, such as:

if (_recvBuffer[0] == HCI_ACLDATA_PKT) {

or 

_recvBuffer[0] == HCI_EVENT_PKT

But doesn’t satisfy the nested if condition and then hops back out, skipping the _recvIndex = 0. After that, _recvIndex++ just keeps climbing until the board crashes at 533.

So I added a small if statement to catch this:

if (_recvIndex > 257) {
  _recvIndex = 0;
}
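To show where a guard like this could sit, here is a rough standalone model of the receive loop. It is only a sketch and not the ArduinoBLE source: HciReceiver, feed(), packets and resyncs are hypothetical names made up for illustration, the 258-byte buffer size is the one mentioned above, and the two length conditions are copied from the poll() function in this post. The guard runs before the write, so the index can never point past the end of the buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Packet type bytes as used in the poll() function above.
constexpr uint8_t HCI_ACLDATA_PKT = 0x02;
constexpr uint8_t HCI_EVENT_PKT   = 0x04;

// Hypothetical stand-in for the receive state held by HCIClass.
struct HciReceiver {
  uint8_t  buffer[258] = {};  // same length as _recvBuffer in the post
  size_t   index = 0;         // plays the role of _recvIndex
  unsigned packets = 0;       // complete packets recognised
  unsigned resyncs = 0;       // partial packets discarded by the guard

  // Handle one received byte; mirrors one pass of the while loop in poll().
  void feed(uint8_t b) {
    // Guard BEFORE the write: if the buffer is full, drop the partial packet.
    if (index >= sizeof(buffer)) {
      index = 0;
      ++resyncs;
    }
    buffer[index++] = b;

    if (buffer[0] == HCI_ACLDATA_PKT) {
      // 5 header bytes plus the 16-bit little-endian length in bytes 3 and 4.
      if (index > 5 && index >= 5u + (buffer[3] + (buffer[4] << 8))) {
        ++packets;
        index = 0;
      }
    } else if (buffer[0] == HCI_EVENT_PKT) {
      // 3 header bytes plus the 8-bit length in byte 2.
      if (index > 3 && index >= 3u + buffer[2]) {
        ++packets;
        index = 0;
      }
    } else {
      index = 0;  // unknown packet type, start again
    }
  }
};
```

Feeding this model the bytes 0x02 0x01 0x1A 0x0B 0xFF followed by several hundred further bytes leaves index inside the buffer and increments resyncs once, instead of letting the index run away. Note that the guard only stops the out-of-bounds write; the packet that was being assembled is still lost.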

Is there anything I missed that might be causing this? I have added an extract of my logs showing where the error starts and ends, as well as the code I use to try to trigger the crash. Also, should I raise this as an issue on GitHub?

Thank you for your time.
Matt

Error_Log.txt (246 KB)

Crash_Testing.ino (5.15 KB)

Small Update:

It seems that the overflow/out-of-bounds error typically occurs when the BLE code receives a 0x02 (HCI_ACLDATA_PKT) in _recvBuffer[0]. I have added an example below:

R: 4 R: 3E R: 16 R: 2 R: 1 R: 4 R: 1 R: C8 R: A R: 85 R: D1 R: 4 R: 3E R: 1B R: 2 R: 1 R: 0 R: 0 R: 15 R: E4 R: 8C R: C0 R: 1A R: 1C R: F HCI EVENT RX <- 043E1602010401C80A85D1043E1B0201000015E48CC01A1C0F 
^====== Last "good" HCI event before overflow

R: 2 R: 1 R: 1A R: B R: FF R: 4C R: 0 R: 9 R: 6 R: 3 R: 5B R: C0 R: A8 R: 1 R: 4B R: B3 R: 4 R: 3E R: C R: 2 R: 1 R: 4 R: 0 R: 15 R: E4 R: 8C R: C0 R: 1A R: 1C R: 0 R: B4 
^==== 0x02 on first byte of received buffer

29/4/20	15:20:46

[ Various Uncaught Events?]

29/4/20	15:20:47
R: 4 R: 3E R: 1A R: 2 R: 1 R: 0 R: 1 R: BF R: 44 R: 77 R: 9 R: 3C R: 72 R: E R: 2 R: 1 R: 1A R: A R: FF R: 4C R: 0 R: 10 R: 5 R: 13 R: 1C R: 5A R: 88 R: 77 R: AD R: 4 R: 3E R: C R: 2 R: 1 R: 4 R: 1 R: BF R: 44 R: 77 R: 9 R: 3C R: 72 R: 0 R: AD 
29/4/20	15:20:48
R: 4 R: 3E R: 1D R: 2 R: 1 R: 0 R: 1 R: C2 R: EE R: B1 R: E R: 62 R: 4D R: 11 R: 2 R: 1 R: 1A R: 2 R: A R: C R: A R: FF R: 4C R: 0 R: 10 R: 5 R: 51 R: 1C R: 63 R: 4 R: 30 R: A5 R: 4 R: 3E R: 2B R: 2 R: 1 R: 3 R: 1 R: B5 R: C R: F2 R: B8 R: 27 R: D5 R: 1F R: 1E R: FF R: 4C R: 0 R: 12 R: 19 R: 0 R: 15 R: 4 R: DE R: 64 R: 9B R: 36 R: C7 R: FF R: 1F R: 34 R: 39 R: F1 R: BC R: 60 R: 66 R: 81 R: 47 R: 78 R: 8B R: 1C R: 2B R: 98 R: 0 R: 0 R: A8 

29/4/20	15:20:57
R: 4 R: 3E R: C R: 2 R: 1 R: 4 R: 1 R: C2 R: EE R: B1 R: E R: 62 R: 4D R: 0 R: A5 

29/4/20	15:21:43
R: 4 R: 3E R: 28 R: 2 R: 1 R: 2 R: 1 R: C8 R: A R: 85 R: 95 R: 4 R: 67 R: 1C R: 3 R: 3 R: 9F R: FE R: 17 R: 16 R: 9F R: FE R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: 0 R: D3 R: 4 R: 3E R: 16 R: 2 R: 1 R: 4 R: 1 R: C8 R: A R: 85 R: 95 R: 4 R: 67 R: A R: 9 R: FF R: E0 R: 0 R: 1 R: 4F R: CA R: 7E R: 34 R: 20 R: D3 R: 4 R: 3E R: 23 R: 2 R: 1 R: 0 R: 1 R: C7 R: 64 R: 67 R: 20 R: 34 R: 73 R: 17 R: 2 R: 1 R: 6 R: 13 R: FF R: 4C R: 0 R: C 

[ Various Uncaught Events? End]

***Overflow Catch***
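
Working through the length checks for the two annotated lines above gives a possible explanation for the runaway index. This is only a sketch: aclBytesExpected and eventBytesExpected are hypothetical helpers, not part of ArduinoBLE, and the formulas are copied from the two conditions in poll(). It also assumes the line marked "0x02 on first byte" really does start at _recvBuffer[0].

```cpp
#include <cstddef>
#include <cstdint>

// Bytes the ACL branch of poll() waits for: 5 header bytes plus the
// 16-bit little-endian length stored in bytes 3 and 4.
inline size_t aclBytesExpected(const uint8_t* buf) {
  return 5u + (buf[3] + (static_cast<size_t>(buf[4]) << 8));
}

// Bytes the event branch of poll() waits for: 3 header bytes plus the
// 8-bit length stored in byte 2.
inline size_t eventBytesExpected(const uint8_t* buf) {
  return 3u + buf[2];
}

// First bytes of the last "good" event line in the log: 4 3E 16
const uint8_t goodEvent[] = {0x04, 0x3E, 0x16};

// First bytes of the line flagged with 0x02 in the log: 2 1 1A B FF
const uint8_t badStart[] = {0x02, 0x01, 0x1A, 0x0B, 0xFF};

// eventBytesExpected(goodEvent) is 3 + 0x16 = 25, which matches the
// 25 bytes shown in the "HCI EVENT RX" dump.
// aclBytesExpected(badStart) is 5 + 0x0B + (0xFF << 8) = 65296, far more
// than the 258 bytes available in _recvBuffer.
```

If that reading is right, the ACL branch would keep waiting for 65296 bytes, so _recvIndex = 0 is never reached and every further byte is written past the end of the buffer.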