The problem
The receiver occasionally misses messages that the sender reports as delivered, i.e. confirmed by auto-ack.
How is this situation possible at all, given the logic I have implemented, and how can I fix it?
Hardware setup
I have an Arduino Nano as the sender and a Teensy LC as the receiver.
Both RF24 boards (very likely fakes) have 10 µF capacitors soldered directly across their GND and VCC pins.
The Nano is powered from 2 AA batteries through two boost converters: 4.6 V for the Nano and 3.3 V for the RF24. I had to turn the voltage down from 5 V to 4.6 V because at 5 V it lost many more packets, maybe because of the 3V3 -> 5V signal level difference (I guess I'll also have to try a level converter, even though the RF24 is said to tolerate 5 V input signals well). In the final design, I intend to use the 3.3 V version of the Arduino Pro Mini to avoid signal level inconsistencies.
The Teensy LC is powered over USB, and its RF24 is powered from the Teensy's 3V3 pin (this should be enough, since the pin provides up to 100 mA and the RF24 shouldn't draw more than that, judging by the specs).
Programming
I'm using the TMRh20 RF24 library, version 1.4.1. I ran my first tests with the basic examples I found mentioned here on the Arduino forums. Everything worked just fine, so I proceeded with more in-depth stress testing and adjustments for my project. I haven't connected any sensors yet; first I'm just trying to get the RF24 to work reliably.
The general approach is as follows.
For my project, if a packet is not delivered soon enough, I might already have new sensor data to deliver instead, so it is better to give up on the old message and send a new one. That is why I set the retries on the sending side to fairly low values: setRetries(2, 2), i.e. delay setting 2 and retry count 2. If the last attempt to write() a message failed and there is no new data yet, I send the same old message the next time. The receiving side detects the duplicates and discards them.
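A rough sketch of what the (2, 2) setting means in time. It assumes the usual nRF24L01 rule that a delay setting n waits (n + 1) x 250 us, which matches the "0 means 250us, 15 means 4000us" comment in the sender code. The helper names are made up for illustration and are not part of the RF24 library; airtime, ACK timeout and SPI overhead are ignored.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not part of the RF24 library.

// Auto-retransmit delay in microseconds for a setRetries() delay setting (0..15).
// Assumption: setting n means (n + 1) * 250 us.
constexpr uint32_t retryDelayUs(uint8_t delaySetting) {
    return (static_cast<uint32_t>(delaySetting) + 1u) * 250u;
}

// Total transmissions of one packet: the first attempt plus the retries.
constexpr uint32_t maxTransmissions(uint8_t retryCount) {
    return 1u + retryCount;
}

// Time spent waiting between attempts before the radio gives up.
// There is one gap before each retry, so retryCount gaps in total.
constexpr uint32_t maxRetryWaitUs(uint8_t delaySetting, uint8_t retryCount) {
    return retryDelayUs(delaySetting) * retryCount;
}
```

Under these assumptions, setRetries(2, 2) gives a 750 us delay, at most 3 transmissions per write(), and about 1500 us of waiting between attempts.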
I'm sending randomly generated dummy numbers as my data packets, and also adding the last ACK time (ms) in one byte for diagnostics.
Data rate is RF24_1MBPS and the power is RF24_PA_MIN (because both devices are in the same room).
The last byte of every message is a sequence number that loops from 0 to 255. The receiver checks whether the sequence number has jumped by more than 1 since the previous packet, in which case it logs the previous and the current packet to the serial output. I know this misses lost packets at the moment the seqnum wraps around from 255 to 0, but that won't prevent me from detecting most of the lost packets.
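The wrap-around blind spot could probably be closed by doing the subtraction in 8-bit unsigned arithmetic, where 0 - 255 wraps to 1. A minimal sketch in plain C++; seqDelta, classify and SeqStatus are illustrative names, not what the receiver code in this post uses:

```cpp
#include <cstdint>

enum class SeqStatus { Duplicate, InOrder, Gap };

// Difference between two sequence numbers, modulo 256.
// The operands are promoted to int for the subtraction; the cast back
// to uint8_t applies the wrap, so seqDelta(0, 255) == 1.
constexpr uint8_t seqDelta(uint8_t current, uint8_t previous) {
    return static_cast<uint8_t>(current - previous);
}

// Classify a newly received sequence number against the previous one.
constexpr SeqStatus classify(uint8_t current, uint8_t previous) {
    return seqDelta(current, previous) == 0 ? SeqStatus::Duplicate
         : seqDelta(current, previous) == 1 ? SeqStatus::InOrder
                                            : SeqStatus::Gap;
}

// Number of packets missing between two received sequence numbers
// (0 for a duplicate or an in-order packet).
constexpr uint8_t missingCount(uint8_t current, uint8_t previous) {
    return seqDelta(current, previous) <= 1
               ? 0
               : static_cast<uint8_t>(seqDelta(current, previous) - 1);
}
```

With this, 255 followed by 0 counts as in order, and 254 followed by 0 counts as a gap with one missing packet. The remaining limitation is that a burst of exactly 256 lost packets would look like a duplicate.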
So, from the algorithm's perspective, it should be rock-solid: the RF24 auto-ack mechanism ensures the receiver has received the message (if the sender got an ACK back, that should guarantee the receiver actually received the message, right?), and if auto-ack reports a failure, I keep resending the old message myself until write() returns success.
But it doesn't work that way. The receiving side still reports some lost messages even when the sender reports write() success! How is that even possible?
My last test showed 11 lost out of 8000. I expected none to be lost at all. How can the receiver not see a message that it has sent an ACK for?
Is radio.write() sometimes lying to me and returning true even when no ACK was received?
Or are radio.available() or radio.read() sometimes lying to me and not returning data that was received and ACKed by the RF24?
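The expectation can be written down as a tiny simulation in plain C++ with no radio involved. Each Attempt stands for one write() call; all names here are made up for illustration. As long as "acked" implies "delivered", the receiver never skips a sequence number, and only an ACK without delivery produces a gap.

```cpp
#include <cstdint>
#include <vector>

// One simulated write(): did the packet reach the receiver,
// and did the sender see an ACK (write() returned true)?
struct Attempt {
    bool delivered;
    bool acked;
};

struct Result {
    std::vector<uint8_t> accepted;  // sequence numbers the receiver kept
    unsigned duplicates;            // repeats the receiver filtered out
};

Result simulate(const std::vector<Attempt>& attempts) {
    Result r{{}, 0};
    uint8_t seq = 0;        // sequence number currently being sent
    bool haveLast = false;  // receiver has seen at least one packet
    uint8_t lastSeq = 0;    // last sequence number the receiver saw
    for (const Attempt& a : attempts) {
        if (a.delivered) {
            if (haveLast && lastSeq == seq) {
                ++r.duplicates;  // same seqnum again: a resend
            } else {
                r.accepted.push_back(seq);
                lastSeq = seq;
                haveLast = true;
            }
        }
        if (a.acked) {
            ++seq;  // sender moves on only after a reported success
        }
    }
    return r;
}
```

For example, the attempts {delivered, acked}, {delivered, not acked}, {delivered, acked} yield sequence numbers 0 and 1 plus one duplicate, with no gap. The attempts {delivered, acked}, {not delivered, acked}, {delivered, acked} yield 0 and 2, which is the kind of gap my receiver logs.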
What am I missing here?
I captured one such case from the serial logs. Pay attention to the last byte, which is the seqnum; the first six bytes are random data. The 7th byte is the duration of the previous write() in ms, just for stats.
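To make the log lines easier to read, here is a sketch of the 8-byte layout as a plain C++ struct. The struct and field names are only illustrative; the actual sketches in this post use a raw byte array.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPayloadSize = 8;

// Illustrative view of one payload; the field names are assumptions.
struct PayloadView {
    uint8_t data[6];      // random filler bytes
    uint8_t lastWriteMs;  // duration of the previous write(), in ms
    uint8_t seqNum;       // wrapping sequence number, 0..255
};
static_assert(sizeof(PayloadView) == kPayloadSize, "payload must stay 8 bytes");

// Copy a raw 8-byte payload into the named fields.
inline PayloadView decode(const uint8_t* raw) {
    PayloadView p{};
    for (std::size_t i = 0; i < 6; ++i) {
        p.data[i] = raw[i];
    }
    p.lastWriteMs = raw[6];
    p.seqNum = raw[7];
    return p;
}
```

Decoding the logged line "16 144 57 31 113 245 0 4" this way gives seqNum 4 and lastWriteMs 0.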
The sender:
Sending: 16 23 39 3 89 165 1 1
Sending: 220 238 174 145 95 51 0 2
Sending: 248 188 58 210 103 231 0 3
Sending: 16 144 57 31 113 245 0 4
Sending: 123 78 26 27 131 64 0 5
Tx failed
Sending: 123 78 26 27 131 64 0 5
Sending: 255 77 236 73 35 57 1 6
The receiver:
Packet loss detected!
Previous payload: 248 188 58 210 103 231 0 3
Current payload: 123 78 26 27 131 64 0 5
So, you can see that the receiver never read the packet with seqnum 4, even though the sender did not report any failure for it!
However, it correctly received the next message (seqnum 5), which failed on the sender and was manually resent in the next loop() iteration. So, at least the resending part is working reliably.
Below are the relevant code fragments from my sender and receiver (shortened for brevity; the originals contain some unrelated debugging code for LCD, serial and stats counting).
The sender:
#include <SPI.h>
#include <nRF24L01.h>
#include <RF24.h>
#include <printf.h>
#define RADIO_CE_PIN 9
#define RADIO_CS_PIN 10
const byte PIPE_NAME[5] = {'a','a','a','a','1'};
RF24 radio(RADIO_CE_PIN, RADIO_CS_PIN);
// wrapping-around short sequence number
// to detect missed packets on the receiver side
byte seqNum = 0;
byte lastAckTimeMs = 0;
bool lastSendResult = true;
// 6 bytes of some random data, 1 byte of last ACK time in ms (expected less than 255) and 1 byte with seqnum
byte payload[8];
// just for printing stuff
char buf[100];
void setup() {
Serial.begin(9600);
while (!Serial) {
// some boards (Teensy) need to wait to ensure access to serial over USB
// but it will freeze setup() until serial is opened from the other (e.g. Arduino IDE) side
// so, do not use for production
}
printf_begin(); // needed for radio.printDetails to prevent crashing on low-memory devices
initTransmittingRadio();
}
void loop() {
send();
//delay(1000);
}
// ===============================
void initTransmittingRadio() {
if (!radio.begin()) {
Serial.println(F("Radio hardware is not responding!"));
while (1) {} // hold in infinite loop
}
radio.setDataRate(RF24_1MBPS); // RF24_1MBPS, RF24_2MBPS, RF24_250KBPS
radio.setRetries(2, 2); // delay, count
// 0 means 250us, 15 means 4000us
// small retry count - don't wait too long, might have new data to transmit
// Set the PA Level low because nodes will be close to each other
radio.setPALevel(RF24_PA_MIN); // RF24_PA_MAX is default; RF24_PA_MIN, RF24_PA_LOW
// move to a higher channel to avoid clashes with WiFi, BT etc.
// The range is 2.400 to 2.525 GHz
// The nRF24L01 channel spacing is 1 MHz, which gives 126 possible channels numbered 0 .. 125.
radio.setChannel(115);
// save on transmission time by setting the radio to only transmit the number of bytes we need.
// max packet is 32 bytes at a time
radio.setPayloadSize(sizeof(payload));
// setPayloadSize in older library versions must come before opening pipes to be effective!
radio.openWritingPipe(PIPE_NAME);
// switch to transmit mode
radio.stopListening();
Serial.println(F("Radio hardware initialized"));
radio.printPrettyDetails();
randomSeed(analogRead(0));
}
void send() {
// if radio.write failed (missing ack?), keep the old payload;
// a radio.write failure doesn't necessarily mean the receiver did not get the message,
// in which case we'll cause duplicates for the receiver to filter out
if (lastSendResult) {
prepareNextPayload();
}
sprintf(buf, "Sending: %u %u %u %u %u %u %u %u", payload[0], payload[1],
payload[2], payload[3], payload[4], payload[5], payload[6], payload[7]);
Serial.println(buf);
unsigned long startTime = millis();
lastSendResult = radio.write(&payload, sizeof(payload));
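// write() blocks until an ACK arrives or the retries are exhausted
unsigned long elapsedMs = millis() - startTime;
// clamp to one byte so that a slow write() cannot wrap around to a small value
lastAckTimeMs = (elapsedMs > 255) ? 255 : (byte)elapsedMs;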
if (!lastSendResult) {
Serial.println(F("Tx failed"));
}
}
void prepareNextPayload() {
// random bytes, seqnum last
for (byte i = 0; i<6;i++) {
payload[i] = random(256);
}
payload[6] = lastAckTimeMs;
payload[7] = seqNum;
// increase the counter for the next time; will wrap around 255
seqNum++;
//seqNum++; // to trigger fake loss detection on the receiver
}
The receiver:
#include <SPI.h>
#include <nRF24L01.h>
#include <RF24.h>
#include <printf.h>
#define RADIO_CE_PIN 9
#define RADIO_CS_PIN 10
RF24 radio(RADIO_CE_PIN, RADIO_CS_PIN);
const byte PIPE_NAME[5] = {'a','a','a','a','1'};
// stats
unsigned long packetsReceived = 0;
unsigned long packetsLost = 0;
unsigned long duplicatesDetected = 0;
bool wasRetransmitted = false;
byte lastAckTimeMs = 0;
// just for printing stuff
char buf[100];
byte payload[8];
byte previousPayload[8];
byte lastSeqNum = 0;
// to avoid lost packet false positives on start
bool freshRun = true;
// to print stats for batches, not every message
const unsigned int LOG_EVERY_N = 100;
unsigned int collectedSinceLast = 0;
void setup() {
Serial.begin(9600);
while (!Serial) {
// some boards (Teensy) need to wait to ensure access to serial over USB
// but it will freeze setup() until serial is opened from the other (e.g. Arduino IDE) side
// so, do not use for production
}
printf_begin(); // needed for radio.printDetails to prevent crashing on low-memory devices
initReceivingRadio();
}
void loop() {
getData();
serialPrintData();
}
// ===============================
void initReceivingRadio() {
if (!radio.begin()) {
// WITH_SERIAL_DBG is not defined in this fragment, so print unconditionally like the sender does
Serial.println(F("Radio hardware is not responding!"));
while (1) {} // hold in infinite loop
}
radio.setDataRate(RF24_1MBPS); // RF24_1MBPS, RF24_2MBPS, RF24_250KBPS
// default - true
radio.setAutoAck(true);
// Set the PA Level low because nodes will be close to each other;
// receiver side transmits ack messages
radio.setPALevel(RF24_PA_MIN); // RF24_PA_MAX is default; RF24_PA_MIN, RF24_PA_LOW
// move to a higher channel to avoid clashes with WiFi, BT etc.
// The range is 2.400 to 2.525 GHz
// The nRF24L01 channel spacing is 1 MHz, which gives 126 possible channels numbered 0 .. 125.
radio.setChannel(115);
radio.setPayloadSize(sizeof(payload));
// setPayloadSize in older library versions must come before opening pipes to be effective!
radio.openReadingPipe(1, PIPE_NAME);
// switch to receive mode
radio.startListening();
Serial.println(F("Radio hardware initialized"));
radio.printPrettyDetails();
}
void getData() {
// reset the duplicate flag for this loop() iteration
wasRetransmitted = false;
// check if have something unread yet in the receiving pipe buffer
if (radio.available()) {
// keep old payload for comparison when losing packets
memcpy(&previousPayload, &payload, sizeof(payload));
radio.read(&payload, sizeof(payload));
collectStats();
}
}
void collectStats() {
packetsReceived++;
collectedSinceLast++;
byte currSeqNum = payload[7];
lastAckTimeMs = payload[6];
if (!freshRun) {
int seqDiff = currSeqNum - lastSeqNum;
// if last received seq num differs by more than 1
// then consider that we had missed some packets in between
// (or they arrived in wrong order? shouldn't happen, we don't send new until ACK)
// of course, we miss the edge case for 255 -> 0 wrap-around, but not a big deal
if (seqDiff > 1) {
Serial.println(F("Packet loss detected!"));
sprintf(buf, "Previous payload: %u %u %u %u %u %u %u %u", previousPayload[0], previousPayload[1],
previousPayload[2], previousPayload[3], previousPayload[4], previousPayload[5], previousPayload[6], previousPayload[7]);
Serial.println(buf);
sprintf(buf, "Current payload: %u %u %u %u %u %u %u %u", payload[0], payload[1], payload[2],
payload[3], payload[4], payload[5], payload[6], payload[7]);
Serial.println(buf);
packetsLost++;
} else {
if (seqDiff == 0) {
wasRetransmitted = true;
duplicatesDetected++;
}
}
}
lastSeqNum = currSeqNum;
freshRun = false;
}
void serialPrintData() {
if (wasRetransmitted) {
Serial.println(F("Detected a possible retransmitted message with repeating sequence number"));
}
if (collectedSinceLast == LOG_EVERY_N) {
sprintf(buf, "Received: %lu; lost: %lu; duplicates: %lu; last ACK time: %u",
packetsReceived, packetsLost, duplicatesDetected, lastAckTimeMs);
Serial.println(buf);
collectedSinceLast = 0;
}
}