NRF24L01+ logjam

I use two Arduino Pro Mini clone microcontrollers to manage my solar hot water system. They communicate back and forth using NRF24L01+ modules and simple code based on the RF24.h and SPI.h headers. The communication is very simple and works perfectly until about 10:00 am each morning, when the worker starts failing to respond to the manager. The failures are occasional at first but quickly become frequent. Eventually, after about 15 to 30 minutes, I have to restart the worker, and then communication returns to normal. Some failures return in the afternoon at about 4:00 pm for about 30 minutes, and then everything is OK until the next morning. I'm trying to work out why this happens and how to fix it.

There are two nodes, the Manager and the Worker. The manager decides when things should happen and reports on the status of things; the worker has various valves and sensors connected to it, carries out the instructions issued by the manager and reports back to it. Everything is straight out of the box, with all defaults. Both modules call begin(), then open their reading and writing pipes, and the worker starts listening. The manager sends a 32-byte packet; the worker receives it, stops listening, stuffs new information into the received packet and returns it. Meanwhile, after sending the original packet, the manager starts listening and waits with this loop:

while (!radio.available() && !timeout) { if (millis() - started_waiting_at > 500) timeout = true; }

This is very simple, but there are some issues. The worker and the manager are separated by about 20 feet, with the manager inside the house. The worker is on a roof and gets very hot; it's the Australian summer here, so the problem could be heat-related. The roof is corrugated iron, so there will be reflections, and we are in the city, so there could be interference. I've built a scanner and have virtually ruled out Wi-Fi interference. The transmissions are spaced 1500 ms apart, so there should be no load problems. But it looks as if a buffer fills and eventually overflows, stopping all further communication. I've read all of the forums, and it seems strange that it happens every day when the sky is clear, before the sun reaches its zenith. Resetting solves the problem, and although the hottest part of the day is still to come, the problems don't return until the next morning.
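The manager's wait loop can be sketched as a self-contained function. This is a minimal host-side sketch, not the original code: `waitForReply`, `ClockFn` and `AvailableFn` are hypothetical names, and the clock and availability checks are passed in so the logic can be tested off the board; on the Arduino they would be `millis()` and `radio.available()`. Returning as soon as the deadline passes removes the separate `timeout` flag, and keeping the comparison as an unsigned subtraction keeps it correct when `millis()` wraps around.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for Arduino's millis() and RF24's radio.available(),
// injected so the wait logic can be exercised off the board.
using ClockFn = std::function<uint32_t()>;
using AvailableFn = std::function<bool()>;

// Wait at most timeoutMs milliseconds for a reply.
// Returns true if a packet became available, false on timeout.
bool waitForReply(ClockFn now, AvailableFn available, uint32_t timeoutMs) {
  const uint32_t startedWaitingAt = now();
  while (!available()) {
    // Unsigned subtraction stays correct even when the 32-bit
    // millisecond counter wraps around (about every 49.7 days).
    if (now() - startedWaitingAt > timeoutMs) {
      return false;  // gave up: no reply within the timeout
    }
  }
  return true;  // a reply is waiting to be read
}
```

On a false return the manager can log the miss and move on to the next cycle instead of blocking.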

Any ideas?

Usual idea - post your code.

Is your system organized so one of the devices is in charge of communication so that nothing happens unless it requests it?

…R

Thanks Robin,

The worker, which I think is the problem, can only respond. It listens and responds, but there is absolutely no checking that the message it responds to came from the manager (very sloppy). If you're suggesting that another device could be involved, it would only need to send a message on the same channel. I can quite easily put a signature code into the manager's message, and if the worker ever gets a message without the code it can signal back to alert the manager, so I will see it. Another way is simply to change the channel. If there is another device, it will not know I've made the change, so there should be no transmission problems, and I'll see the cuckoo on my scanner program. It is certainly worth a try and I'll do it, but it will take a couple of days. (However, I doubt another device would create the symptoms I'm experiencing.)
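The signature check might look something like the following. It is a sketch under stated assumptions: the two signature bytes (0xA5, 0x5A), their position at the start of the 32-byte payload, the sequence-number byte and all of the names are hypothetical choices. The sequence number is an optional extra that lets the manager reject a late reply to an earlier request after a timeout.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical 32-byte payload layout (the nRF24L01+ maximum payload size).
// The signature values and field positions are assumptions.
constexpr uint8_t kSigHi = 0xA5;
constexpr uint8_t kSigLo = 0x5A;
constexpr std::size_t kPayloadSize = 32;

struct Packet {
  uint8_t bytes[kPayloadSize];
};

// Manager side: stamp the signature and a sequence number before sending.
void stampPacket(Packet& p, uint8_t sequence) {
  p.bytes[0] = kSigHi;
  p.bytes[1] = kSigLo;
  p.bytes[2] = sequence;
}

// Worker side: only act on packets that carry the manager's signature;
// anything else can be reported back as a stray message.
bool hasManagerSignature(const Packet& p) {
  return p.bytes[0] == kSigHi && p.bytes[1] == kSigLo;
}

// Manager side: accept a reply only if it keeps the signature and echoes
// the sequence number of the request just sent.
bool isReplyTo(const Packet& reply, uint8_t sequence) {
  return hasManagerSignature(reply) && reply.bytes[2] == sequence;
}
```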

(There's a lot of code, much of it to do with traces and Serial.print etc. I'll try to tidy it up and post it in a few days as well.) Thanks.

I’d be inclined to try to eliminate the radios from the equation for testing. Can you get the manager device close enough to run a hard-wired connection between them?

What frequency do wireless phones use in AU? IIRC, 2.4 GHz is one that’s been used in the States - is someone or something in the habit of making calls at 10:00 and 4:00?

The timing piece reminds me of a story about a laser connection between two buildings that failed at the same time each day - the cause turned out to be someone that took a smoke break on the roof at a regular time and left a door open that blocked the signal. Not the same thing here, but maybe there’s a human factors answer.

Tokoh: The worker, which I think is the problem, can only respond. It listens and responds but there is absolutely no checking that the message it responds to came from the manager (very sloppy).

I think you have misunderstood. It had not crossed my mind that it might be responding to another device.

My concern was that the worker would send data when the manager was not ready. However if it only responds to a message from the manager you have already got that covered.

I have a system for wireless control of model trains. The master sends a signal with an ID number for a locomotive. All the locos receive this but only the one with the ID responds. The response (the loco's battery voltage) is already prepared before the message from the master arrives so the first thing the slave does (after figuring out that the message is intended for it) is send the response to the Master. That way the master knows the response will come very quickly and it uses a short timeout to move on if it does not receive the response. Maybe some of these ideas will be useful, and maybe not.
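A minimal sketch of that polling pattern, assuming a hypothetical `Poll`/`Reply` message layout and a `Slave` class (none of these names come from an actual library). It is written as host-side C++17 so it can be tested; `std::optional` is not normally available on AVR boards, where a `bool` return plus an output parameter would do the same job.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical message formats for an ID-addressed poll.
struct Poll {
  uint8_t targetId;
};

struct Reply {
  uint8_t fromId;
  uint16_t batteryMillivolts;  // the prepared response, e.g. battery voltage
};

class Slave {
 public:
  explicit Slave(uint8_t id) : id_(id), prepared_{id, 0} {}

  // Called whenever a fresh measurement is available, before any poll
  // arrives, so no work is needed when the poll comes in.
  void prepare(uint16_t millivolts) { prepared_ = Reply{id_, millivolts}; }

  // Every slave hears every poll; only the addressed one answers, and it
  // answers straight away with the reply it already has.
  std::optional<Reply> onPoll(const Poll& poll) const {
    if (poll.targetId != id_) return std::nullopt;
    return prepared_;
  }

 private:
  uint8_t id_;
  Reply prepared_;
};
```

Because the reply is ready before the poll arrives, the master can use a short, predictable timeout.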

...R

I'm not certain yet, but the problem seems to be a timing one, though not directly with the NRF24L01. I am transmitting a temperature reading from a DS18B20 Dallas probe. When asked, this probe uses analog-to-digital conversion to obtain the temperature with precision. This process can take up to 750 ms; I had allowed 800 ms, but maybe that's not quite enough. When my solar system is pumping there are rapid temperature changes, so virtually every request at this time requires a conversion, whereas when the system is getting up to temperature or has finished heating for the day the changes are more gentle.
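For reference, the DS18B20 datasheet gives the maximum conversion time as a function of the configured resolution: 750 ms at 12 bits (the power-on default), halving for each bit removed, down to 93.75 ms at 9 bits. The helper below just encodes that arithmetic; `maxConversionMs` and `waitIsLongEnough` are hypothetical names, and the safety margin is the caller's choice.

```cpp
#include <cassert>

// Maximum DS18B20 conversion time for a given resolution, per the
// datasheet: 93.75 ms at 9 bits, doubling for each extra bit up to
// 750 ms at 12 bits.
double maxConversionMs(int bits) {
  assert(bits >= 9 && bits <= 12);
  return 750.0 / (1 << (12 - bits));
}

// True if a fixed wait before reading the result covers the worst-case
// conversion time plus a caller-chosen safety margin.
bool waitIsLongEnough(double waitMs, int bits, double marginMs) {
  return waitMs >= maxConversionMs(bits) + marginMs;
}
```

On these figures an 800 ms wait leaves 50 ms over the 12-bit worst case, and dropping to 10-bit resolution (0.25 °C steps) would cut the worst case to 187.5 ms.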

As I said, we're not sure it's solved yet, but the indications are promising. More later. Robin, thank you; you got me thinking outside the box.

Tokoh: The communication is very simple and works perfectly until about 10:00 am each morning, when the worker starts failing to respond to the manager. The failures are occasional at first but quickly become frequent. Eventually, after about 15 to 30 minutes, I have to restart the worker, and then communication returns to normal. Some failures return in the afternoon at about 4:00 pm for about 30 minutes, and then everything is OK until the next morning.

Your neighbour does something at 10 am (maybe leaves for work, opening the garage door) and returns at 4 pm (opening it again)?

Tokoh: I am transmitting a temperature reading from a DS18B20 Dallas probe. When asked, this probe uses analog-to-digital conversion to obtain the temperature with precision. This process can take up to 750 ms,

I still don't have a clear idea how the work is organized between the two Arduinos.

If you need more advice maybe you could expand on the description.

Have you considered "reversing" the temperature collection process? In other words, as soon as the data has been sent to the master, the worker starts another temperature collection (or perhaps collects measurements continuously but keeps only the most recent), so that when the master asks for another reading it is ready to be returned instantly.
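The "collect continuously, keep the most recent" variant can be sketched as a small non-blocking state machine. This is an illustration under assumptions: `BackgroundSampler` is a hypothetical class, and `startConversion`/`readTemperature` are placeholders passed in for the real sensor calls. If the DallasTemperature library is in use, its `setWaitForConversion(false)` option is intended for this style, letting `requestTemperatures()` return without waiting.

```cpp
#include <cstdint>

// Hypothetical non-blocking sampler: a conversion is started, and its
// result is collected only after the conversion time has elapsed, so the
// main loop never blocks waiting for the sensor.
class BackgroundSampler {
 public:
  explicit BackgroundSampler(uint32_t conversionMs)
      : conversionMs_(conversionMs) {}

  // Call frequently from loop() with the current time in ms.
  // readTemperature is only called once a conversion has had time to finish.
  template <typename StartFn, typename ReadFn>
  void poll(uint32_t nowMs, StartFn startConversion, ReadFn readTemperature) {
    if (!converting_) {
      startConversion();
      startedAt_ = nowMs;
      converting_ = true;
    } else if (nowMs - startedAt_ >= conversionMs_) {
      latest_ = readTemperature();  // keep only the most recent value
      haveReading_ = true;
      converting_ = false;          // the next poll() starts another one
    }
  }

  // The manager's request is answered immediately from the cached value.
  bool haveReading() const { return haveReading_; }
  float latest() const { return latest_; }

 private:
  uint32_t conversionMs_;
  uint32_t startedAt_ = 0;
  bool converting_ = false;
  bool haveReading_ = false;
  float latest_ = 0.0f;
};
```

With this in place the worker can answer the manager from `latest()` the moment a packet arrives, so the reply time no longer depends on the 750 ms conversion.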

...R