1000 Arduinos cluster networking

Hi all,
I'm working on a project where I need to build small units with sensors, LEDs, motors and Arduino to control them. All these box shaped units will be placed as a 2D array of 100x10. There will be probably around a thousand of these units controlled by a main computer.

I'm planning on using I2C to control a 10x10 grid of these, which will be about 1.5-2 meters in width and height. A master Arduino will receive data from computer and will forward data to the 100 slaves in its grid. There is no need for data to be sent from slaves to master. I'll use XBee or RS485 to send data from computer to all the masters (about 10).

The problem I'm facing is whether I2C will be able to handle all the connections, since the bus capacitance shouldn't exceed 400pF. I'm thinking of placing the master in the center of the grid with all slaves connected by telephone cable in a star shape, so the maximum distance between the master and any slave will be 2m. But I read someplace that the capacitance of the whole system is what counts, which will far exceed 400pF. Can someone please help clarify that?
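
A rough way to see the concern is to add up every cable leg and every device pin on the shared bus. The sketch below is only an estimate: the 50 pF/m cable figure, the 10 pF per pin and the 1 m average leg length are assumed placeholder values, not measurements of any particular cable or chip.

```python
# Rough I2C bus capacitance estimate for a star-wired grid.
# All component values below are assumptions for illustration only.
I2C_LIMIT_PF = 400.0  # limit quoted in the post


def bus_capacitance_pf(n_slaves, avg_leg_m, cable_pf_per_m=50.0, pin_pf=10.0):
    """Total capacitance on one bus line: every leg and every pin adds up."""
    cable = n_slaves * avg_leg_m * cable_pf_per_m
    pins = (n_slaves + 1) * pin_pf  # the slaves plus the master
    return cable + pins


total = bus_capacitance_pf(n_slaves=100, avg_leg_m=1.0)
print(f"estimated {total:.0f} pF against a limit of {I2C_LIMIT_PF:.0f} pF")
```

Even with generous assumptions the total lands far above the limit, because all the star legs hang on the same two wires.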

If anyone knows of any other cheap but robust (albeit difficult) way of handling such a large network, please point me in the right direction... Speed is not very important in this case.

You could have several independent I2C busses if you use one or more I2C multiplexer chips.

Pete

I've never done anything that size but my gut feeling is that you will blow the capacitance limit by a mile. Have a look at I2C buffer chips like the PCA9600, these can drive 10x the standard limit, but you may need one on every node as well.

You can also split the bus into several with a MUX like the PCA9646 or PCA9547, MUXs don't usually buffer though so I think you would still need buffer chips.


Rob

I2C will cause only more trouble.

Do you want to connect all Slaves to the Master ? And only in the direction of the Master to the Slave ?
Using wireless would solve many problems, but it will take a while before all the slaves have received their message. For a higher wireless data rate, you need something like the nRF2401 modules (there are many more, XBee, RFM22, and so on). It's only a short distance, so that will work well.

The ZigBee can be used with distributed network. I have no experience with that, but it is worth looking into.

You could use a Serial connection. Perhaps with the SoftwareSerial. You could also chain-link the Slaves. So a Slave passes on the data, until the data reaches the destination Slave.

You could use a Serial connection...You could also chain-link the Slaves

Yes, simple async serial would be easy, either multi-drop or daisy-chained either with TTL or maybe RS-485 buffers.

This does however require an intelligent slave, if you use I2C you can get by with dumb IO expanders or proper LED drivers.


Rob

Thanks for the suggestions.

I was trying to not use the I2C range extender ICs to reduce cost but it seems I'll have no way out. Multiplexer is a great idea! If I'm understanding correctly, it will let me use a single master? The computer will send data to master Arduino connected to MUX with each multiplexer line connected to 100 Arduinos across a buffer? Something like this:
Computer -> Master Arduino -> MUX -> I2C buffer -> cable -> I2C buffer -> Slave Arduino

I'm assuming multi-drop is same as a star topology, then won't multi-drop UART face the same problem of wire capacitance? I'm guessing daisy chaining UART or RS-485 some 1000 Arduinos will probably cause latency in order of seconds, and a huge possibility of data corruption...

Also, is it possible to multi-drop RS-485? I read here that RS-485 only supports max 32 receivers: Your Go-To Source for Innovative Solutions - NI
Is that at the protocol level or hardware?

Multiplexer is a great idea! If I'm understanding correctly, it will let me use a single master?

Yes, I've only seen 8:1 MUXes so you will still need a few of them.

Arduino connected to MUX with each multiplexer line connected to 100 Arduinos across a buffer?

That's the theory, but I cannot say that all the wire and 100 slaves will not exceed the capacitance, and I don't know how to find that out without trying it. Still you don't have to have 100 slaves on each line.

Another issue is how to select 100 busses, the PCA9547 I mentioned only has 3 address lines, so without further address decoding you can only have 8 of them. That said the "further decoding" will just be a single 74xx138 I think. So with a single 138 you can address 8 MUXes each with 8 I2C busses, that's 64 busses or ~15 nodes per bus, well within capacitance range I would think. (EDIT: will need more than a single 138)
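
The bus count above can be checked with a few lines. This is a minimal sketch that assumes 1000 nodes spread evenly over 8 MUXes with 8 channels each; the variable names are made up for illustration.

```python
import math

TOTAL_NODES = 1000
MUX_COUNT = 8      # 3 address pins allow 8 MUX chips
MUX_CHANNELS = 8   # each 8:1 MUX fans out to 8 I2C buses

buses = MUX_COUNT * MUX_CHANNELS
# Round up, since a bus cannot carry a fraction of a node.
nodes_per_bus = math.ceil(TOTAL_NODES / buses)
print(f"{buses} buses, at most {nodes_per_bus} nodes per bus")
```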

Computer -> Master Arduino -> MUX -> I2C buffer -> cable -> I2C buffer -> Slave Arduino

That looks right, but of course several MUXes.

I'm assuming multi-drop is same as a star topology

Not normally, but it can be. All it refers to is the fact that all nodes connect to the same wires. This is normally along a single cable, but subject to various factors it can be a star.

then won't multi-drop UART face the same problem of wire capacitance?

No, the difference is that I2C is open-collector, there is no active circuitry driving the line high, just a pullup resistor. Whereas UART comms has both active high and low levels.

Daisy-chaining of course eliminates all these sorts of issues, but might create another.

I'm guessing daisy chaining UART or RS-485 some 1000 Arduinos will probably cause latency in order of seconds

That depends on the baud rate. Depending on how it's done you can have as little as a one-byte delay per node, that's 10 bits. At 115200 baud that's 87µs per node or 87ms from node 0 to node 999. Let's say 100ms, that's a refresh rate of 10Hz.

EDIT: Oops, out by a factor of 10, figures right now I think.

There would I think be a ripple effect with this technique, and you may have to organise some sort of syncing so all the nodes change at the same time.

EDIT: That's for a single byte, your protocol would probably have 2-3 at least, so multiply by that.
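
The figures above can be reproduced with a short calculation. It is a sketch under one assumption: every node receives a whole frame before forwarding it, so the delay per node equals the frame time. The function name is hypothetical.

```python
BITS_PER_BYTE = 10  # 1 start bit + 8 data bits + 1 stop bit


def chain_delay_s(nodes, baud, bytes_per_frame=1):
    """End-to-end delay when each node holds the frame for its full length."""
    return nodes * bytes_per_frame * BITS_PER_BYTE / baud


for n_bytes in (1, 3):
    delay = chain_delay_s(1000, 115200, n_bytes)
    print(f"{n_bytes} byte(s) per frame: {delay * 1e3:.1f} ms end to end")
```

With a 3-byte frame the end-to-end delay is three times the single-byte figure, which is what the last EDIT points out.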

and a huge possibility of data corruption...

I see no reason to assume there will be corruption any more than the other options.

is it possible to multi-drop RS-485? I read here that RS-485 only supports max 32 receivers

RS-485 allows 32 "Unit Loads" (ULs) and in the early days that did indeed mean 32 nodes. However these days 1/4 and 1/8 UL chips are available so 256 nodes is achievable.
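
The node count follows directly from dividing the bus budget by the load each transceiver presents. A minimal sketch, with a hypothetical helper name:

```python
from fractions import Fraction

BUS_UNIT_LOADS = 32  # total load budget of one RS-485 bus


def max_nodes(load_per_node):
    """Number of transceivers that fit inside the unit-load budget."""
    return int(BUS_UNIT_LOADS / load_per_node)


for ul in (Fraction(1), Fraction(1, 4), Fraction(1, 8)):
    print(f"{ul} UL per transceiver -> up to {max_nodes(ul)} nodes")
```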

Is that at the protocol level or hardware?

There is no RS-485 protocol, it's strictly a hardware definition. The protocol is up to you.

Now, having written all that I just reread the first post and you say 100 not 1000...or maybe not

2D array of 100x10. There will be probably around a thousand of these units controlled by a main computer.

I'm planning on using I2C to control a 10x10 grid of these, which will be about 1.5-2 meters in width and height. A master Arduino will receive data from computer and will forward data to the 100 slaves in its grid.

Can you draw a tree showing the hierarchy and the distances involved? I'm confused as to how many Arduinos are controlling how many LEDs/sensors/whatevers. This needs to be documented well because the topology will probably make a big difference to how it's done.


Rob

What about wireless ? You didn't comment on that.
It can be cheap. Using multiplexers and wires cost also money.
The NRF24L01 modules cost less than a dollar.

Thank you Rob for that extremely detailed explanation! Thanks for all your help :slight_smile:

Can you draw a tree showing the hierarchy and the distances involved? I'm confused as to how many Arduinos are controlling how many LEDs/sensors/whatevers. This needs to be documented well because the topology will probably make a big difference to how it's done.

I've attached a very rough sketch in MS-Paint of what I'm planning. The smallest square is a unit having an Arduino, an I2C buffer and sensors. There are 1000 of these (100x10). I've divided them into manageable grids of 100 (10x10). A unit is 4"x4" in dimension.

Yes, I've only seen 8:1 MUXes so you will still need a few of them.

I found this 16:1 MUX. So I think one of these could do the job.

I'll select one of the 10 lines selected by MUX using the master Arduino, i.e. Arduino will control the 4 selection pins. On the line, I'll be sending SDA and share SCL with all units. The black boxes will be splitting the single wire to 100 for all the units.

That depends on the baud rate. Depending on how it's done you can have as little as a one-byte delay per node, that's 10 bits. At 115200 baud that's 87µs per node or 87ms from node 0 to node 999. Let's say 100ms, that's a refresh rate of 10Hz.

Wow, that's some tricky calculation. The delay seems perfectly acceptable. I'll consider this option as well...

What about wireless ? You didn't comment on that.
It can be cheap. Using multiplexers and wires cost also money.
The NRF24L01 modules cost less than a dollar.

I tried looking for these below 1 dollar. They look promising, but could only find them for $3. If you have link for 1$ parts, please share... :slight_smile:

On eBay, including shipping. Buy 10pcs to get below 1 dollar.

They look impressive. The cost for 1000 NRF24L01+ will be manageable, as it comes to approximately the same as the cost of wiring and I2C buffers. The only problem I see could be with range and data accuracy. I've tried using these 434MHz transmitter/receivers (https://www.sparkfun.com/products/10532) earlier and the experience left me wary of using wireless again. Have you worked with NRF24L01? How much noise is there and how dependable are they?

They are so different, you can't compare those. It's a different frequency, different protocol, different signal quality, different everything.
I have not used the NRF24L01+ yet (although I have a few).
The range depends on the antenna. Sparkfun has a board with a chip antenna for 100 meters range, but for short distance those from Ebay should be good enough.

This project uses those PCB copper antennas from eBay and gets 40 meters range:

Here is another test, and the result is 30 to 50 meters:
https://hallard.me/nrf24l01-real-life-range-test/

I think the 'mirf' library is used with the NRF24L01+ : Arduino Playground - Nrf24L01
The NRF24L01+ needs 3.3V, but the SPI signals are 5V tolerant.

Ignoring the communications, have you thought about power distribution? 1000 arduinos with even minimal IO is going to be about 100A at 5-12V, much much more if you're running motors+lights at each node. Distributing that alone is hard enough unless you're going to be quite clever. The need for power wires means that wireless comms isn't really an option. Wireless is great until you send a broadcast packet to 1000 nodes and then they all send an acknowledgement, none of which get through due to the collision, and then the master retries, and then it all just goes to custard. Don't even think about wireless until you understand the bitrate/area limitations on wireless protocols, how they interfere and how they're arbitrated.

You probably want to think about putting in a buck converter (eg an LM2596 with all supporting components can be assembled on a PCB for $1; integrate it into your PCB design) for each group of 10 or so nodes. Distribute power at the top level at about 36V to minimise ohmic losses in cable (and therefore avoid buying expensive fat copper cables - you don't want to be paying to run a kilometre of 8GA) and downconvert to 7V (to Vin) for each strip of 10 nodes. If you have power-hungry devices that take 12V, then you need both a 5-7V and a 12V converter for each node group. You could even have a converter per node if they are particularly power-hungry.
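
To see why the higher distribution voltage matters, compare the cable loss for the same load at two supply voltages. This is a simplified sketch: the 500 W load and the 0.1 ohm round-trip cable resistance are assumed example values, and the voltage drop along the cable is ignored.

```python
def cable_loss_w(load_w, supply_v, cable_ohm):
    """I^2 * R loss in the cable for a load drawing load_w at supply_v."""
    current_a = load_w / supply_v
    return current_a ** 2 * cable_ohm


LOAD_W = 500.0     # assumed total load (100 A at 5 V)
CABLE_OHM = 0.1    # assumed round-trip cable resistance

for volts in (5.0, 36.0):
    loss = cable_loss_w(LOAD_W, volts, CABLE_OHM)
    print(f"{volts:.0f} V: {LOAD_W / volts:.1f} A, {loss:.1f} W lost in cable")
```

Because the loss scales with the square of the current, raising the voltage by about 7x cuts the cable loss by about 50x for the same cable.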

Multi-drop serial is a good approach to comms because you can bundle power and data into a single cable, like the multicore stuff used to wire up sprinkler systems and home alarms. Since you are pushing data outwards only, it's a much more-tractable problem.

I would suggest you look at CAN bus. Don't bother with the data-link layer (the complex bit which requires special controller silicon), just use the transceivers. That's possible because your data is one-way (master broadcasting) so you never need to worry about collisions. You can get transceivers (eg MCP2551) for under $1 each and they will allow you to connect 100 devices on a single electrical bus extremely reliably, with data rates up to 1Mbps. You could run any one-way serial protocol over the CAN physical layer, including RS232 and therefore make use of the built-in UARTs. Since it's one-way, you can chain and branch buses as much as you want by using a handful of transceiver devices to take data from one bus and send it onto multiple other buses.

How sure are you about never needing slave-to-master comms? Really sure? Totally sure? Bet the $50,000+ that this project is going to cost on that guess? If you use CAN bus transceivers and leave an empty footprint on your custom PCBs for a CAN controller then you could probably populate that footprint later and get reliable bi-directional communication. That would mean not using simple transceivers as bus bridges though, you would need a protocol-level bridge instead - basically an arduino with multiple CAN controllers on it.

Since this is somewhat physically distributed, have you thought about EMI and reliability? What happens when you start getting interference induced in your data and power lines? Do you need isolated comms?

There are lots of other issues, like "who is the poor bastard who is going to plug all the wires in?". "Who is going to manufacture?" At a scale of 1000, I would strongly suggest that you need to make a custom PCB for each slave node, test it heavily and then get a Chinese contract-manufacturer to churn out the quantity; those sorts of services start at about 50 devices built. You don't want to be buying arduinos and plugging shields into them, you want a single small board with EVERYTHING on it, exactly to your requirements. Even once manufacturing is solved, you don't want to be hand-soldering 4 or 6 conductors at each end of 1000 cables. You want to pick a single connector that carries everything you need and which crimps/assembles very quickly with no soldering. Again, contract out the cable manufacture.

As a matter of curiosity, who is funding such a huge-scale thing? What does it do?

Definite +1 to the CAN transceivers idea. They are really great for UART. It also gives you transmit functionality for free, which might be nice for debugging/diagnostics/self test.

There is no reason you actually need a CAN controller to get bidirectional data transmission with CAN transceivers. The collision detection mechanism should work just as well to detect UART collisions as it does to detect CAN collisions, so you can always implement your own protocol. I've had a lot of success with this general approach, and I'm actually working on a board design for sensor nodes with this technique. It's just tricky to get the programming right, as serial protocol stuff often is.

And besides, you have a "master node" and you can always avoid collisions in the traditional master/slave way, by making the slave devices not send anything except when the master tells them to.

Be really sure that you are actually able to push the amount of data that you want. If you are running these in groups of 100 each, changing the master of each group to a raspberry pi, a more powerful arduino like the due, or whatever else you need to make it work would hardly make a dent at all in budget.

It would probably be easiest to give each 10x10 section its own power supply running directly from mains, to avoid having to run high-amperage low-voltage DC cables, which can be a fire hazard and need thick wires even at 36V, but I don't know how available mains power will be, what your local extension cord regulations are, whether you have an electrician handy, etc.

Thanks polyglot for your extremely detailed advice... X :smiley:

polyglot:
As a matter of curiosity, who is funding such a huge-scale thing? What does it do?

This is a split-flap interactive installation project, something like this:
http://unknowndomain.co.uk/2012/02/10/stanford-university-graduate-school-of-business/

The latest design uses RS-485 with 10 split-flap units connected in a daisy chain ending with a terminator. A 1-to-10 splitter will divide the incoming signal into 10, each going into a daisy chain. A rough sketch is attached. The left side shows the networking schema, the right one shows how each unit's PCB is connected in a daisy chain. The number of units is reduced to 800.

I'm still afraid my 1-to-10 splitter box is creating a star topology (not good for RS-485) and signal reflections might wreak havoc. The only way out I see is connecting 100 units together in a single daisy chain.

polyglot:
Ignoring the communications, have you thought about power distribution? 1000 arduinos with even minimal IO is going to be about 100A at 5-12V, much much more if you're running motors+lights at each node.

Earlier, when using the 1-to-100 approach, I was sure both the stepper motor (taking 400mA per phase) and the PCB could be supplied power separately over 4 lines of CAT5 cable without any problem. Since RS-485 doesn't like star topology, the design has been changed to a daisy chain, which means I can't send power for 10 steppers through CAT5 cable. So since each motor takes 9.6 W (0.8A x 12V) at a full step, there will be a 600W PSU for every 50 split flaps, connected in a daisy-chain-ish parallel connection. Ditto for the PCB with AVR and MAX1483, which will get its own 300W PSU every 50 units (I will first measure the current used by a test PCB and then buy the PSU). There will be 4-core 16AWG or thicker power cable for wiring them up.
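
The motor PSU sizing can be checked quickly. This is a minimal sketch of the worst case, assuming all 50 motors in a group sit at full step at the same time; the constant names are made up for illustration.

```python
MOTOR_CURRENT_A = 0.8   # two phases at 400 mA each
MOTOR_VOLTAGE_V = 12.0
UNITS_PER_PSU = 50
PSU_RATING_W = 600.0

motor_w = MOTOR_CURRENT_A * MOTOR_VOLTAGE_V
group_w = motor_w * UNITS_PER_PSU          # worst case: all motors energised
headroom = 1 - group_w / PSU_RATING_W      # fraction of the PSU left unused

print(f"{motor_w:.1f} W per motor, {group_w:.0f} W per group, "
      f"{headroom:.0%} headroom")
```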

polyglot:
You probably want to think about putting in a buck converter (eg an LM2596 with all supporting components can be assembled on a PCB for $1; integrate it into your PCB design) for each group of 10 or so nodes. Distribute power at the top level at about 36V to minimise ohmic losses in cable (and therefore avoid buying expensive fat copper cables - you don't want to be paying to run a kilometre of 8GA) and downconvert to 7V (to Vin) for each strip of 10 nodes. If you have power-hungry devices that take 12V, then you need both a 5-7V and a 12V converter for each node group. You could even have a converter per node if they are particularly power-hungry.

Yep! Exactly what I thought too. For the 12V stepper motors, I'm unsure if I need to supply 12V directly (cheaper) or supply a higher voltage and have it regulated (costly). The PCB containing the AVR will also have an LM7805 with caps to convert its separate 12V to 5V, which will also help reduce noise in the digital circuitry.

polyglot:
Multi-drop serial is a good approach to comms because you can bundle power and data into a single cable, like the multicore stuff used to wire up sprinkler systems and home alarms. Since you are pushing data outwards only, it's a much more-tractable problem.

I'm not sure I understand multi-drop topology. Google and Wikipedia give hazy information on vending machines and no topology map images. It seems similar to a star network with the master controlling the whole network of slaves?

polyglot:
I would suggest you look at CAN bus. Don't bother with the data-link layer (the complex bit which requires special controller silicon), just use the transceivers.

CAN sounds good, but compared to RS-485, is it better? I googled for comparisons between the two and everywhere people seem to prefer RS-485 due to wider use, more support and easier implementation. If there are any benefits over RS-485 in my particular case, I'll be happy to switch over to CAN. But I'd prefer to follow the KISS principle.

polyglot:
How sure are you about never needing slave-to-master comms? Really sure? Totally sure? Bet the $50,000+ that this project is going to cost on that guess?

I'm pretty sure. Yeah, really sure. But won't make that bet XD
The master only needs to address a slave and tell it which flap to flip to. I don't think we need any response for that. But to be on the safe side, if going with RS-485, I could use the MAX1482 full-duplex IC (I have enough unused wires in the CAT5 anyway), else use your nifty idea of leaving space on the PCB for later addition.
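
A one-way "address plus flap" message could look something like the sketch below. The 5-byte layout, the 0x7E start marker and the XOR checksum are all hypothetical choices made for illustration, not something decided in this thread, and the sketch does not escape the start byte if it appears inside the payload.

```python
START = 0x7E  # hypothetical start-of-frame marker


def encode_frame(address, flap):
    """Build START, addr_hi, addr_lo, flap, checksum (XOR of the 3 body bytes)."""
    if not 0 <= address <= 0xFFFF or not 0 <= flap <= 0xFF:
        raise ValueError("address or flap out of range")
    body = [address >> 8, address & 0xFF, flap]
    checksum = 0
    for b in body:
        checksum ^= b
    return bytes([START, *body, checksum])


def decode_frame(frame):
    """Return (address, flap), or None if the frame is malformed."""
    if len(frame) != 5 or frame[0] != START:
        return None
    if frame[1] ^ frame[2] ^ frame[3] != frame[4]:
        return None
    return (frame[1] << 8) | frame[2], frame[3]


frame = encode_frame(address=799, flap=12)
print(frame.hex(), decode_frame(frame))
```

With no reply channel, a slave that sees a bad checksum can only ignore the frame, so the master may want to repeat each command.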

polyglot:
There are lots of other issues, like "who is the poor bastard who is going to plug all the wires in?".

That'd be me :stuck_out_tongue:

polyglot:
"Who is going to manufacture?"

There are a few PCB manufacturers who are willing to do it at a good quality and price.

polyglot:
you don't want to be hand-soldering 4 or 6 conductors at each end of 1000 cables. You want to pick a single connector that carries everything you need and which crimps/assembles very quickly with no soldering. Again, contract out the cable manufacture.

Will be using CAT5 cables. RJ45 jacks crimping will be done ourselves according to distance between units(which might vary). Only 2000 jacks to crimp :sweat_smile:

EternityForest:
There is no reason you actually need a CAN controller to get bidirectional data transmission with CAN transceivers. The collision detection mechanism should work just as well to detect UART collisions as it does to detect CAN collisions, so you can always implement your own protocol. I've had a lot of success with this general approach, and I'm actually working on a board design for sensor nodes with this technique. It's just tricky to get the programming right, as serial protocol stuff often is.

Without controller, isn't that a bit like RS-485? I asked polyglot the same question: what will be the benefit over RS-485? Adding a single extra component to the unit means a substantial increase in the whole cost.

EternityForest:
Be really sure that you are actually able to push the amount of data that you want. If you are running these in groups of 100 each, changing the master of each group to a raspberry pi, a more powerful arduino like the due, or whatever else you need to make it work would hardly make a dent at all in budget.

Agreed. That was the initial design, to use a master arduino for each 100 slaves and a super-master arduino to rule em all. ]:slight_smile:
But using a simple MUX in the design suggested by Graynomad at the start of this thread meant that I could remove them. Besides, I'd have to implement another network between the super-master and masters, so that's one less headache.

EternityForest:
It would probably be easiest to give each 10x10 section its own power supply running directly from mains, to avoid having to run high-amperage low-voltage DC cables, which can be a fire hazard and need thick wires even at 36V, but I don't know how available mains power will be, what your local extension cord regulations are, whether you have an electrician handy, etc.

As explained in my previous reply to polyglot, I'll be using separate power for the steppers and the PCB (AVR, transceiver, LEDs, optosensors) for every 10x5 grid. PSUs sized for the power consumption of a 10x10 grid are costlier than smaller ones sized for a 10x5 grid.

-Antzy

I'm not sure I understand multi-drop topology.

From the looks of the diagram you are in fact using multidrop already. MD is just all nodes connected in parallel, with RS-485 that means all A signals connected together and all B signals connected together.

The "splitter box" is just a mechanical splitter I guess, no active circuitry. So it seems that you have essentially a star network with 10 arms. AFAIK RS-485 is ok with star but you will have to reduce the data speed so the reflections are not a problem.


Rob

With proper terminations and good connectors, RS485 speed shouldn't be impacted.

If you don't use a proper CAN controller, there really isn't an advantage to CAN over RS485 unless you implement your own protocol that takes advantage of the collision detection abilities of CAN.

However (I'm not sure if this is still true, I haven't looked up RS485 transceivers recently), CAN transceivers can be cheaper than RS485 transceivers, and they sometimes have better fault protection than RS485. The CAN spec only requires -2 to +7 V common mode tolerance, but almost all transceivers do much better than this.

I think the MCP2561 is about 98 cents in single quantities and probably much less in thousands.

I think there was actually an app note about rs485 in a branching topology. I seem to remember the advice being to run as slow as possible and keep runs as short as possible, and that it was definitely possible and decently reliable.

This thread is going after >127 networked devices.
I posted some RS485 app notes there, they may come in handy here.
http://forum.arduino.cc/index.php?topic=229569.new;topicseen#new