Uploading code to multiple microcontrollers simultaneously

Hi everyone

I am looking into a project which will require a way of uploading code to an arbitrary number of microcontrollers.

Ideally I would want to upload the code simultaneously.

EDIT: This is not a 'one shot' reprogramming. I need to be able to reprogram them frequently, so pre-ordering them programmed is not an option. Also the number of micro-controllers is (as stated) arbitrary (between 6 and ~144) so multiplexing them is not practical (though possible).

At this point there are basically no constraints yet - nothing has been decided upon yet, so I could be using any type of controller; I could potentially even write my own bootloader. Having said that, ideally I would want to work with the atmega328p and the Arduino bootloader.

I have read that simply wiring them in parallel is prone to errors if they do not synchronize properly. Has anyone actually tried this?

How would you go about doing that? I'm open to any suggestions (mechanical/electrical/software solutions etc.).

Regards

p.

There was a thread a while back where the poster wired up multiple parts in parallel for programming, and it worked. Might have turned off the verify feature so they didn't read back, or the Tx line back to the USB/Serial adapter was only connected for 1 part.
Not sure what to search for to find that.
You could try something similar: data out from the USB/Serial adapter to many parts, with only one TX line back to the adapter. You would probably need to buffer the USB/Serial output to maintain signal integrity and overcome cabling capacitance.

Is this not the one? How to program multiple Arduino Nanos simultaneously - Microcontrollers - Arduino Forum
Or is it this? Programming multiple ATMega's/arduino's at once - Development - Arduino Forum

How far apart are the chips? All in the same box or spread out over a factory?

Unless in the same box (and maybe even then) I would provide off-line storage (say a small EEPROM/Flash chip) on each node and have the program read new code, check for errors, then save in the EEPROM. Then when all is downloaded and confirmed get each node to program itself from that EEPROM with a custom bootloader.


Rob

Ideally I would want to upload the code simultaneously.

I'm just curious: how is that supposed to work if the uC can only send to one receiver at a time? Even if you used several serial ports, as on a Mega, you can only be sending to one at any given time, for obvious reasons: the processor executes one instruction at a time.
If that instruction is to send a character to one receiver, were you planning to send one character to each receiver in some predetermined sequence, like dealing cards to multiple players? I'm just not getting it.

There was a thread a while back where the poster wired up multiple parts in parallel for programming, and it worked

This, on the other hand, makes perfect sense, because there is no limit (well, there is, but there are ways around it) to how many parallel ports can read the same byte simultaneously. The catch is that you then have to use UARTs to convert the parallel data to serial. Each byte can be sent to all receivers, converted to serial, and then sent on as serial. What I am not getting is how you send a command that tells the Arduino to begin loading the flash memory with the sketch. There must be more to it than just sending the program, because the compiler converts it to binary before sending it. It's an interesting problem and I would like to learn how it is done. At the moment I don't have a clue how you send binary on an RS485 com port. Is there something you're not telling us?

@everyone - thanks for your input. As is often the case, I probably should have provided a bit more context to start with:

I would like to design a series of 'processor nodes' which can execute parallel algorithms. The reason to have physical elements is in order to have a fast and intuitive way of exploring interactions between the algorithm chosen and the topology of the nodes.

e.g. "it works when arranged in a square, what happens if I arrange them in a triangle shape?" imagine being able to prune a neural network by physically removing nodes and seeing the effect of the pruning in real time...

The goal is to create a tool which helps visualize and better understand parallel algorithms.

In the end, I do not need to actually 'reprogram' the individual nodes. Each node will be continuously executing a function - in order to do the type of rapid iterations I have in mind, I need students to be able to tinker with that function. Changing details of the code has to be effortless enough for this to retain some element of fun and exploration.

@raschemmel

You are taking me a bit too literally. I do not necessarily mean 'program them all at once'. I mean 'program them all in a single step':
I want to be able to edit my code and then click 'upload' without having to repeat the process for each and every node.
Having said that, simply by connecting the signal lines, there is no reason not to talk to multiple nodes simultaneously (I have done this both intentionally and by accident in the past by writing super sloppy I2C code using the Wiring library).

@graynomad
The nodes would be 'all in the same box'.

You are suggesting that I use a separate EEPROM chip for storing code? Would that not simply defer my problem? I.e. instead of talking with an arbitrary number of controllers, I now communicate with an arbitrary number of EEPROM chips? Would you mind elaborating on your suggestion? I am not sure I fully understand it.

Thanks PeterN for digging up those threads and Crossroads for the suggestion. I had seen those earlier, but they feel a bit too much like a 'hack' to me. I am hoping to hand this platform over to software guys, and I want it to be stable enough that whoever is using it to experiment can focus on the algorithms. I will try doing it this way, but somehow it feels like a solution prone to cause problems somewhere further down the line.

Right now I am leaning towards the idea of not actually reprogramming them, but rather changing parameters via I2C. But I am curious about other solutions - maybe there are other microcontrollers which would be better at this kind of thing, or maybe I am not thinking about this in the right way. I don't have any formal training in any of this, so sometimes I miss the obvious.
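To make the 'change parameters instead of reprogramming' idea a bit more concrete, here is a minimal sketch of what one parameter update could look like on the wire. Everything in it is an assumption for illustration: the 6-byte frame layout (parameter id, float32 value, checksum) and the names pack_param / unpack_param are made up, and it only models the bytes in Python rather than talking to real I2C hardware. Sending the same frame to every node at once (for example via the I2C general call address 0) might be possible, but whether the slave side accepts it would need checking.

```python
import struct


def checksum(payload: bytes) -> int:
    """Two's-complement checksum: payload bytes plus checksum sum to 0 mod 256."""
    return (-sum(payload)) & 0xFF


def pack_param(param_id: int, value: float) -> bytes:
    """Build a hypothetical 6-byte frame: id (1 byte), little-endian float32, checksum."""
    body = struct.pack("<Bf", param_id, value)
    return body + bytes([checksum(body)])


def unpack_param(frame: bytes):
    """Return (param_id, value), or None if the frame has the wrong size or checksum."""
    if len(frame) != 6 or sum(frame) & 0xFF != 0:
        return None
    param_id, value = struct.unpack("<Bf", frame[:5])
    return param_id, value
```

A node would call something like unpack_param on whatever it receives and simply ignore frames that come back as None, so a corrupted update never changes the running function.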

Thanks & Regards

P.

fkeel:
maybe there are other microcontrollers which would be better at this thing, or maybe I am not thinking about this in the right way.

Hey, a microcontroller is - a microcontroller. The features of the Atmel are pretty average. No microcontroller is going to be designed specifically to suit your idiosyncratic requirement, it is primarily a software matter.

You misunderstood the "same box" question. I2C is not suited for communication beyond a metre or so - if that - and you still did not address that question from which all further advice follows. You must first design a network to interconnect your nodes, then a protocol can be designed using either a "master-slave" or a "daisy chain" (software) topology. Most certainly, connecting everything in parallel is unlikely to be practical (or reliable).

@ Paul__B
Your post confuses me - you say that I "still did not address a question" which was never asked, and you are telling me things I must do. To clarify: I am simply brainstorming here. I am just curious to hear some thoughts from people in this forum.

  • I understand that the Atmega328p is quite average in regards to its functions. I am wondering if there might be other devices designed specifically for this type of scenario. I am not expecting the perfect microcontroller, I realize that most of the heavy lifting will have to happen on the software side, I am just curious if someone might know of a microcontroller which is better suited for this type of task.

  • What makes you think I misunderstood Graynomad's question? I am aware of the limitations of I2C, but thanks for pointing it out.

  • To answer your question explicitly:
    The topology of the nodes is arbitrary, which is the entire point of the exercise in the first place.

EDIT: However because the topology is arbitrary they could be reconfigured for programming, though I would prefer to avoid that. Also, even though the physical topology is arbitrary, I could still have them electrically be connected in parallel.

As has been noted you need a network running first, no matter what the final programming solution is. That could be I2C, RS485 etc etc. Anything that allows multi-drop. But assuming that's in place and also assuming we are talking about reflashing the chips (because just sending new parms over I2C is hardly worth talking about :))

Each node has a serial Flash/EEPROM/MRAM/FRAM/SRAM (let's call it EEPROM for now). Part of the normal running code checks the network for a "burn me" command; having got that, it reads a HEX file from the network and stores that file in EEPROM. Any one or all of the nodes can do this at the same time with an appropriate protocol in place, although for error reporting it might be better to do one at a time. It makes no difference to the operator, as there will be a program written on the PC to handle this all as a single operation.

When the nodes have successfully received and stored the new code they set a flag in the EEPROM and jump to the bootloader.

The bootloader checks that flag and reflashes the chip or not according to its state. After flashing it clears the flag.
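The stage-then-flag sequence above can be sketched roughly as follows. This is a Python model of the logic only, not bootloader code: the Node class, its attribute names, and the rule 'set the flag only if every record checks out and the image ends with an end-of-file record' are assumptions for illustration. The one part taken from the real format is the Intel HEX record check, where all bytes of a record including the trailing checksum sum to 0 modulo 256.

```python
def record_ok(line: str) -> bool:
    """Check one Intel HEX record: all bytes after ':' must sum to 0 mod 256."""
    line = line.strip()
    if not line.startswith(":"):
        return False
    try:
        raw = bytes.fromhex(line[1:])
    except ValueError:
        return False
    # Smallest record: length byte, 2 address bytes, type byte, checksum byte
    if len(raw) < 5 or raw[0] != len(raw) - 5:
        return False
    return sum(raw) & 0xFF == 0


class Node:
    """Hypothetical node; 'staged' stands in for the external EEPROM."""

    def __init__(self):
        self.staged = []          # records held in the external memory
        self.update_flag = False  # the flag the bootloader checks
        self.flash = []           # records currently programmed

    def receive(self, lines) -> bool:
        """Stage an image; set the flag only if all records are valid and the last is EOF."""
        records = [l.strip() for l in lines if l.strip()]
        if not records or not all(record_ok(r) for r in records):
            return False
        if bytes.fromhex(records[-1][1:])[3] != 0x01:  # record type 01 = end of file
            return False
        self.staged = records
        self.update_flag = True
        return True

    def boot(self) -> bool:
        """Bootloader step: reflash from staging if flagged, then clear the flag."""
        if not self.update_flag:
            return False
        self.flash = list(self.staged)
        self.update_flag = False
        return True
```

The point of the split is that a failed or interrupted download never touches the running code: the flag is only set once the whole image has been stored and verified.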

Another option would be to use a chip that allows code execution from RAM; I think that all ARMs can do that. In this case your "bootloader" is the main flashed code: it does similar to the above, then jumps into the new code in RAM. That code also has to sniff the network for the "burn me" command, or maybe you have a hardware signal on the network to force the chips into the bootloader, or even something like a timer that detects a break condition on the network and resets the chip. For that matter, some UARTs will do that, so no external hardware is required.


Rob

Thanks Rob

I'll take some time to look into your suggestion in more detail.

I have done this both intentionally

by writing super sloppy i2c code

XD
Yeah, me too.

On a more serious note, I can think of one way to do it, but it is a hardware approach. You can make a configurable USB port that can connect to any one of a number of UNOs that are connected to USB connectors, and configure the programmable USB interface for PORT-1 (for example) as opposed to PORT-2. But, as I mentioned, I don't know how a program can duplicate what a user does when they move the mouse cursor over the "PLAY" button in the IDE to compile and load a sketch. If there were a command that could be put in a program, you could execute that. The other issue is how you would load the sketch that you want onto all these controllers. Would you do it yourself using a mouse and computer screen, or were you suggesting an automated method of doing it under program control? I didn't understand that part.
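For what it's worth, on AVR boards the IDE's upload button ends up running the command-line tool avrdude, and the full command shows up if verbose upload output is enabled in the preferences. Below is a minimal sketch of doing the same from a script, one board after another, assuming each node has its own serial port and a stock Arduino bootloader. The port names, the hex file name and the 115200 baud rate are assumptions (some bootloaders use a different rate), and avrdude_cmd / upload_all are made-up helper names.

```python
import subprocess


def avrdude_cmd(port: str, hex_path: str, baud: int = 115200) -> list:
    """Build the avrdude command line for one board with a serial bootloader."""
    return [
        "avrdude",
        "-p", "atmega328p",             # target part
        "-c", "arduino",                # programmer type for the Arduino serial bootloader
        "-P", port,                     # serial port of this board
        "-b", str(baud),                # bootloader baud rate (board dependent, assumed here)
        "-U", f"flash:w:{hex_path}:i",  # write an Intel HEX file to flash
    ]


def upload_all(ports, hex_path, run=subprocess.run):
    """Upload to each port in turn; return a dict mapping port to True/False."""
    results = {}
    for port in ports:
        proc = run(avrdude_cmd(port, hex_path))
        results[port] = (proc.returncode == 0)
    return results
```

Usage would be something like upload_all(["/dev/ttyUSB0", "/dev/ttyUSB1"], "node.hex"). It is sequential rather than simultaneous, but from the operator's side it is still a single step, and the result dict says which nodes failed.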

I need to be able to reprogram them frequently,

This seems like an oversimplification or generalization. I can't tell where the human stops and the program starts.
Maybe I am taking you too literally again, but it sounds like you want to push a button and, presto whamo, all the uCs are updated.

fkeel:

  • I understand that the Atmega328p is quite average in regards to its functions. I am wondering if there might be other devices designed specifically for this type of scenario. I am not expecting the perfect microcontroller, I realize that most of the heavy lifting will have to happen on the software side, I am just curious if someone might know of a microcontroller which is better suited for this type of task.

There are literally thousands of microcontroller types available, so in order to make a tailored selection it's important to understand exactly what the task is. But probably all types of micros on the general market are not designed for parallel or neural computing. Those are to be found in research labs.

  • To answer your question explicitly:
    The topology of the nodes is arbitrary, which is the entire point of the exercise in the first place.

EDIT: However because the topology is arbitrary they could be reconfigured for programming, though I would prefer to avoid that. Also, even though the physical topology is arbitrary, I could still have them electrically be connected in parallel.

Surely the electrical connection is the physical topology? Unless you are using wireless, once you have wired the nodes, that determines the physical topology. You can't change that by downloading new firmware. You might change the logical topology with new firmware, but nodes will have to perform routing. If you have all nodes connected to the same bus (ethernet?), then bus contention becomes a dominant factor.

Tbh, I'm not sure about the general idea. If it is just to explore different algorithms and topologies, you can do all that in software on a PC, and much better. Using physical hardware like an Arduino doesn't make a lot of sense. Arduinos have pretty limited comms and processing power available. I think what you would end up exploring is how slow the comms between nodes are.

I think if you had a fixed topology, the project would be doable, even if to demonstrate that loosely coupled simple nodes are highly inefficient. To use chips that are designed to be monolithic, and to create a re-configurable physical topology, you would really need to design some custom interconnect hardware.

The Transputer was designed for interconnection, but is no longer with us. Some of the concepts live on in the XMOS xCORE chips, and you can get eval kits for a reasonable price http://uk.farnell.com/xmos/xk-1a-kit/xs1-l8-64-400mips-8core-dev-kit/dp/2356356?CMP=GRHS-1010210.

Ha. Thanks for the Transputer link. I did not know about it. Reading up on xCore and XMOS now.

haha ... NOW we're talking :)

http://forum.arduino.cc/index.php?topic=6915.0