This is a hypothetical question. Suppose I want to build a rather complex system which consists of subsystems that do very particular tasks. For example, a robot loaded with a bunch of sensors and an LCD display that together call for at least 10 ATmegas. Or, as another example, I want a system to do complex calculations that require more than 32 KB of Flash program memory, and I wish to distribute the workload across multiple ATmegas (speed is not an issue).
Each subsystem will only use some features of the ATmega, not all the features of an Uno or any other board (i.e., they'll be constructed barebones). Each will only need to do one of a few things: report data if it is hooked up to a sensor (and do computation), receive control signals to drive an actuator (and do computation), or just do calculations and report the result to a "master" uC.
I guess what I'm asking is whether one could create a distributed system using the ATmega.
It sounds like you are asking "What should the network topology be?" That is a big subject. Usually the way it's solved with Arduinos is that one board is always the 'master': it controls the network, and the slaves don't transmit anything unless the master specifically asks them to.
If the units are all relatively close, say 10 cm apart, then an I2C bus is relatively easy for up to about 112 slaves (the limit of 7-bit addressing once the reserved addresses are excluded).
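As a rough sketch of that master/slave pattern over I2C using the Wire library (the 0x10 slave address and the 2-byte reading are placeholders, not anything specific to your setup):

Master:

```cpp
#include <Wire.h>

void setup() {
  Wire.begin();                  // join the I2C bus as master (no address)
  Serial.begin(9600);
}

void loop() {
  Wire.requestFrom(0x10, 2);     // poll the slave at 0x10 for a 2-byte reading
  if (Wire.available() >= 2) {
    int reading = (Wire.read() << 8) | Wire.read();
    Serial.println(reading);
  }
  delay(100);
}
```

Slave:

```cpp
#include <Wire.h>

volatile int latestReading = 0;

void requestEvent() {
  // Called only when the master asks; the slave never speaks on its own
  Wire.write(highByte(latestReading));
  Wire.write(lowByte(latestReading));
}

void setup() {
  Wire.begin(0x10);              // join the bus as a slave at address 0x10
  Wire.onRequest(requestEvent);
}

void loop() {
  latestReading = analogRead(A0);  // stand-in for the real sensor task
}
```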
SPI is better over longer distances and for faster data transfer, but then the master must have enough spare pins to select each of the slaves individually.
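The master side over SPI might look roughly like the sketch below; the select pins (8, 9, 10) and the 0x01 "send your reading" command byte are assumptions, and note that the stock SPI library only covers the master role, so the slave ends would need register-level code or a library that provides SPI slave mode.

```cpp
#include <SPI.h>

const byte slaveSelect[] = {8, 9, 10};   // one spare pin per slave (hypothetical wiring)
const byte NUM_SLAVES = sizeof(slaveSelect);

void setup() {
  for (byte i = 0; i < NUM_SLAVES; i++) {
    pinMode(slaveSelect[i], OUTPUT);
    digitalWrite(slaveSelect[i], HIGH);  // deselect everyone to start
  }
  SPI.begin();
  Serial.begin(9600);
}

void loop() {
  for (byte i = 0; i < NUM_SLAVES; i++) {
    SPI.beginTransaction(SPISettings(1000000, MSBFIRST, SPI_MODE0));
    digitalWrite(slaveSelect[i], LOW);   // address this slave only
    SPI.transfer(0x01);                  // hypothetical "send your reading" command
    byte reply = SPI.transfer(0x00);     // clock out the slave's answer
    digitalWrite(slaveSelect[i], HIGH);
    SPI.endTransaction();

    Serial.print("Slave ");
    Serial.print(i);
    Serial.print(": ");
    Serial.println(reply);
  }
  delay(100);
}
```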
Serial is more difficult to make work with multiple transmitters, but if the master doesn't need to receive from more than 4 slaves this can be pretty easy (an ATmega2560 "Mega" has four hardware UARTs).
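For example, a Mega-based master could poll one slave per hardware UART, roughly like this (the '?' request byte, baud rates and timeout are arbitrary choices for illustration):

```cpp
// Runs on an Arduino Mega 2560, which has Serial1..Serial3 in addition
// to the USB-facing Serial port.
HardwareSerial* slaves[] = { &Serial1, &Serial2, &Serial3 };
const byte NUM_SLAVES = sizeof(slaves) / sizeof(slaves[0]);

void setup() {
  Serial.begin(115200);                   // debug / PC link
  for (byte i = 0; i < NUM_SLAVES; i++) slaves[i]->begin(9600);
}

void loop() {
  for (byte i = 0; i < NUM_SLAVES; i++) {
    slaves[i]->write('?');                // hypothetical "send your data" request
    unsigned long start = millis();
    while (!slaves[i]->available() && millis() - start < 50) {}  // short timeout
    while (slaves[i]->available()) Serial.write(slaves[i]->read());
    Serial.println();
  }
  delay(100);
}
```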
My approach, based on 20-some years of no experience, would be to use software serial in a creative multiplexing scheme and follow a neuron-like map, with intermediate-stage processing (such as combining the input from several other units) happening along the way.
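If you go the software-serial route, keep in mind that only one SoftwareSerial instance can receive at a time, so the "multiplexing" mostly means calling listen() on whichever port you want to hear from next. A minimal sketch, with hypothetical pin assignments and request byte:

```cpp
#include <SoftwareSerial.h>

SoftwareSerial nodeA(2, 3);   // RX, TX (hypothetical wiring)
SoftwareSerial nodeB(4, 5);   // RX, TX

void readFrom(SoftwareSerial &node, const char *name) {
  node.listen();                       // make this port the active receiver
  node.write('?');                     // hypothetical "send your data" request
  delay(20);                           // give the slave a moment to answer
  Serial.print(name);
  Serial.print(": ");
  while (node.available()) Serial.write(node.read());
  Serial.println();
}

void setup() {
  Serial.begin(115200);
  nodeA.begin(9600);
  nodeB.begin(9600);
}

void loop() {
  readFrom(nodeA, "A");
  readFrom(nodeB, "B");
  delay(100);
}
```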
But at some point, you wonder if using a more powerful processor would make more sense.
I want a system to do complex calculations that require more than 32k Flash of prog memory and wish to distribute the workload across multiple ATmegas (speed is not an issue).
This might be something interesting to play with, but there is absolutely no valid reason I can see to distribute processing in this manner. Just get a single, more capable CPU.
However, it often does make sense to distribute processing in applications like industrial control, and maybe even your robot, especially if you want robustness, i.e. part of the system may die but the rest can carry on.
Depending on the distances, I would suggest that async serial (UART) comms is the way to go, probably a multi-drop network with RS-485 transceivers. You could roll your own protocol or, as Mike suggested, use DMX.
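On an RS-485 multi-drop bus, each node listens until it is addressed and only then drives the bus. A minimal sketch of one slave node, assuming a half-duplex transceiver (e.g. a MAX485) with its DE/RE pins tied together on pin 2, and a simple [address][command] poll frame of my own invention:

```cpp
const byte DIR_PIN = 2;          // HIGH = drive the bus, LOW = listen
const byte MY_ADDRESS = 3;       // this node's address on the multi-drop bus

void setup() {
  pinMode(DIR_PIN, OUTPUT);
  digitalWrite(DIR_PIN, LOW);    // default to listening
  Serial.begin(9600);
}

void loop() {
  // Wait for a poll frame: [address][command]
  if (Serial.available() >= 2) {
    byte addr = Serial.read();
    byte cmd  = Serial.read();
    if (addr == MY_ADDRESS && cmd == '?') {
      int reading = analogRead(A0);        // stand-in for the real sensor
      digitalWrite(DIR_PIN, HIGH);         // claim the bus
      Serial.write(highByte(reading));
      Serial.write(lowByte(reading));
      Serial.flush();                      // wait until the bytes have left the UART
      digitalWrite(DIR_PIN, LOW);          // release the bus for other nodes
    }
  }
}
```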
Or, as another example, I want a system to do complex calculations that require more than 32k Flash of prog memory and wish to distribute the workload across multiple ATmegas (speed is not an issue).
Using the four hardware serial ports on a series of Megas to form a communication network sounds almost exactly like the description of TMRh20's RF24Network system, except that you are using the serial ports in place of the underlying nRF24 radios.
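For comparison, a sensor node in TMRh20's RF24Network typically looks something like the sketch below; the CE/CSN pins, channel and payload layout are assumptions patterned on the library's examples rather than anything specific to your setup.

```cpp
#include <SPI.h>
#include <RF24.h>
#include <RF24Network.h>

RF24 radio(7, 8);                  // CE, CSN pins (hypothetical wiring)
RF24Network network(radio);

const uint16_t this_node   = 01;   // octal address: a child of the master
const uint16_t master_node = 00;

struct Payload { uint32_t millisStamp; int16_t reading; };

void setup() {
  SPI.begin();
  radio.begin();
  network.begin(/*channel*/ 90, this_node);
}

void loop() {
  network.update();                // keep the network layer running
  Payload p = { millis(), analogRead(A0) };
  RF24NetworkHeader header(master_node);
  network.write(header, &p, sizeof(p));   // send up the tree to the master
  delay(1000);
}
```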
Whether it is useful to split up tasks and distribute the work between different Arduinos is a different question altogether.