Which protocol would be the best choice, and why?

Imagine what happens if the main CPU sends a command to the GSM board during that delay(5000). The GSM board can't respond, or even make a note of the command so that it can act on it later.
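One way to "make a note" without ever blocking is to collect incoming bytes into a buffer one at a time and act only once a complete command has arrived. Below is a minimal sketch, assuming commands are newline-terminated text of at most 31 characters; the class name `CommandReader`, the `handleCommand` function and those limits are hypothetical. It is plain C++ so the logic can be tested off the board; on an Arduino you would feed it from `Serial.read()` inside `loop()`.

```cpp
#include <cstddef>

// Hypothetical non-blocking command reader: it stores bytes as they arrive
// and reports when a complete newline-terminated command is ready.
// On an Arduino (assumed usage):
//   while (Serial.available() > 0) {
//     if (reader.feed(static_cast<char>(Serial.read()))) {
//       handleCommand(reader.line());   // handleCommand is hypothetical
//     }
//   }
class CommandReader {
public:
    // Accepts one byte; returns true when a full command has just completed.
    bool feed(char c) {
        if (c == '\r') return false;             // ignore CR from CRLF line endings
        if (c == '\n') {
            buf_[len_] = '\0';                   // terminate the finished command
            len_ = 0;                            // start fresh for the next one
            return true;
        }
        if (len_ < kMaxLen) buf_[len_++] = c;    // drop extra chars instead of overflowing
        return false;
    }

    // The last completed command; only valid until the next call to feed().
    const char* line() const { return buf_; }

private:
    static constexpr std::size_t kMaxLen = 31;   // assumed maximum command length
    char buf_[kMaxLen + 1] = {};
    std::size_t len_ = 0;
};
```

Because `feed()` only ever handles one byte and returns immediately, the rest of `loop()` keeps running while a command trickles in.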

You have to design the system to work without delays. Once you do, you can run all of that on one Arduino, and it will be much simpler than trying to get five of them to talk to each other.
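Here is a rough sketch of what "without delays" can look like with several jobs on one board: each job remembers when it last ran and runs again only once its interval has elapsed, so `loop()` never stops to wait. The `PeriodicTask` struct, the `readSensors`/`checkGsm` jobs and their intervals are hypothetical; `now` stands in for Arduino's `millis()`.

```cpp
#include <cstddef>
#include <cstdint>

// One job that should run every intervalMs milliseconds without blocking.
struct PeriodicTask {
    std::uint32_t intervalMs;  // how often the job should run
    std::uint32_t lastRunMs;   // timestamp of the previous run
    void (*job)();             // the work itself; must return quickly

    // Runs the job if its interval has elapsed. Unsigned subtraction stays
    // correct when the 32-bit millis() counter wraps around.
    bool runIfDue(std::uint32_t now) {
        if (now - lastRunMs >= intervalMs) {
            lastRunMs = now;
            job();
            return true;
        }
        return false;
    }
};

// Hypothetical jobs; here they only count how often they ran.
static int sensorRuns = 0;
static int gsmRuns = 0;
void readSensors() { ++sensorRuns; }
void checkGsm() { ++gsmRuns; }

// One pass of loop(): every task gets a chance to run and none of them waits.
// On an Arduino (assumed usage): pollTasks(tasks, 2, millis());
void pollTasks(PeriodicTask* tasks, std::size_t count, std::uint32_t now) {
    for (std::size_t i = 0; i < count; ++i) {
        tasks[i].runIfDue(now);
    }
}
```

With a 100 ms sensor task and a 5000 ms GSM task, the GSM job simply isn't due most of the time instead of freezing the board for five seconds, so a command from the other side can be read on the very next pass. The unsigned subtraction `now - lastRunMs` keeps working when the 32-bit millis() counter wraps around after about 49.7 days.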