Thank you for your detailed answer.
I'm using an OV7670 640X480 camera module and an nRF24L01+ 2.4GHz transceiver because I have those on hand.
Data sheets:
Camera https://www.voti.nl/docs/OV7670.pdf
Transceiver https://www.sparkfun.com/datasheets/Components/nRF24L01_prelim_prod_spec_1_2.pdf
If I'm reading these correctly (big if), the camera is serial with a single bidirectional data line, and the transceiver is SPI. So, both serial and SPI.
At the time of this reply I have not read your links, but I will directly. I just wanted to provide an initial reply first.
I teach 3rd and 4th level programming - Python, Java, C++, elementary - undergrad, so hardware control programming is a fringe topic in my professional life. On my own I've done a lot of reading but little practice. I expect to make well-educated guesses and many mistakes too.
That being said, I plan on approaching the coding efficiency problem using a switch in my main loop and a time slice index.
Many widget operations (poll a sensor, poll the transceiver, change motor state, pulse a servo, write to serial) will need to happen in sequence.
In a higher level language, I'd write a function for each, and in my loop I'd use an if-based decision tree keyed on a time index to see if that function needed to be called in that iteration. That would make for many function calls and a lot of time wasted making decisions. I believe that I need to eliminate the functions and most of the decisions from that logic and opt for a perfectly rigid design.
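For comparison, here is a minimal sketch of that function-per-operation approach in plain C. The function names, the call counters and the rates (every 2nd, 4th and 8th iteration) are all made up for illustration; the real widget code would go where the counters are.

```c
#include <stdint.h>

/* Hypothetical stand-ins for real widget operations; each only counts its calls. */
uint16_t sensor_calls, radio_calls, servo_calls;

static void poll_sensor(void)      { sensor_calls++; }
static void poll_transceiver(void) { radio_calls++; }
static void pulse_servo(void)      { servo_calls++; }

/* One main-loop iteration: every operation gets its own "is it due?" test. */
void loop_iteration(uint16_t time_index)
{
    if (time_index % 2 == 0) poll_sensor();       /* assumed rate: every 2nd pass */
    if (time_index % 4 == 0) poll_transceiver();  /* assumed rate: every 4th pass */
    if (time_index % 8 == 0) pulse_servo();       /* assumed rate: every 8th pass */
}
```

Every pass pays for three tests plus up to three calls, which is the overhead I want to design out.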
Instead, I plan on mapping out on paper how often I need each operation to happen per second to achieve the proper operation of the overall machine. If I assign each operation a frequency count of 0-7 then I can create 8 time slices with different combinations of the necessary widget code.
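The paper exercise could also be roughed out in code. This is only a planning sketch under my own assumptions: the helper name slice_mask_for_count is hypothetical, and spreading the runs evenly across the cycle is one possible choice, not a requirement. It returns a bitmask in which bit s is set if the operation should appear in slice s.

```c
#include <stdint.h>

#define NUM_SLICES 8

/* Hypothetical planning helper: spread `count` runs evenly over 8 slices.
   Bit s of the result is set when the operation belongs in slice s. */
uint8_t slice_mask_for_count(uint8_t count)
{
    uint8_t mask = 0;
    uint8_t acc = 0;

    if (count > NUM_SLICES) count = NUM_SLICES;   /* clamp out-of-range input */

    for (uint8_t s = 0; s < NUM_SLICES; s++) {
        acc += count;                  /* accumulate the requested rate */
        if (acc >= NUM_SLICES) {       /* overflow means "run in this slice" */
            acc -= NUM_SLICES;
            mask |= (uint8_t)(1u << s);
        }
    }
    return mask;
}
```

A count of 4 gives every other slice, a count of 1 gives a single slice per cycle, and a count of 0 gives none.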
In each slice will be a copy of the code for each widget operation in the order that it needs to happen. This means I will have multiple instances of the same code but no functions.
Edit: As I understand the C compiler, creating sequential cases starting at zero usually lets the compiler opt for a speedy jump table that needs only a single look-up. As far as I know, this is the fastest code branching available for more than two options.
I will then put the time slices into a switch that's evaluated each iteration of the main loop. The switch is based on a slice index. After the switch and slice code is evaluated I will increment the slice index.
Edit: I will need to consider a different method of changing the slice index as some slices will only run once in a cycle and others will run many times. I can probably work out a simple pattern.
Instead of using the modulus operator or an if statement, I will mask the last three bits of the slice index and use that as the value for the switch, so the value wraps around on its own and the time to do this is fixed and predictable.
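Roughly what I have in mind, as a plain C sketch. The counters are hypothetical placeholders for the inline widget code, and the slice contents (sensor in every slice, transceiver in even slices, servo once per cycle) are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical counters standing in for the real inline widget code. */
uint16_t sensor_polls, radio_polls, servo_pulses;
uint8_t  slice_index = 0;

/* One main-loop iteration: run exactly one slice, then advance the index. */
void run_one_slice(void)
{
    /* Keeping only the low three bits always yields 0-7, with no modulus or if. */
    switch (slice_index & 0x07) {
    case 0: sensor_polls++; radio_polls++; servo_pulses++; break;
    case 1: sensor_polls++;                                break;
    case 2: sensor_polls++; radio_polls++;                 break;
    case 3: sensor_polls++;                                break;
    case 4: sensor_polls++; radio_polls++;                 break;
    case 5: sensor_polls++;                                break;
    case 6: sensor_polls++; radio_polls++;                 break;
    case 7: sensor_polls++;                                break;
    }

    /* The 8-bit index rolls over from 255 to 0; since 256 is a multiple of 8,
       the slice pattern continues without a gap. */
    slice_index++;
}
```

The cases are sequential from zero and cover every masked value, so no default case is needed. Whether the compiler really emits a jump table for this is something I'd still want to check in the generated assembly.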
Edit: I see this model as my interpretation of a real-time system that provides the exact same responses in the exact same amount of time regardless of the state of the machine.
I am also considering whether it would be simpler/faster to use a Mega and make a code-based solution for the transposition of data, or if I should use an Uno and a transceiver-based switch so I can pipe selected serial data directly from the camera to the transceiver while using the MCU to operate the controls. That would possibly free up a few steps while still using the cheaper machine.
This project is in its early stages. I just want to make sure I'm barking up the right tree before I begin the real work.
I am putting a little more info in my other reply on this thread.