Networking layer on present+future ARM-based boards?

Recently I have been contemplating what future Arduino networking might look like on 32 bit microcontrollers.

Today there seems to be a huge capability gap between microcontrollers and Linux-based boards. 8-bit AVR chips with 2K to 8K of RAM and polling-based I/O obviously impose pretty severe limitations on the microcontroller side. On the Linux side, you get fully featured networking with daemons for nearly every protocol and powerful networking client software, but the cost is dozens (or hundreds) of megabytes of RAM and a powerful (and power-hungry) processor.

In the not-too-distant future, as semiconductor manufacturers move microcontrollers to 65nm and even 45nm process nodes, we'll get Cortex-M microcontrollers with 256K to 2M of RAM, running in the 200 to 500 MHz range, with on-chip DMA-based Ethernet MACs and wireless networking. Even now, chips like the SAM3X on the Arduino Due have high-performance Ethernet just sitting unused, but there's little point in connecting the pins because the software support is lacking.

My hope is to design a networking layer / stack / set of libraries that fills this gap, with good performance and a fairly easy path to building a project with a variety of networking services, but scaled appropriately for near-future microcontrollers where 20K to 100K of RAM usage is acceptable.

Concurrency is certainly a big issue. There are a lot of complex technical choices, but as far as I can envision, they always boil down to three possible designs for how future sketches would actually access networking:

#1: Non-blocking functions, like Serial.available() and Serial.read()

#2: Blocking functions, like Serial.print() - requires an I/O-aware scheduler or RTOS, and segments RAM into many tiny per-task stacks

#3: Event-based structure imposed, like serialEvent()
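To make the three styles concrete, here is a small host-side C++ sketch of one connection object offering all three. Everything in it is hypothetical: `Connection`, `deliver()` (which simulates the network driver), `readBlocking()` with its scheduler hook, and `onReceive()`/`dispatchEvents()` are illustrative names, not an existing Arduino API.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical connection object showing the three access styles side by side.
class Connection {
public:
  // Simulated driver side: a byte arrives from the network
  // (on a real board this would be filled by an interrupt or DMA).
  void deliver(uint8_t b) { rx_.push_back(b); }

  // Style #1: non-blocking, like Serial.available() / Serial.read().
  int available() const { return static_cast<int>(rx_.size()); }
  int read() {
    if (rx_.empty()) return -1;  // nothing ready; caller carries on with other work
    int b = rx_.front();
    rx_.pop_front();
    return b;
  }

  // Style #2: blocking. While waiting it calls a scheduler hook so other
  // tasks can run; without an I/O-aware scheduler this degrades to a busy loop.
  int readBlocking(const std::function<void()>& yieldHook) {
    while (rx_.empty()) yieldHook();
    return read();
  }

  // Style #3: event-based, like serialEvent(): the framework invokes the
  // handler (e.g. after each loop()) whenever data is waiting.
  void onReceive(std::function<void(Connection&)> handler) { handler_ = std::move(handler); }
  void dispatchEvents() {
    if (handler_ && !rx_.empty()) handler_(*this);
  }

private:
  std::deque<uint8_t> rx_;
  std::function<void(Connection&)> handler_;
};
```

The trade-off shows up in where the waiting happens: style #1 pushes it into the sketch's loop(), style #2 pushes it into a scheduler, and style #3 pushes it into the framework's dispatch loop.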

Designing something powerful and easy to use that can work within the (near future) limitations of microcontrollers requires many difficult trade-offs. If you want everything, with all types of APIs and all possible features and fully preemptive scheduling and ... inevitably you end up with something like Linux, compromising the original goals of simple to use and able to run with limited resources.

I'm sure there are lots of opinions. I'd love to hear yours!

Most credible networking stacks require multitasking, and both multitasking and networking tend to be very RAM-intensive. Most "microcontrollers" seem to specifically limit RAM in favor of ROM of some kind :frowning:
That said, cooperative multitasking is sufficient, and 512K or so of RAM should be enough. Early Cisco routers and terminal servers ran on 10MHz 68000s with 1M of RAM, and would handle 50-odd users' worth of telnet connections.
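In its simplest run-to-completion form, cooperative multitasking needs no kernel at all: each task does a short piece of work and returns. The following is a minimal host-side sketch; `CooperativeScheduler` is a hypothetical name, and real systems add timers, priorities, and I/O readiness checks on top of this.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Minimal cooperative (run-to-completion) scheduler sketch. Tasks share one
// stack because each returns before the next runs, and tasks cannot interrupt
// each other, so data shared only between tasks needs no locking
// (interrupt handlers are a separate matter).
class CooperativeScheduler {
public:
  using Task = std::function<void()>;

  void add(Task t) { tasks_.push_back(std::move(t)); }

  // One pass over all tasks, as a sketch's loop() might do.
  void runOnce() {
    for (auto& t : tasks_) t();
  }

private:
  std::vector<Task> tasks_;
};
```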

That "multitasking OS" is a big stumbling block, though. Rolling your own seems like a waste of time, but picking an existing OS is fraught with all sorts of technical, political, and business issues. (TI is now supporting multitasking in Energia using TI/RTOS, and I was a bit shocked at my own unwillingness to invest much time in learning a TI-specific OS. Sigh. The same goes for competitors with restrictive or viral licenses.)

westfw:
TI is now supporting multitasking in Energia using TI/RTOS, and I was a bit shocked at my own unwillingness to invest much time in learning a TI-specific OS. Sigh.

At Maker Faire in San Mateo this year I talked with the TI guys and they mentioned this. I must confess, I didn't even look at it until just now, thanks to you!

My informal & quick litmus test for quality of RTOS support is to look at Serial.write() and delay(). Many functions block, but these are by far the most commonly used. Surely if someone is serious about RTOS integration, they'd do these first!

Well, I'm sad to say they make no attempt to yield inside of delay().

void delay(uint32_t milliseconds)
{
        uint32_t start = micros();
        while(milliseconds > 0) {
                if ((micros() - start) >= 1000) {
                        milliseconds--;
                        start += 1000;
                }
                __bis_status_register(LPM0_bits+GIE);
        }
}
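For comparison, a delay() that gives a scheduler a chance to run can call yield() on every pass through its wait loop. This is a minimal host-side sketch: `delayYielding` is a hypothetical name, and the fake clock plus the counting `yield()` are stand-ins so the code runs off-board (on a real board, micros() and yield() come from the core).

```cpp
#include <cassert>
#include <cstdint>

// Host-side stand-ins (assumptions for this sketch): a fake microsecond clock
// and a yield() hook that pretends another task ran for 250 us.
static uint32_t fakeMicros = 0;
static unsigned yieldCalls = 0;
uint32_t micros() { return fakeMicros; }
void yield() { ++yieldCalls; fakeMicros += 250; }

// delay() that hands the CPU to the scheduler on every pass
// instead of spinning or sleeping without telling the RTOS.
void delayYielding(uint32_t milliseconds) {
  uint32_t start = micros();
  while (milliseconds > 0) {
    yield();                                            // let other tasks run
    while (milliseconds > 0 && (micros() - start) >= 1000) {
      milliseconds--;                                   // account for each elapsed ms
      start += 1000;                                    // unsigned math handles wraparound
    }
  }
}
```

With a cooperative scheduler, yield() switches tasks; with no scheduler installed it can simply enter a low-power sleep, so the single-threaded behavior is unchanged.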

Likewise in Serial.write(): when the serial transmit buffer is full, they just busy-loop. No attempt is made to atomically check a semaphore/event and wait until it's signaled.

size_t HardwareSerial::write(uint8_t c)
{
        unsigned int i = (_tx_buffer->head + 1) % SERIAL_BUFFER_SIZE;

        // If the output buffer is full, there's nothing for it other than to
        // wait for the interrupt handler to empty it a bit
        // ???: return 0 here instead?
        while (i == _tx_buffer->tail);

Even worse is the lack of thread-safe access to the head and tail variables! In a single-threaded system, this is safe as long as the main program thread is the only writer to head. But as soon as two threads can write to head under preemptive scheduling, this becomes a race condition.
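The single-writer argument can be made concrete with a generic single-producer/single-consumer ring buffer. This host-side C++ sketch uses std::atomic; it is not Energia's code, and `SpscRing` is a hypothetical name.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Single-producer / single-consumer ring buffer. It is safe only because the
// producer is the sole writer of head_ and the consumer the sole writer of
// tail_. With two producers, both could load the same head_, fill the same
// slot, and store the same next index, silently losing one byte.
template <std::size_t N>
class SpscRing {
public:
  bool push(uint8_t c) {  // producer side only (e.g. write())
    std::size_t h = head_.load(std::memory_order_relaxed);
    std::size_t next = (h + 1) % N;
    if (next == tail_.load(std::memory_order_acquire)) return false;  // full
    buf_[h] = c;
    head_.store(next, std::memory_order_release);  // publish the byte
    return true;
  }
  bool pop(uint8_t& c) {  // consumer side only (e.g. the TX interrupt)
    std::size_t t = tail_.load(std::memory_order_relaxed);
    if (t == head_.load(std::memory_order_acquire)) return false;  // empty
    c = buf_[t];
    tail_.store((t + 1) % N, std::memory_order_release);
    return true;
  }

private:
  std::array<uint8_t, N> buf_{};
  std::atomic<std::size_t> head_{0};
  std::atomic<std::size_t> tail_{0};
};
```

Like the quoted code, this design keeps one slot empty to distinguish full from empty, so `SpscRing<4>` holds at most three bytes.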

Energia has an example in File > Examples > 10.MultiTasking > MultiTaskSerial showing two simple loop() functions, each calling Serial.print() and delay(). I'm sure it manages to run for very long times without the race condition occurring, since each thread only prints a short string between lengthy delays. The delays are multiples of each other, so the threads might even stay in lock-step and never hit the race where one thread preempts the other and alters head while the first is still depending on the value it previously read. Of course, in a real application with real-world events influencing timing, the race condition will eventually strike!

Normally this sort of code would require a mutex or "critical section". TI seems to call this "gates" in their SYS/BIOS manual (admittedly, I've spent less than 1 hour skimming their 256 page manual). Their RTOS certainly has the capability to do things correctly.... but at least so far they do not appear to be making much use of it within the many Arduino I/O functions.
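A minimal sketch of what the busy loop and the unprotected head update could become, assuming an RTOS that provides a mutex and a waitable condition. Here std::mutex and std::condition_variable stand in for those primitives; the actual SYS/BIOS gate and semaphore APIs are not shown, and `TxBuffer` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Transmit buffer safe for several writer threads. std::mutex stands in for
// an RTOS mutex/gate and std::condition_variable for a semaphore/event. On a
// real board the consumer is usually the UART interrupt, which cannot block on
// a mutex, so a real port would need the RTOS's ISR-safe signalling instead.
template <std::size_t N>
class TxBuffer {
public:
  // Blocks the calling task while the buffer is full instead of busy-looping.
  void write(uint8_t c) {
    std::unique_lock<std::mutex> lock(m_);
    notFull_.wait(lock, [&] { return count_ < N; });
    buf_[head_] = c;          // check-then-update of head_ now happens
    head_ = (head_ + 1) % N;  // under the lock, so writers cannot interleave
    ++count_;
  }

  // Consumer side (a drain task in this host-side sketch).
  bool take(uint8_t& c) {
    std::lock_guard<std::mutex> lock(m_);
    if (count_ == 0) return false;
    c = buf_[tail_];
    tail_ = (tail_ + 1) % N;
    --count_;
    notFull_.notify_all();    // wake any writers waiting for space
    return true;
  }

private:
  std::mutex m_;
  std::condition_variable notFull_;
  std::array<uint8_t, N> buf_{};
  std::size_t head_ = 0, tail_ = 0, count_ = 0;
};
```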

This is why I generally have a pretty negative opinion of applying an RTOS to Arduino, and especially one based on fully preemptive scheduling. If TI can't even properly implement their own SYS/BIOS RTOS within the most important and widely used Arduino core library functions, what hope is there for quality RTOS support in the rest of the system and dozens of important libraries?

I'm confident that if I tried to go down this path, I too would never manage to stamp out the thousands of subtle race conditions it would create in libraries like Ethernet, Wire, and SPI, not to mention the incredible number of third-party libraries, many of whose authors have little incentive to merge complex RTOS-support patches!

@Paul Stoffregen
Have you had a look at Contiki OS? The first versions of this RTOS for sensor networks were implemented with Protothreads, a variant of cooperative multitasking with a very small context switch. I have an OOP implementation of Protothreads in Cosa. The chief architect, Adam Dunkels, implemented a tiny IP stack with Protothreads.
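The core protothread trick is small enough to show: a switch statement whose case labels are placed at each wait point, so a plain function can resume where it left off without its own stack. This is a simplified illustration; the macro names echo Contiki's, but it is not Contiki's pt.h or Cosa's classes, and `worker`, `ready`, and `events` are hypothetical.

```cpp
#include <cassert>

// Minimal protothread: the only per-thread state is the resume point.
struct ProtoThread { int lc = 0; };

#define PT_BEGIN(pt) switch ((pt)->lc) { case 0:
#define PT_WAIT_UNTIL(pt, cond) \
  do { (pt)->lc = __LINE__; case __LINE__: if (!(cond)) return false; } while (0)
#define PT_END(pt) } (pt)->lc = 0; return true;

// Local variables do not survive a wait (there is no stack to keep them),
// so state lives in globals here; this is a known protothread limitation.
static bool ready = false;
static int events = 0;

// Returns false while waiting, true once it has run to completion.
bool worker(ProtoThread* pt) {
  PT_BEGIN(pt);
  PT_WAIT_UNTIL(pt, ready);
  ++events;
  ready = false;
  PT_WAIT_UNTIL(pt, ready);
  ++events;
  PT_END(pt);
}
```

Each call to worker() costs one function call plus a switch, which is why the context switch is so cheap compared with saving and restoring a full register set and stack.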

In any case, everything you are saying about making Arduino libraries run on an RTOS is very true. That is a lot of work, as the Arduino core is very much bare metal. In a "normal" RTOS/OS/framework there is a set of interfaces for device driver integration (plug-ins). This is the key. For instance, SPI, which I know you have worked on: is that interface multitasking-aware? We have taken the first steps with the enable/disable of interrupts from other SPI devices, but there are some further steps to be taken. The same goes for I2C, UART, etc.

For some time the answer to the question of building larger Arduino-based systems has been more memory and, with ARM, more processing power. This does not really solve the problem. Even the new library manager does not solve the problem of integration. It only makes libraries more available. There is still no "support" that they actually work together. There are other ripple effects of larger systems; loading the image (through the bootloader) takes too long. Having a resident library becomes more or less necessary to reduce waiting when developing software (e.g. testing).

The bottom line is software architecture. Who is/are the Arduino software architect(s)? What are the design rules when implementing a new device driver/support library to allow integration?

Cheers!

Regarding tech details....

kowalski:
Have you had a look at Contiki OS?

Yes, but not much. Likewise for many others, I've looked at them briefly. They're all very similar, with the main distinction being cooperative vs preemptive task switching. The RTOS kernel isn't trivial, but that part exists in many well tested projects. It's not the missing link...

So far, I've not seen anyone make a really comprehensive effort to port even just the core library to RTOS awareness, nor the dozen libraries that come with Arduino. A couple years ago, after much debate, I talked the Arduino devs into including a yield() placeholder function, but even that hasn't been adopted very pervasively throughout the many blocking functions.

For instance, SPI, which I know you have worked on: Is that interface multi-tasking aware? We have taken the first steps with the enable/disable of interrupts from other SPI devices but there are some further steps to be taken. The same goes for I2C, UART, etc.

The SPI transaction code is not RTOS aware, but it is at least an API that embeds the necessary info into sketches and libraries about when exclusive access to hardware is needed. The interrupt masking code could, in principle, be replaced with RTOS mutex and priority elevation.
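A sketch of that replacement, in host C++ with std::mutex standing in for an RTOS mutex. `SpiBus` and `SpiSettings` are hypothetical: they only mimic the shape of the beginTransaction()/endTransaction() API, not the real SPI library's implementation, and priority elevation is noted in a comment but not implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical settings, loosely modeled on the SPI transaction API.
struct SpiSettings {
  uint32_t clockHz;
  uint8_t mode;
};

// Instead of masking the interrupts of other SPI users, beginTransaction()
// takes a bus mutex, so another task wanting the bus simply waits until the
// current transaction ends.
class SpiBus {
public:
  void beginTransaction(const SpiSettings& s) {
    lock_.lock();  // a real RTOS mutex could also elevate priority here
    current_ = s;  // apply clock/mode for this device's transaction
    ++transactions_;
  }
  uint8_t transfer(uint8_t out) {
    return out;    // placeholder for the hardware exchange: loop the byte back
  }
  void endTransaction() { lock_.unlock(); }

  unsigned transactions() const { return transactions_; }
  const SpiSettings& settings() const { return current_; }

private:
  std::mutex lock_;
  SpiSettings current_{0, 0};
  unsigned transactions_ = 0;
};
```

Because sketches and libraries already bracket their bus access with these two calls, a change like this could stay inside the SPI library without touching callers.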

Likewise, Wire already has the API needed (at least I think it does) to infer when bus access begins and ends, so it could be redesigned for RTOS use.

Even the new library manager does not solve the problem of integration. It only makes libraries more available. There is still no "support" that they actually work together.

No, of course not. In many cases, certain libraries can never work together, because they require exclusive use of the same hardware resources. But in cases where compatibility is possible, technology isn't the answer. Only smart, forward-thinking design and a lot of actual work to test and patch code provides broad compatibility. I do not believe there's ever going to be a substitute for smart people investing real work.

There are other ripple effects of larger systems; the loading of the image (through the bootloader) takes too long. Having a resident library becomes more or less necessary to reduce waiting when developing software (e.g. testing).

Slow programming is just inefficient coding on some (perhaps many) boards. It doesn't have to be this way. If you use my 32 bit boards, I believe you'll see the entire 256K memory can be programmed in just a few seconds.

This is indeed the bottom line:

kowalski:
The bottom line is software architecture. Who is/are the Arduino software architect(s)? What are the design rules when implementing a new device driver/support library to allow integration?

Officially, probably Cristian Maglie would be considered Arduino's software architect.

Unofficially, it's us, right here, right now! :smiley:

I'm sad to say they [TI/Energia MT] make no attempt to yield inside of delay().

I think you're looking at the wrong code. The MSP432 (the only board supporting TI/RTOS) has an entirely different hardware directory, most of which is pre-compiled code, with delay looking like:

void delay(uint32_t milliseconds)
{
    switch (delayMode) {
        /* using Timer_A, check for opportunity to transition to WDT */
        case 0:
            if ( (milliseconds >= 250) && (milliseconds % 250) == 0) {
                delayMode = 1;
                switchToWatchdogTimer();
            }
            else {
                delayMode = 2;
                switchToTimerA();
            }
            break;
        /* using WDT, check for need to transition to Timer_A */
        case 1:
            if ( (milliseconds >= 250) && (milliseconds % 250) == 0) {
                /* stay in mode 1 */
            }
            else {
                /* switch to Timer_A and never look back */
                delayMode = 2;
                switchToTimerA();
            }
            break;
        /* always using Timer_A */
        case 2:
            break;
    }

    /* timeout is always in milliseconds so that Clock_workFunc() behaves properly */
    Task_sleep(milliseconds); 
}

..../hardware/msp432/cores/ms432/lib/wiring.c

@Paul Stoffregen

The yield() placeholder is a great idea, though in Cosa it is defined as a function pointer rather than a weak function. This gives more runtime freedom. There are several busy waits in Cosa that call yield(). The default version of yield() is a power-down sleep. In the Cosa Shell library it is replaced with an extended version that also captures the amount of idle time. The Cosa multitasking library Nucleo has a variant of yield() that does a context switch or a power-down sleep. The same goes for delay() and sleep(), which are redefined by the Watchdog (low-power timer), the RTC (millisecond) clock, and Nucleo. They are installed by the "begin" member function.
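The difference between a weak function and a function pointer is easy to see in a few lines: a weak symbol is resolved once at link time, while a pointer can be swapped while the program runs. This is a host-side illustration with hypothetical names (`yieldHook`, `defaultYield`, `schedulerYield`, `waitUntil`), not Cosa's actual code; the counters stand in for a power-down sleep and a context switch.

```cpp
#include <cassert>

// yield() as a replaceable function pointer.
using YieldFn = void (*)();

static unsigned sleeps = 0;
static unsigned switches = 0;

void defaultYield() { ++sleeps; }      // stand-in for a power-down sleep
void schedulerYield() { ++switches; }  // stand-in for a context switch

// Installed default; a scheduler's begin() could reassign it at run time.
YieldFn yieldHook = defaultYield;

// A busy-wait helper that cooperates via whatever hook is installed.
template <typename Cond>
void waitUntil(Cond done) {
  while (!done()) yieldHook();
}
```

The cost of the extra flexibility is an indirect call on every yield and a writable pointer in RAM.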

Back to the architecture issue: "we" should start to document the "southbound interfaces" in the Arduino core. Throughout the Cosa driver code I have used a "northbound interface" towards the application level and a "southbound interface" towards the driver layer. These are often Device classes used with a delegation design pattern; a specific device driver is a subclass. This decouples the application from the driver specifics. The difference in design is easy to see when comparing the Arduino Print class with Cosa's IOStream and IOStream::Device. The most advanced example of this structure is the LCD and LCD::IO adapter classes. And obviously LCD is an IOStream::Device and can be used with the IOStream interface.
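A stripped-down sketch of the northbound/southbound split using delegation. `OutStream`, `OutStream::Device`, and `StringDevice` are hypothetical names that only mimic the IOStream / IOStream::Device shape described above; they are not Cosa's API. In contrast to a Print-style base class, where the driver inherits the formatting API and *is* the stream, here the stream *has* a device.

```cpp
#include <cassert>
#include <string>

class OutStream {
public:
  // Southbound interface: the one small virtual a driver must implement.
  class Device {
  public:
    virtual ~Device() = default;
    virtual int put(char c) = 0;  // returns the byte written (0..255), or -1 on error
  };

  explicit OutStream(Device* dev) : dev_(dev) {}

  // Northbound interface: formatting is written once, for every device.
  OutStream& print(const char* s) {
    while (*s) dev_->put(*s++);
    return *this;
  }
  OutStream& print(unsigned long v) {
    char tmp[21];  // enough digits for a 64-bit value
    int i = 0;
    do { tmp[i++] = static_cast<char>('0' + v % 10); v /= 10; } while (v);
    while (i > 0) dev_->put(tmp[--i]);
    return *this;
  }

private:
  Device* dev_;  // delegation: the stream holds a device, it is not one
};

// A "driver": here a RAM buffer; a UART or LCD driver would subclass the same way.
class StringDevice : public OutStream::Device {
public:
  int put(char c) override { out += c; return static_cast<unsigned char>(c); }
  std::string out;
};
```

Swapping StringDevice for another Device subclass changes where the bytes go without touching any application code that calls print().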

I believe that the way forward is to introduce and (re)structure the "southbound interfaces" in the Arduino core for a 2.X language release. This will obviously require more disciplined device driver writers, but with the new library manager structure the community is starting to see the win-win.

Cheers!

westfw:
I think you're looking at the wrong code.

Wow, yes, you're right. I was looking at the msp430 core, totally unaware they only support this on the msp432 one.

I'll have to take another look... after I'm done dealing with supporting today's "r4" release!