Multithreading on Arduino?

pito,

You can use existing libraries and device code with an RTOS. You just need to use the proper task structure with queues, semaphores, and mutexes to get sharing to work.

Currently I am playing with FreeRTOS and ChibiOS/RT. I like ChibiOS/RT for ARM, but FreeRTOS is a better choice for AVR Arduino.

Using a preemptive kernel takes knowledge/skill for proper use so it's not for beginners.

fat16lib,
I would be happy to see an example of using default arduino libs with an rtos, indeed :astonished:.
I've run ChibiOS on my STM32VL Discovery modules (the demo; I overclocked them, both stable at 48 MHz, one module stable at 56 MHz). So not bad for a $10 gadget 8).
P.

I would say you cannot easily use the existing libraries (i.e. I2C, SPI, UART, etc.) with an RTOS. You would need to rewrite them, because you need non-blocking versions, you need to control concurrent access to them from several tasks/threads, etc. This is the issue: my understanding is that all drivers need to be redone.
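The "control concurrent access" part can be pictured as a thin wrapper around an unmodified driver. What follows is a minimal host-side sketch, not Arduino code: PlainSerial and GuardedSerial are hypothetical names, and std::mutex stands in for an RTOS mutex (in FreeRTOS that role would be played by a mutex semaphore taken and given around the call). It only addresses mutual exclusion, not the blocking behaviour of the driver itself.

```cpp
#include <mutex>
#include <string>

// Stand-in for an existing, non-reentrant driver (hypothetical).
class PlainSerial {
 public:
  void print(const std::string &s) { out_ += s; }
  const std::string &output() const { return out_; }
 private:
  std::string out_;
};

// Wrapper that lets only one task at a time into the driver.
class GuardedSerial {
 public:
  explicit GuardedSerial(PlainSerial &port) : port_(port) {}

  // The whole line is written while holding the lock, so output from
  // two tasks cannot interleave in the middle of a line.
  void printLine(const std::string &s) {
    std::lock_guard<std::mutex> lock(mutex_);  // released at end of scope
    port_.print(s);
    port_.print("\n");
  }

 private:
  PlainSerial &port_;
  std::mutex mutex_;
};
```

The driver itself is untouched; every task simply goes through the wrapper instead of calling the driver directly.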

The real question at this point is whether the APIs that have been defined by Arduino are reasonable if you suddenly put an operating system (let's skip the "RT" part for now; IMO, RT is overrated and ... really hard) underneath.
Mind you, it's not at all easy to do microcontroller-like things on top of an operating system. Bit-banging I2C out a couple of general purpose IO pins becomes a choice of whether you want to move excessively slowly (by blocking) or have the functions take longer than you'd like. Ditto software-serial and SPI, though at a byte level; some things can be moved to interrupts, but the SPI hardware present on an AVR is still byte-at-a-time. Doing really well under an OS means you need DMA or highly buffered and intelligent controllers for each peripheral, and the AVRs don't have them. (Some of the 32-bit CPUs might.)

But pretty soon you end up with WINCE, .NET, or Android, where IO pins are no longer directly accessible, flexibility is lacking, latency sucks, and you need an (old-style) arduino-class board (with no OS) to play with real hardware. Fine if you want a micro web server that copies stuff from SD card to TCP, but lousy at driving that big LED matrix thing or doing retro electronic gaming...

You could do a "proof of concept" on an old linux PC. digitalWrite() out the parallel port, etc...

Concerning the existing libraries and an (RT)OS:

The solution which would most likely deliver the best results performance wise would IMHO be a rewrite of the current libraries, but there is more than one way to skin a cat.

The solution with the highest degree of reusability of existing libraries would most likely be just a "partial rewrite", which comes with some performance trade-offs. This solution is what the DruinOS people had in mind.

Microchip AppNote AN1264 has more information about how this could be done:
http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=1824&appnote=en544728

Regards,

Robert

..the simplest RTOS approach is offered by some compilers, for example CCS for PIC. Simple, included, consists of 13 functions, cooperative..

p.

It's not that hard to use FreeRTOS. I downloaded FreeRTOS and did a few edits to make it compatible with Arduino. I made FreeRTOS an Arduino library, AvrFreeRTOS. It took about an hour.

Below is my test sketch with two tasks. One that writes data to an SD and one that blinks an LED.

Next I will replace the LED sketch with one that reads a sensor and queues the data to the SD sketch. The queue will provide buffering so no data points are lost during the SD write. The sensor sketch will run at higher priority than the SD sketch.

#include <AvrFreeRTOS.h>
#include <SdFat.h>

SdFat sd;
SdFile file;

#define LED_PIN 3

static void vLEDFlashTask(void *pvParameters) {
  for (;;) {
    vTaskDelay(150);
    digitalWrite(LED_PIN, HIGH);
    vTaskDelay(50);
    digitalWrite(LED_PIN, LOW);    
  }
}

static void vSdTask(void *pvParameters) {
  for (;;) {
    vTaskDelay(1000);
    file.println(millis());
    file.sync();
  }
}

void setup() {
  int r;
  Serial.begin(9600);
  pinMode(LED_PIN, OUTPUT);

  // r receives pdPASS if the task was created, an error code otherwise.
  r = xTaskCreate(vLEDFlashTask,
    (signed portCHAR *)"Task1",
    configMINIMAL_STACK_SIZE + 100,
    NULL,
    tskIDLE_PRIORITY + 2,
    NULL);
  Serial.println(r);
    
  xTaskCreate(vSdTask,
    (signed portCHAR *)"Task2",
    configMINIMAL_STACK_SIZE + 300,
    NULL,
    tskIDLE_PRIORITY + 1,
    NULL);  
 
  if (!sd.init()) sd.initErrorHalt();
  file.open("RTOS.TXT", O_WRITE | O_CREAT);

  vTaskStartScheduler();  
}

void loop() {
  // Insert background code here
}
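The buffering role of the queue mentioned above can be sketched without any RTOS at all. The fixed-size ring buffer below is a hypothetical stand-in (SampleQueue and its method names are assumptions, not part of AvrFreeRTOS or FreeRTOS). In a real sketch a FreeRTOS queue would take its place, and would also handle locking and block the SD task while the queue is empty.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity FIFO for sensor samples.
// Not thread-safe: a real RTOS queue also handles locking and blocking.
template <std::size_t N>
class SampleQueue {
 public:
  // Producer side (sensor task). Returns false if the buffer is full,
  // which is the case where a data point would be lost.
  bool push(std::uint16_t sample) {
    if (count_ == N) return false;
    data_[(head_ + count_) % N] = sample;
    ++count_;
    return true;
  }

  // Consumer side (SD task). Returns false when there is nothing to read.
  bool pop(std::uint16_t &sample) {
    if (count_ == 0) return false;
    sample = data_[head_];
    head_ = (head_ + 1) % N;
    --count_;
    return true;
  }

  std::size_t size() const { return count_; }

 private:
  std::uint16_t data_[N] = {};
  std::size_t head_ = 0;   // index of the oldest sample
  std::size_t count_ = 0;  // number of samples currently stored
};
```

The capacity N has to cover the number of samples that arrive during the longest SD write; if it does not, push() starts returning false and samples are dropped.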

Has anyone used the ThreadKit rtos? I would like to see some sample code using com ports and interrupts.

@fat16lib: what happens when you do not use the vTaskDelay() function in a task (or in both)?
p.

Marius,

ThreadKit is not an RTOS; it is an implementation of extremely lightweight stackless threads. As such, its response to interrupts is much more limited than that of a preemptive kernel. What do you expect to see in sample code?

Pito,

A key function of a preemptive RTOS is to control which task has the CPU. A task can release the CPU in a number of ways. Common ways for a task to block without using the CPU are to delay, wait on a message queue, or wait on a semaphore.

If a task never blocks, the standard thing for a preemptive RTOS happens. If the task is the highest priority, it starves all other tasks. If two tasks at the same priority don't block, they will run round-robin in many RTOSes.

In my example there are three tasks. The highest priority is the LED task, next is the SD task, and finally loop(), the idle task.

If you remove vTaskDelay from the LED task, it will use 100% of the CPU blinking the LED as fast as possible.

If you remove vTaskDelay from the SD task but leave vTaskDelay in the LED task, the LED will run as before, but the SD task will write to the SD as fast as possible using whatever CPU time the LED task leaves over, and loop() will get no CPU time.

As you can see you should avoid calls to the Arduino delay() in high priority tasks. Fortunately the Arduino core has no delay calls.
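The starvation behaviour described above follows from one rule: the scheduler always runs the highest-priority task that is not blocked. Here is a toy model of that rule, written as plain C++ for illustration. Task and pickNext are hypothetical names and this is not how FreeRTOS is implemented internally; ties simply go to the first task found, where a real kernel would typically round-robin.

```cpp
#include <cstddef>

struct Task {
  int priority;  // larger number = more urgent, as in FreeRTOS
  bool ready;    // false while delaying or waiting on a queue/semaphore
};

// Index of the highest-priority ready task, or -1 if every task is blocked.
int pickNext(const Task *tasks, std::size_t n) {
  int best = -1;
  for (std::size_t i = 0; i < n; ++i) {
    if (!tasks[i].ready) continue;
    if (best < 0 || tasks[i].priority > tasks[best].priority) {
      best = static_cast<int>(i);
    }
  }
  return best;
}
```

With the three tasks from the example (LED at priority 2, SD at 1, idle at 0), a LED task that never blocks is always the answer, so the other two never run. Once the LED task is in vTaskDelay, the SD task is picked, and the idle task only runs when both are blocked.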

fat16lib:
pito,

You can use existing libraries and device code with an RTOS. You just need to use the proper task structure with queues, semaphores, and mutexes to get sharing to work.

MAYBE they can work. Unless there's a way to set a task to "INT_MAX" or disable rescheduling, there are things that are going to break badly. Like, bit-banging on a communications port. Leaving Tx high for longer than a bit clock is a BAD idea.

or disable rescheduling

This is one of the advantages of a run-to-completion operating system ("cooperative multitasking"); a task isn't blocked unless it explicitly asks to be blocked, so you can still do little timing-critical loops and such. (In ONE task, anyway. This wrecks "somewhat critical timing" in other tasks.)
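A minimal sketch of the run-to-completion idea, assuming nothing more than an array of function pointers; runRounds, blinkTask and logTask are hypothetical names used only for illustration. Each task keeps the CPU until it returns, which is exactly why a timing-critical loop inside one task is safe from the scheduler and why a slow task delays all the others.

```cpp
#include <cstddef>

typedef void (*TaskFn)();

// Run every task once per round, in order. A task keeps the CPU until it
// returns, so only hardware interrupts can disturb a timing-critical loop
// inside it.
void runRounds(TaskFn *tasks, std::size_t n, unsigned rounds) {
  for (unsigned r = 0; r < rounds; ++r) {
    for (std::size_t i = 0; i < n; ++i) {
      tasks[i]();
    }
  }
}

// Two hypothetical tasks; each does one short step and returns.
static unsigned blinkCalls = 0;
static unsigned logCalls = 0;
void blinkTask() { ++blinkCalls; }
void logTask() { ++logCalls; }
```

On an Arduino the outer loop would normally be loop() itself, with each task written as a short step function that remembers its own state between calls.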

The thing is, a fair number of things that one typically does with "small" microcontrollers are really dependent on NOT having an operating system in there mucking with your timing. You can look at the history of the IBM PC and see it "mature." The original DOS was more of a program loader and file manager than a real operating system, but you could implement your own ISRs, or twiddle the bits on the parallel port to do neat things. When Windows came out at about the same time that higher speed modems became affordable, companies started having to ship serial port cards with deeper FIFOs, because the OS introduced too much latency (time with ISRs disabled) and the older UARTs dropped characters; but you could still write your own "user space" drivers. In the WXP derived OSes you lose direct access to hardware devices and have to rely on the OS drivers...

It's easy to mistake "real time" to mean "fast", when in fact it has a lot more to do with determinism. I remember being really unimpressed with announcements out of some BBN research project back in the late 80s. It seems that their latest operating system on their expensive DoD funded Internet thing could guarantee response to a packet within 1ms. At the time we (cisco) were processing 12000 pps on much cheaper HW; 1ms seemed like a joke. (Later, I came to understand that cisco's OS didn't come close to being able to offer any such guarantee, even though "typical" throughput was very much higher...) (Similarly, there used to be a lot of people that believed in Token Ring Networks over Ethernet because TRs were more deterministic. In both cases, determinism ended up losing in the market...)

Most OSes are pretty RAM-intensive, making them a poor match for most microcontrollers that have very limited RAM. The AVRs are particularly annoying because the ratio of "context" for a task to the total amount of RAM is quite high. (~35 bytes of context, 2K of RAM on a m328)
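To put rough numbers on the RAM point, here is a small budgeting helper. It is a simplification under stated assumptions: maxTasks is a hypothetical function, each task is charged its saved context plus a working stack, and the 512 bytes reserved for globals and the 100-byte stack used in the note below are guesses, not measurements.

```cpp
#include <cstddef>

// Rough upper bound on the number of tasks that fit in RAM.
// Each task is charged its saved register context plus a working stack.
std::size_t maxTasks(std::size_t ramBytes, std::size_t reservedBytes,
                     std::size_t contextBytes, std::size_t stackBytes) {
  if (reservedBytes >= ramBytes) return 0;  // nothing left for tasks
  std::size_t perTask = contextBytes + stackBytes;
  if (perTask == 0) return 0;               // avoid dividing by zero
  return (ramBytes - reservedBytes) / perTask;
}
```

With 2048 bytes of RAM, 512 bytes reserved, 35 bytes of context and a 100-byte stack per task, maxTasks(2048, 512, 35, 100) gives 11 tasks, and their saved contexts alone take 385 bytes, close to a fifth of the chip's RAM.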

So it would be nice to define how the arduino core and libraries are supposed to behave in cooperating with hypothetical operating systems. It might have to be done twice; once for "real time preemptive" operating systems, and again for more generic OSes...

fat16lib:
ThreadKit is not an RTOS; it is an implementation of extremely lightweight stackless threads. As such, its response to interrupts is much more limited than that of a preemptive kernel. What do you expect to see in sample code?

Well, maybe just to see how a receive procedure is set up and how an interrupt procedure would call other code. Purely because I don't know how to use ThreadKit, not to question its functionality.
I actually like the idea of the basic threading.

westfw,

Almost nothing you say about modern preemptive kernels is true.

I know a lot about Cisco's IOS. I worked in the network research group at LBNL and we had a close relationship with Cisco. The head of our department, Van Jacobson, became chief scientist at Cisco.

IOS has a run to completion scheduler, which means that the kernel does not pre-empt a running process — the process must make a kernel call before other processes get a chance to run. For Cisco products that required very high availability, such as the Cisco CRS-1, these limitations were not acceptable. In addition, competitive router operating systems that emerged 10–20 years after IOS, such as Juniper's JUNOS, were designed not to have these limitations.

Cisco's response was to develop a new version of Cisco IOS called IOS XR. It is based on 3rd party real-time operating system micro-kernel (QNX), and a large part of the current IOS code was re-written to take advantage of the features offered by the new kernel — a massive undertaking.

A company like Cisco can afford to spend a lot of resources to polish a product that will sell in large volume and high profit. The methods used by Cisco for communications don't necessarily translate to other applications.

For one-off hobby projects an RTOS can be a good tool to accomplish more with less code and time.

I agree that an ATmega328 is small for use with an OS, and it wasn't designed for quick context switches. Still, FreeRTOS can do a context switch on a m328 in about 50 us.

ARM processors were designed to be run with a micro-kernel. A modern micro-kernel can do a context switch in as little as 2.2 us on a 72 MHz STM32 (see http://yagarto.de/projects/rtoscomp/index.html). ARM Arduino is where an RTOS could help hobbyists accomplish more sophisticated projects.

Here is the FreeRTOS test program I used with a scope to check the context switch time. The lower priority task switches an LED on, off, on, and then gives a semaphore to do a context switch to a higher priority task which turns the LED off.

#include <AvrFreeRTOS.h>

#define LED_PIN 3

xSemaphoreHandle xSemaphore;
 
static void ledOffTask(void *pvParameters) {
  for (;;) {
    xSemaphoreTake(xSemaphore, portMAX_DELAY);
    digitalWrite(LED_PIN, LOW);    
  }
}

static void ledControl(void *pvParameters) {
  for (;;) {
    vTaskDelay(10);
    digitalWrite(LED_PIN, HIGH);
    digitalWrite(LED_PIN, LOW);
    digitalWrite(LED_PIN, HIGH);
    xSemaphoreGive(xSemaphore);
  }
}

void setup() {
  pinMode(LED_PIN, OUTPUT);

  xTaskCreate(ledOffTask,
    (signed portCHAR *)"Task1",
    configMINIMAL_STACK_SIZE,
    NULL,
    tskIDLE_PRIORITY + 2,
    NULL);

  xTaskCreate(ledControl,
    (signed portCHAR *)"Task2",
    configMINIMAL_STACK_SIZE,
    NULL,
    tskIDLE_PRIORITY + 1,
    NULL);  

  vSemaphoreCreateBinary(xSemaphore);  
  vTaskStartScheduler();  
}

void loop() {
  // Insert background code here
}

Almost nothing you say about modern preemptive kernels is true.

I said that they're inconvenient in conjunction with critical timing sections of non-OS based code, and that they tend to be RAM hungry, and that they're not necessarily major wins over non-preemptive or non-real-time kernels. You didn't contradict any of those statements, and I'll stand by them. They're not necessarily deal-killers; I'm just trying to point out issues and alternatives.

For one-off hobby projects an RTOS can be a good tool to accomplish more with less code and time.

OK. Less user code, and less programmer time, anyway. It all depends on how much of your resources you can dedicate to the overhead of the OS. It's ALWAYS non-zero.

I decline to discuss cisco's internal development efforts. But it's a good thing that the high end routers do most of their speed-critical work "in hardware."
Even I was surprised when Procket (another competitor) explicitly decided to implement a run-to-completion OS in the same timeframe that cisco was working on trying to get rid of theirs...

FreeRTOS can do a context switch on a m328 in about 50 us.

Is that the (external) time from external event till the task blocking on that event runs, or just the (internal) time to switch tasks? (The former would include ISR latency plus ISR-related context switch and execution, plus IPC latency and execution from ISR to scheduler, etc.) Still, 50us would mean that naive serial code would be limited to about 38400bps, rather than the 2Mbps some people are doing with bare AVR code... (though I don't see anyone claiming that UART buffering should be handled in a task in FreeRTOS, the way they imply you might do it in QNX. Feh.)
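One way to put numbers on this is to ask what fraction of the CPU goes to context switching if every received byte wakes a task. The helper below is a back-of-the-envelope sketch, not a measurement: switchLoad is a hypothetical function, it assumes 10 bits on the wire per byte (8N1 framing), and the number of switches per byte is a parameter because it depends on how the code is structured.

```cpp
// Fraction of CPU time spent in context switches when each received byte
// costs `switchesPerByte` switches of `switchMicros` microseconds each.
// Assumes 10 bits on the wire per byte (8N1 framing).
// A result above 1.0 means the rate cannot be sustained at all.
double switchLoad(unsigned long baud, double switchMicros,
                  unsigned switchesPerByte) {
  double bytesPerSecond = baud / 10.0;
  return bytesPerSecond * switchesPerByte * switchMicros / 1e6;
}
```

Under these assumptions, 38400 bps at one 50 us switch per byte costs about 19% of the CPU (about 38% if the byte costs a switch in and a switch back out), while 2 Mbps would need ten times the CPU time that exists. That is the argument for handling bytes in an ISR with a buffer and waking a task only per block or per line.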

RTOSes are also good for very large projects. I was involved with the LHC project at CERN.

The Large Hadron Collider (LHC) sits in a circular tunnel 27 km in circumference. The tunnel is buried 50 to 175 m underground. It straddles the Swiss and French borders on the outskirts of Geneva.

The LHC is designed to collide two counter rotating beams of protons or heavy ions. Proton-proton collisions are foreseen at an energy of 7 TeV per beam.

The control system for this accelerator uses over 1000 computers with most running LynxOS. LynxOS has 2 us task context switch time and 1 us interrupt latency with modern processors.

This project could never have been done without modern RTOSes.

This control system is totally object oriented with device access code written in C++ and higher layers written in Java.

LHC - so many computers with 1 us latency - now I understand where those missing 60 ns come from.. :D
p.

Actually it was the injector for the LHC that created the neutrinos sent 732 km from CERN to the Gran Sasso National Laboratory in Italy.

Here is the paper with about 200 authors: http://estaticos.elmundo.es/documentos/2011/09/23/cern.pdf

I wouldn't bet on the 60-70 ns too soon. It is an amazing effort but there sure are doubts.

If neutrinos travel faster than light, we should have detected the neutrinos from Supernova 1987A before we saw the explosion itself.

The CERN neutrinos travel faster than light, by about 1 part in 40,000. The neutrinos from SN1987A traveled so far they would have arrived here almost four years before the light did if the CERN result is correct. However, we saw the light from the supernova at roughly the same time as the neutrinos.
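The arithmetic behind those two figures can be checked by hand. The 732 km baseline and the roughly 60 ns early arrival come from the posts above; the distance to SN1987A, taken here as about 168,000 light years, is an outside figure and the result shifts with it.

```latex
\begin{align*}
\frac{v - c}{c} &\approx \frac{60\ \text{ns}}{732\ \text{km} / c}
  = \frac{60 \times 10^{-9}\ \text{s}}{2.44 \times 10^{-3}\ \text{s}}
  \approx 2.5 \times 10^{-5} \approx \frac{1}{40\,000} \\
\Delta t_{\text{SN1987A}} &\approx \frac{168\,000\ \text{yr}}{40\,000}
  \approx 4.2\ \text{yr}
\end{align*}
```

So the same fractional speed excess that gives tens of nanoseconds over 732 km gives a head start of about four years over the distance to the supernova.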

Of course there may be something different about the neutrinos created in the CERN/Gran Sasso experiment.

..gravitational slingshot caused by Earth, as the neutrinos possess a mass..
When coming from SN1987A the gravitational influences of galaxies cancel out.. :grin:

PS: for all arduinists concerned with millis() precision here is a good introduction to timekeeping (look for pdf there):
http://www.allanstime.com/Publications/DWA/Science_Timekeeping/index.html

@fat16lib:

fat16lib:
It's not that hard to use FreeRTOS. I downloaded FreeRTOS and did a few edits to make it compatible with Arduino. I made FreeRTOS an Arduino library, AvrFreeRTOS. It took about an hour.

This sounds interesting.
Is this available somewhere for download?

Regards,

Robert

Don't forget to take a look at BeRTOS (http://www.bertos.org/).

Has a few nice demos too and supports a wide array of (fairly cheap) hardware including the Atmega328.

I've played with the APRS demo (works great) and the GPS code as well (not quite as great with the AVR, but it does work).

Just a thought,

Brad.