
Topic: SCoop - multitask and Simple COOPerative scheduler AVR & ARM



I'm pleased to release another multitasking alternative called SCoop, for Simple Cooperative scheduler, for the Arduino and Teensy platforms (AVR and ARM), using the standard yield() function.

There are a couple of alternatives for multitasking on Arduino, including well-known RTOSes like ChibiOS or FreeRTOS, but also some light implementations like adOS, published on this forum, or interrupt-driven libraries like ieOS.

I decided to create this one to bring a user-friendly declaration principle and some features I didn't find in the others, especially those needed for home automation or industrial control, like we sometimes find in PLCs.

Version 1, just released this weekend, is available on Google Code here:

This pack includes a (hopefully) comprehensive 14-page user guide and contains 3 standalone libraries.

SCoop.h provides objects and macros for easily creating Tasks, Events and Timers.
TimerUp/Down.h provides object classes for defining an unlimited number of time counter objects with time base handling.
IOFilter.h provides objects for declaring and using Inputs, Outputs and time-filtered inputs (extended debounce style).
User guide v1.0 with user and technical info and some performance measurements.

The pack has been tested on Arduino Uno, Teensy++ 2.0 and Teensy 3.0 (beta8) with Arduino IDE v 1.02.
I'd be glad if someone could look at it and try it on the Arduino Due, as I don't have this board myself. Just force the definition of __MK20DX128__ at the beginning of SCoop.h.

Remark: to use it in a final project you might remove the line including "scoopdebug" and the SCOOPTRACE definition in SCoop.h. At the moment it is still included, which gives the possibility to "trace" your code: just use trace("my step one in task1").

just an example:
Code: [Select]

#include "SCoop.h"

void myTask1::setup() { Serial.begin(57600); }
void myTask1::loop() { Serial.println("hello from task");sleep(1000); }

void myTask2::setup() { pinMode(13, OUTPUT); }
void myTask2::loop() {
  Serial.println("led HIGH");digitalWrite(13, HIGH);sleep(500);
  Serial.println("led LOW");digitalWrite(13, LOW);sleep(500); }

void setup() { mySCoop.start(); }
void loop() { Serial.println("do whatever you want here also"); mySCoop.sleep(500); }

got it ? :)


A short update regarding performance and footprint in memory.

The extra code size needed by the library for the scheduler, the task wrapper and a single call to yield() in the main loop() is 1700 bytes on AVR. This looks cool, especially considering that the library is written object-oriented and makes extensive use of virtual methods.  :)

On Teensy 3 (ARM) we surprisingly end up with 4900 bytes more for the library. This is not really a problem as the chip has 128K of program memory, but I will investigate why at some point.

The RAM used by the library's own variables is not significant here; RAM use is mainly driven by the size of the stacks allocated to tasks. Fortunately, these can be trimmed to the very minimum by using the library method stackLeft to monitor the unused space.
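For readers wondering how unused stack space can be measured at all, here is a minimal sketch of the classic "stack painting" technique. It is only an assumption about how a method like stackLeft could work, not SCoop's actual code; the names paintStack, stackUnused and kFill are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical fill pattern; any value unlikely to be written by real code works.
constexpr uint8_t kFill = 0xA5;

// Paint the whole stack area with the pattern before the task first runs.
void paintStack(uint8_t* stack, size_t size) {
    std::memset(stack, kFill, size);
}

// Count the bytes, starting at the low address, that still hold the pattern.
// Assumes a stack that grows downward from stack[size - 1] toward stack[0],
// as on AVR and ARM.
size_t stackUnused(const uint8_t* stack, size_t size) {
    size_t n = 0;
    while (n < size && stack[n] == kFill) {
        ++n;
    }
    return n;
}
```

Calling stackUnused() after the task has run for a while gives a lower bound on the bytes that were never touched, which is the margin by which the stack could be shrunk.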

Regarding performance, with a recent update to the yield method (to be published Wednesday), we end up with a total time of 45us for switching between 2 tasks on a Teensy++ 2.0, 55us on Arduino Uno and less than 10us on Teensy 3 (ARM).
This time includes time checking and bookkeeping with some calls to the (in)famous millis(), as this library doesn't need any timer or interrupts.
This is good performance anyway: at most 5% of the time is lost to scheduling with 4 tasks running on AVR (3 + main loop), and less than 1% on Teensy 3.

Unless you do realtime cruise missile calculations you'd be safe, right?

here we go
Anybody got a Due to try this fancy lib?  :smiley-mr-green:


Any idea why the context switch is slow?

Typical RTOS times are faster.  For ChibiOS/RT giving a semaphore plus a context switch to the task that takes the semaphore is:

Uno: 15 us

Teensy 3.0: 3 us

Due: 2 us

Task context switch times are lower: 11.25 us for a 16 MHz AVR and 1.02 us for a 72 MHz ARM CM3. http://www.chibios.org/dokuwiki/doku.php?id=chibios:metrics

Clearly this is not a problem for a cooperative scheduler.


Yes, I have some ideas about that.

In fact the time I gave is the total overhead needed by the scheduler to:

1- verify whether the task should be interrupted. This includes checking the time spent in the task and comparing it to a "quantum". This guarantees that all tasks get a certain amount of CPU time (unless they sleep, of course)
2- calculate the average time spent in this task over the last 2/4/8/16 cycles
3- check which task is next
4- switch the task context to the next one
5- calculate the average time spent since we entered this task (last 2/4/8/16 cycles), to update the overall cycle time variable.
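The decision part of these steps can be sketched as follows. This is a simplified illustration, not SCoop's code: the struct CoopScheduler and all its members are hypothetical names, the clock is passed in as a parameter so the logic can be followed without hardware, and the averaging of steps 2 and 5 is left out.

```cpp
#include <cstdint>

// Hypothetical cooperative scheduler core with a time quantum.
struct CoopScheduler {
    uint32_t quantumUs;        // minimum time a task keeps the CPU
    int      taskCount;        // number of tasks in the round robin
    int      current = 0;      // index of the running task
    uint32_t enteredAtUs = 0;  // time at which the current task got the CPU
    uint32_t switches = 0;     // number of context switches so far

    CoopScheduler(uint32_t quantum, int tasks)
        : quantumUs(quantum), taskCount(tasks) {}

    // Called from a task's yield(). Returns true if a switch took place.
    bool yield(uint32_t nowUs) {
        // Step 1: keep running until the quantum is used up.
        // Unsigned subtraction stays correct when the clock wraps around.
        if (nowUs - enteredAtUs < quantumUs) {
            return false;
        }
        // Step 3: plain round robin picks the next task.
        current = (current + 1) % taskCount;
        // Step 4: a real scheduler would swap stack pointer and registers here.
        enteredAtUs = nowUs;
        ++switches;
        return true;
    }
};
```

With a 400 us quantum, a task that calls yield() every few microseconds pays only the cheap comparison on most calls and the full switch cost once per quantum.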

So it is quite OK compared to the pure assembly context switch routine, which is very fast as you pointed out.

Steps 2 & 5 can now be deactivated in a new version of the library (via preprocessing) and this will probably halve the time given, but I need an oscilloscope to check this!

Last but not least, the library isn't using any interrupt but systematic calls to the Arduino millis(). This approach is very flexible but consumes more CPU than the traditional RTOS systick method, especially on Arduino compared to Teensy, which is better optimized.

here we are


Dec 18, 2012, 01:36 pm Last Edit: Dec 18, 2012, 02:08 pm by fat16lib Reason: 1
RTOSes do these things also.  A RTOS has more task states to manage and more reasons for context switches, so the overhead for an RTOS should be higher.  The difference is that great scheduling algorithms have been invented in the last thirty years.

RTOSes log counts and statistics and the overhead is tiny.  See the reports in the above link for ChibiOS performance statistics.  It isn't done with an oscilloscope.


"Last but not least, the library isn't using any interrupt but systematic calls to the Arduino millis(), and this approach is very flexible but consumes more CPU than the traditional RTOS systick method"

In what way is it more flexible or better?

I do believe that a coop scheduler is best for most Arduino users.  It was good enough for the Apollo moon missions.

Here is a description of the Apollo OS.

There was a simple real-time operating system consisting of the Exec, a batch job-scheduling system that could run up to eight 'jobs' at a time using cooperative multi-tasking (each job had to periodically surrender control back to the Exec which then checked if there was any waiting job with higher priority). There was also an interrupt-driven component called the Waitlist which could schedule multiple timer-driven 'tasks'. The tasks were short threads of execution which could reschedule themselves for re-execution on the Waitlist, or could kick off a longer operation by starting a 'job' with the Exec.


hi fat16lib,

It is true that I would have a lot to learn about context switching if the goal for SCoop was to be the best OS alternative for the Arduino community  :).
Still, I feel that the result we get with SCoop is better than what we get out of the box from other RTOSes, for the following reason:
SCoop is designed for the users (based on my personal experience selling software and hardware solutions for industry in the 90's).
It is not designed for features or for top performance; as an example, there is no concept of priorities!
But this is sometimes better, and let me demonstrate it with an example that you could easily reproduce.

Just set up 3 tasks, each doing a single 32-bit count++ in a loop containing a yield, with one specific counter per task (count1, count2 and count3). Nothing else.
Then in the main loop, just put another count4++ and 20 calls to yield.
Then use a basic test against the standard millis() to stop and print something after 10 seconds of execution. This test is not going to impact the overall timing as there are 20 yields above it, right?

Just print the sum of the 4 counters. (I used only an Arduino Uno for this example.)

With SCoop, the value is 1 086 746, so in 10 seconds we were able to increment 32-bit counters more than 1 million times.

With ChibiOS_AVR out of the box, I got 460 855.
With ChibiOS configured in cooperative mode with CHCONF = 0, I got 413 824.

So you understand where the problem comes from: the yield function in ChibiOS switches almost immediately to the next task. As each task contains a very basic sequence which probably takes about 1 us, the total time spent in scheduling overhead is very high compared to the tasks themselves.

In SCoop, the default yield is more complex in this regard, as it checks the total amount of time spent in the task (400us for AVR by default) before accepting a switch. The result is obviously less time spent switching relative to the time allocated to the tasks,
and so we get 2 times more CPU time for the tasks, as the total count reaches 1 million.

So, as you pointed out, I'm sure that cooperative mode is a good way forward for the majority of the Arduino community, and so we need to provide them with mechanisms which are optimized for their usage, from a user standpoint.

I'd love to see feedback from users about our libraries so that we can improve their usability, but in the meantime let's continue guiding them with both of our respective approaches!


Dec 19, 2012, 01:05 am Last Edit: Dec 19, 2012, 01:08 am by fat16lib Reason: 1
Your example of counting is silly but to continue being silly, here is my example of counting.

"just set up 3 tasks doing a single 32-bit count++ in a loop containing a yield"

Why would anyone put a yield in a loop with a preemptive RTOS?  That's what round robin is for.

With proper design ChibiOS does over seven million increments, seven times better than SCoop.

Here is the sketch with five tasks:
Code: [Select]

// Simple counting demo
#include <ChibiOS_AVR.h>

// one counter per thread; volatile because loop() reads them from another thread
volatile uint32_t n1 = 0;
volatile uint32_t n2 = 0;
volatile uint32_t n3 = 0;
volatile uint32_t n4 = 0;

static WORKING_AREA(waThread1, 16);
static msg_t Thread1(void *arg) {
  while (1) {
    n1++;  // loop body not preserved in the post; a bare increment is assumed
  }
  return 0;
}

static WORKING_AREA(waThread2, 16);
static msg_t Thread2(void *arg) {
  while (1) {
    n2++;  // assumed, as above
  }
  return 0;
}

static WORKING_AREA(waThread3, 16);
static msg_t Thread3(void *arg) {
  while (1) {
    n3++;  // assumed, as above
  }
  return 0;
}

static WORKING_AREA(waThread4, 16);
static msg_t Thread4(void *arg) {
  while (1) {
    n4++;  // assumed, as above
  }
  return 0;
}

void setup() {
  // initialize ChibiOS
  // (the initialization call is not preserved in the post)
}

void chSetup() {
  chThdCreateStatic(waThread1, sizeof(waThread1), NORMALPRIO, Thread1, NULL);
  chThdCreateStatic(waThread2, sizeof(waThread2), NORMALPRIO, Thread2, NULL);
  chThdCreateStatic(waThread3, sizeof(waThread3), NORMALPRIO, Thread3, NULL);
  chThdCreateStatic(waThread4, sizeof(waThread4), NORMALPRIO, Thread4, NULL);
}

// idle loop runs at NORMALPRIO
void loop() {
  uint32_t s1, s2, s3, s4;
  uint32_t t = millis();
  // (the wait before sampling is not preserved in the post)
  s1 = n1;
  s2 = n2;
  s3 = n3;
  s4 = n4;
  Serial.print("millis: ");
  // (the remaining print statements are not preserved in the post)
}

It prints:

millis: 10055

I am sure SCoop will be great for most Arduino users.

Still, I could give you endless examples that are not "topmost performances" where an RTOS is a better solution than SCoop.

RTOSes are now in billions of common products.  Every embedded software tools company provides a RTOS.

Here is Keil's RTX offering http://www.keil.com/rl-arm/rtx_rtosadv.asp.


Hi fat16lib, how are you ?


"Why would anyone put a yield in a loop with a preemptive RTOS?  That's what round robin is for."

Well, but my topic is all about cooperative mode! And you recently published an example in coop mode, which is why I thought it would be interesting to provide a basic out-of-the-box comparison in coop mode. In such a case you might recognize that yield() is and must be used as much as possible to provide fluent time sharing.


"With proper design ChibiOS does over seven million increments, seven times better than SCoop."

You are right, you reach 7 million counts in preemptive mode. I cannot do such a test as I do not support this mode.
But if we put a simple chThdYield() in each loop, your example falls to 592876, and that was and still is my point:
we have to provide users with a strong framework and a couple of examples, otherwise they will misuse the OS and get wrong results.

So at this point we could propose a deal: SCoop for cooperative and ChibiOS for preemptive!

But if I were a user looking for a simple and efficient multitasking solution, I would love to see a user-oriented ChibiOS wrapper that would combine the best of both worlds!


Dec 19, 2012, 11:35 pm Last Edit: Dec 20, 2012, 12:47 am by fat16lib Reason: 1
"well, but my topic is just all about cooperative mode !"

Sorry, I didn't know the rules.

"if we put a simple chThdYield() in each loop, your example falls to 592876, and that was and still is my point:
we have to provide users with a strong framework and a couple of examples, otherwise they will misuse the OS and get wrong results."

But I would never do that, I would use other features of the RTOS to handle high priority requirements.  But that would break your rule that we can only use features of SCoop.

Salting yield calls at various places in code to make embedded applications work is a real pain and makes the code fragile and unreliable.  You get horrible interactions between unrelated tasks.

I wrote my last cooperative scheduler in 1972.  I don't plan to go back there.

I'll stick with technology I helped develop thirty years ago, the predecessor of VxWorks.  VxWorks is the RTOS used in many NASA projects including all JPL Mars Rovers.

Good luck with SCoop.

Edit: here is a going-away challenge: implement the ChibiOS_ARM chFastLogger example with SCoop.  This example logs four ADC values at a rate of 1000 Hz.  Note that ChibiOS achieves a time jitter of about one microsecond between samples.  To get a high SNR in a signal you need this level of jitter at 1000 Hz.  See any good reference on the theory of ADC signal measurements.

I ran the example with Teensy 3.0.

Here is a bit of typical data; the first column is the time in micros():




I'm pleased to release a new SCoop library, V1.1 XMass Pack, with some updates and goodies.

- updated user guide now 17 pages

SCoop.h :

   yield() routines optimized,
   new object SCoopFifo for easy handling of Fifo buffers,
   some preprocessing parameters to optimize speed
   should compile for Arduino DUE but NOT TESTED ...

Timer Up & down : cosmetic changes, code moved from .h to .cpp

IOFilter : not changed

Some new examples, including 1 kHz sampling with a FIFO, to answer fat16lib's post above :) and a new performance measurement sketch...

OK, it is time for a Christmas break and I wish you all a  * * * Merry Christmas * * * and a happy new year!

here : https://code.google.com/p/arduino-scoop-cooperative-scheduler-arm-avr/downloads/list


Dec 22, 2012, 12:39 am Last Edit: Dec 22, 2012, 01:48 am by fat16lib Reason: 1
I looked at example4, it satisfied my curiosity.  It was about what I expected.



Also, after a good night's sleep, I have slightly updated the user guide (now V1.1.1) to correct some typos and bugs in the example code...
Please update your file by downloading it from the Google Code project repository.

Feel free to provide feedback or updates to it; surely there are a lot of grammar errors and typos as I'm not a native English speaker :)


Guess I should have said more about example4 and SCoop in general.  My request for the data logger was a test to see if you know anything about embedded tools.

Of course you couldn't solve the problem with SCoop and needed to introduce an ISR since coop schedulers just can't do deterministic scheduling of a task.

SCoop is bug-ridden as soon as it is used in an environment with interrupts. Take your queue for example.  Store and fetch of pointers is not atomic on AVR, so you must not access a pointer from both an ISR and non-interrupt code unless you use proper methods.

Here is an example of the code for a store:
Code: [Select]

18:   f0 93 00 00     sts     0x0000, r31
1c:   e0 93 00 00     sts     0x0000, r30

An interrupt could happen between the store of the two bytes and the interrupt routine would fetch trash.
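The hazard can be reproduced on a desktop machine with a small simulation. This is a sketch under stated assumptions, not AVR code: SharedWord, storeUnsafe and storeAtomic are hypothetical names, and the "interrupt" is just a callback invoked at the point where a real interrupt could arrive.

```cpp
#include <cstdint>
#include <functional>

// Simulated 8-bit machine: a 16-bit value lives in two separately written bytes.
struct SharedWord {
    uint8_t lo = 0;
    uint8_t hi = 0;
    uint16_t read() const { return static_cast<uint16_t>(lo | (hi << 8)); }
};

// Unprotected store: the "interrupt" fires between the two byte writes,
// in the same order as the two sts instructions (high byte first).
void storeUnsafe(SharedWord& w, uint16_t v, const std::function<void()>& isr) {
    w.hi = static_cast<uint8_t>(v >> 8);
    isr();  // interrupt arrives here and sees a half-updated value
    w.lo = static_cast<uint8_t>(v & 0xFF);
}

// Protected store: the interrupt is held off until both bytes are written,
// which is the effect of masking interrupts around the store on AVR.
void storeAtomic(SharedWord& w, uint16_t v, const std::function<void()>& isr) {
    w.hi = static_cast<uint8_t>(v >> 8);
    w.lo = static_cast<uint8_t>(v & 0xFF);
    isr();  // the pending interrupt runs only now
}
```

Going from 0x01FF to 0x0200 with storeUnsafe lets the callback read 0x02FF, a value that was never stored. On a real AVR the protected variant would typically wrap the store in ATOMIC_BLOCK(ATOMIC_RESTORESTATE) from avr-libc's util/atomic.h, or in a cli()/sei() pair.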

In addition you need to type many of your variables as volatile or the compiler will kill you as soon as something is shared in an interrupt context.  This is even true in much simpler multitasking environments.

Notice that I typed counts as volatile in many of my examples.  If you remove volatile from the counts in the ChibiOS example above like this:
Code: [Select]

uint32_t n1 = 0;
uint32_t n2 = 0;
uint32_t n3 = 0;
uint32_t n4 = 0;

It will print

millis: 10055

instead of

millis: 10055

My advice is to avoid SCoop until it is fixed.


Thank you for your investigation, fat16lib;

Yep, I missed that :)
But we know this type of bug is not the first, as there was one like that in HardwareSerial for a long time :)
Anyway, I shouldn't have missed that. Sorry for any inconvenience.

Let's be clear: using an instance of SCoopFifo in an interrupt (ISR) might jeopardize the FIFO methods.

I have just modified the code so that it is now atomic where needed for the AVR platform.
I will publish a version 1.1.1 in a couple of hours, just the time to do regression tests...
Any other feedback or input is greatly appreciated, as it seems you have a lot of experience in this area.

stay tuned.


Why do you need an ISR?  Any simple RTOS can read an ADC in a thread at low speeds like one point every tick.

All the systems I have tested do it with low jitter, a few microseconds at the most and about one microsecond on ARM.

My view is that as soon as you need an ISR the advantages of coop schedulers vanish.

I spent my career at several large physics research labs and we gave up coop schedulers forty years ago.  NASA scrapped them after Apollo.

I worked for a while at CERN on LHC.  CERN uses LynxOS which I didn't appreciate at first.  It is very Unix/Linux like and allows scientists to do many embedded programming tasks on their own. 

Here is the pitch for LynxOS:

Because the LynxOS RTOS is designed from the ground up for conformance to open system interfaces, OEMs are able to leverage existing Linux, UNIX and POSIX programming talent for embedded real-time projects. Real-time system development time is saved and programmers are able to be more productive using familiar methodologies as opposed to learning proprietary methods.

I am beginning to think the coop scheduler thing is always a poor choice for ARM Arduino.  Even RTOSes like ChibiOS are not right for most users.  A real OS that is more like LynxOS may be a better choice.  I guess I will reexamine the options to see what is available if I write AVR off.

To paraphrase a well known quote, you can put object-oriented lipstick on a coop scheduler but it still is a coop scheduler.
