SCoop - multitask and Simple COOPerative scheduler AVR & ARM


I'm pleased to release another multitasking alternative called SCoop (Simple Cooperative scheduler) for the Arduino and Teensy platforms, AVR and ARM, built on the standard yield() function.

There are a couple of alternatives for multitasking on Arduino, including well-known RTOSes like ChibiOS or FreeRTOS, but also some lightweight implementations like adOS, published on this forum, or interrupt-driven libraries like ieOS.

I decided to create this one to bring a user-friendly declaration style and some features I didn't find in the others, especially those needed for home automation or industrial control, like we sometimes find in PLCs.

Version 1, just released this weekend, is available on Google Code here :

This pack includes a (hopefully) comprehensive 14-page user guide and contains 3 standalone libraries.

  • SCoop.h provides objects and macros for easily creating Tasks, Events and Timers.
  • TimerUp/Down.h provides object classes for defining unlimited time counter objects with time base handling.
  • IOFilter.h provides objects for declaring and using Inputs, Outputs and time-filtered inputs (extended debounce style).
  • User guide v1.0, with user and technical info and some performance measurements.

The pack has been tested on Arduino Uno, Teensy++2.0 and Teensy3.0 (beta8) with Arduino IDE v1.0.2. I'd be glad if someone could look at it and try it on the Arduino DUE, as I don't have this board myself: just force the definition of MK20DX128 at the beginning of SCoop.h.

Remark: to use it in a final project you may want to remove the line including "scoopdebug" and the SCOOPTRACE definition in SCoop.h. For the moment they are still included, which makes it possible to "trace" your code: just use trace("my step one in task1").

just an example:

#include "SCoop.h"

// myTask1 and myTask2 must first be declared with the task macros
// provided by SCoop.h (see the user guide for the exact syntax)

void myTask1::setup() { Serial.begin(57600); }
void myTask1::loop() { Serial.println("hello from task"); sleep(1000); }

void myTask2::setup() { pinMode(13, OUTPUT); }
void myTask2::loop() {
  Serial.println("led HIGH"); digitalWrite(13, HIGH); sleep(500);
  Serial.println("led LOW"); digitalWrite(13, LOW); sleep(500);
}

void setup() { mySCoop.start(); }
void loop() { Serial.println("do whatever you want here also"); mySCoop.sleep(500); }

got it ? :)

A short update regarding performance and footprint in memory.

The extra code size needed by the library for the scheduler, the task wrapper and a single call to yield() in the main loop() is 1700 bytes on AVR. This looks cool, especially considering that the library is written in an object-oriented style and makes extensive use of virtual methods. :)

On Teensy 3 (ARM) we surprisingly end up with 4900 bytes more for the library. This is not really a problem as the chip has 128K of program memory, but I will investigate why at some point.

The RAM used by the library's own variables is not significant; RAM use is mainly driven by the size of the stacks allocated to tasks. Fortunately these can be trimmed to the bare minimum by using the library method stackLeft to monitor the unused space.
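One common way to measure unused stack is to "paint" the stack area with a known byte before the task starts and later count how many bytes are still intact. The sketch below illustrates that idea on a plain buffer. It is only an assumption about how a method like stackLeft could work, not SCoop's actual code, and the names paintStack, unusedStackBytes and kStackFill are made up for the illustration.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical names throughout; this is not SCoop's implementation.
const uint8_t kStackFill = 0x55;  // marker byte written over the unused stack

// Paint the whole stack area with the marker before the task first runs.
void paintStack(uint8_t* stack, size_t size) {
  memset(stack, kStackFill, size);
}

// Count the marker bytes that are still intact, scanning from the lowest
// address. On AVR and ARM the stack grows downward, so the bytes a task
// never touched sit at the bottom of its stack area.
size_t unusedStackBytes(const uint8_t* stack, size_t size) {
  size_t unused = 0;
  while (unused < size && stack[unused] == kStackFill) {
    unused++;
  }
  return unused;
}
```

Run the task through its worst case, read the count, and shrink the stack by that amount minus a safety margin. A task that happens to write the marker value itself can make the count slightly optimistic, so keep the margin.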

Regarding performance, with a recent update to the yield method (to be published Wednesday), we end up with a total time of 45us for switching between 2 tasks on a Teensy++2.0, 55us on an Arduino Uno, and less than 10us on a Teensy 3 (ARM). This time includes time checking and bookkeeping, with some calls to the (in)famous millis(), as this library doesn't need any timer or interrupts. This is good performance anyway: at most 5% of the time is lost to scheduling with 4 tasks running on AVR (3 + main loop), and less than 1% on Teensy 3.

Unless you do real-time cruise missile calculations you'd be safe, right?

Here we go: anybody got a DUE to try this fancy lib?! :grin:

Any idea why the context switch is slow?

Typical RTOS times are faster. For ChibiOS/RT, signaling a semaphore plus a context switch to the task that takes the semaphore costs:

Uno: 15 us

Teensy 3.0: 3 us

Due: 2 us

Task context switch times are lower: 11.25 us for a 16 MHz AVR and 1.02 us for a 72 MHz ARM CM3.

Clearly this is not a problem for a cooperative scheduler.

Yes, I have some ideas about that.

In fact the time I've given is the total overhead time needed by the scheduler to:

1- verify whether the task should be interrupted. This includes checking the time spent in the task and comparing it to a "quantum", which guarantees that every task gets a certain amount of CPU time (unless it sleeps, of course)
2- calculate the average time spent in this task over the last 2/4/8/16 cycles
3- check which task is next
4- switch the task context to the next one
5- calculate the average time spent since we entered this task (last 2/4/8/16 cycles), to update the overall cycle time variable.

Seen that way, it is quite OK compared to a pure assembly context switch routine, which is very fast as you pointed out.

Steps 2 & 5 can now be deactivated in a new version of the library (via preprocessor options), and this will probably halve the time given above, but I need an oscilloscope to check this!
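To make steps 2 and 5 more concrete, here is a minimal sketch of a running average over the last 2, 4, 8 or 16 cycles. Keeping the window a power of two lets the division become a shift, which is cheap on AVR. The CycleAverage class is hypothetical and is not taken from the SCoop sources; the library may well compute its averages differently.

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical helper, not SCoop's code: average of the last N samples,
// where N = 2^Shift (2, 4, 8 or 16) so that dividing is a plain shift.
template <uint8_t Shift>
class CycleAverage {
  static_assert(Shift >= 1 && Shift <= 4, "window must be 2, 4, 8 or 16");

 public:
  static const uint8_t N = 1 << Shift;

  // Record the time (in microseconds) spent in the latest cycle.
  void add(uint16_t micros) {
    sum_ -= ring_[head_];            // drop the oldest sample
    ring_[head_] = micros;           // store the newest one in its place
    sum_ += micros;
    head_ = (head_ + 1) & (N - 1);   // wrap around without a modulo
  }

  // Average over the window. Until N samples have been added the empty
  // slots count as zero, so early readings are too low.
  uint16_t average() const { return (uint16_t)(sum_ >> Shift); }

 private:
  uint16_t ring_[N] = {};
  uint32_t sum_ = 0;
  uint8_t head_ = 0;
};
```

Each update costs one subtraction, one addition and a masked increment, which gives an idea of why switching these statistics off can save a noticeable share of a 45 to 55 us yield.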

Last but not least, the library isn't using any interrupts but systematic calls to the Arduino millis(). This approach is very flexible but consumes more CPU than the traditional RTOS systick method, especially on Arduino compared to Teensy, which is better optimized.

here we are

RTOSes do these things also. An RTOS has more task states to manage and more reasons for context switches, so the overhead for an RTOS should be higher. The difference is that great scheduling algorithms have been invented in the last thirty years.

RTOSes log counts and statistics and the overhead is tiny. See the reports in the above link for ChibiOS performance statistics. It isn’t done with an oscilloscope.

last but not least, the library isn't using any interrupts but systematic calls to the Arduino millis(), and this approach is very flexible but consumes more CPU than the traditional RTOS systick method

In what way is it more flexible or better?

I do believe that a coop scheduler is best for most Arduino users. It was good enough for the Apollo moon missions.

Here is a description of the Apollo OS.

There was a simple real-time operating system consisting of the Exec, a batch job-scheduling system that could run up to eight ‘jobs’ at a time using cooperative multi-tasking (each job had to periodically surrender control back to the Exec which then checked if there was any waiting job with higher priority). There was also an interrupt-driven component called the Waitlist which could schedule multiple timer-driven ‘tasks’. The tasks were short threads of execution which could reschedule themselves for re-execution on the Waitlist, or could kick off a longer operation by starting a ‘job’ with the Exec.

hi fat16lib,

It is true that I would have a lot to learn about context switching if the goal for SCoop was to be the best OS alternative for the Arduino community :). Still, I feel that the result we get with SCoop is better than what we get out of the box from other RTOSes, for the following reason: SCoop is designed for the users (based on my personal experience selling software and hardware solutions to industry in the 90's). It is not designed for features or for top performance; as an example, there is no concept of priorities! But this is sometimes better, and let me demonstrate it with an example that you can easily reproduce.

Just set up 3 tasks, each doing a single 32-bit count++ in a loop containing a yield, with one specific counter per task (count1, 2 and 3) and nothing else. Then in the main loop, just put another count4++ and 20 calls to yield. Then use a basic test against the standard millis() to stop and print something after 10 seconds of execution. This test is not going to affect the overall timing as there are 20 yields above it, right?

Just print the sum of the 4 counters. (I used only an Arduino Uno for this example.)

With SCoop, the value is 1 086 746, so in 10 seconds we were able to increment 32-bit counters more than 1 million times.

With ChibiOS_AVR out of the box, I got 460 855; with ChibiOS configured in cooperative mode (CHCONF = 0), I got 413 824.

So you understand where the problem comes from: the yield function in ChibiOS switches almost immediately to the next task. As each task contains a very basic sequence which probably takes about 1 us, the total time spent in scheduling overhead is very high compared to the tasks themselves.

In SCoop, the default yield is more complex in this regard, as it checks the total amount of time spent in the task (400us on AVR by default) before accepting a switch. The result is obviously less time spent switching compared to the time allocated to the tasks, and so we get about twice as much CPU time for the tasks, as the total count reaches 1 million.
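The quantum check described above can be sketched roughly as follows. This is only an illustration of the idea (yield returns immediately until the running task has used up its time slice); the QuantumGate class and its method names are hypothetical and not part of SCoop, and time is passed in as a parameter so the logic can be tried off-target.

```cpp
#include <assert.h>
#include <stdint.h>

// Hypothetical sketch of a time-quantum gate, not SCoop's actual code.
class QuantumGate {
 public:
  explicit QuantumGate(uint32_t quantumMicros)
      : quantum_(quantumMicros), start_(0) {}

  // Call whenever a task is (re)entered.
  void enter(uint32_t nowMicros) { start_ = nowMicros; }

  // Call from yield(): true only once the running task has used its quantum.
  // Unsigned subtraction keeps the comparison valid when the microsecond
  // counter wraps around.
  bool shouldSwitch(uint32_t nowMicros) const {
    return (uint32_t)(nowMicros - start_) >= quantum_;
  }

 private:
  uint32_t quantum_;
  uint32_t start_;
};
```

On a board, a yield() built this way would start with something like `if (!gate.shouldSwitch(micros())) return;` and only then pay for the context switch. A tight loop such as count++ followed by yield() then gets many iterations per switch, which is the effect behind the counter totals quoted above.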

So, as you pointed out, I'm sure that cooperative mode is a good way forward for the majority of the Arduino community, and we need to provide them with mechanisms that are optimized for their usage, from a user standpoint.

I'd love to see feedback from users about our libraries so that we can improve their usability, but in the meantime let's continue guiding them with our respective approaches!

Your example of counting is silly but to continue being silly, here is my example of counting.

just set up 3 tasks doing a single 32-bit count++ in a loop containing a yield

Why would anyone put a yield in a loop with a preemptive RTOS? That’s what round robin is for.

With proper design ChibiOS does over seven million increments, seven times better than SCoop.

Here is the sketch with five tasks:

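// Simple counting demo
#include <ChibiOS_AVR.h>

volatile uint32_t n1 = 0;
volatile uint32_t n2 = 0;
volatile uint32_t n3 = 0;
volatile uint32_t n4 = 0;

static WORKING_AREA(waThread1, 16);
static msg_t Thread1(void *arg) {
  while (1) {
    n1++;
  }
  return 0;
}
static WORKING_AREA(waThread2, 16);
static msg_t Thread2(void *arg) {
  while (1) {
    n2++;
  }
  return 0;
}
static WORKING_AREA(waThread3, 16);
static msg_t Thread3(void *arg) {
  while (1) {
    n3++;
  }
  return 0;
}
static WORKING_AREA(waThread4, 16);
static msg_t Thread4(void *arg) {
  while (1) {
    n4++;
  }
  return 0;
}

void setup() {
  Serial.begin(9600);  // (assumed) baud rate
  // initialize ChibiOS
  chBegin(chSetup);
  while (1) {}
}

void chSetup() {
  chThdCreateStatic(waThread1, sizeof(waThread1), NORMALPRIO, Thread1, NULL);
  chThdCreateStatic(waThread2, sizeof(waThread2), NORMALPRIO, Thread2, NULL);
  chThdCreateStatic(waThread3, sizeof(waThread3), NORMALPRIO, Thread3, NULL);
  chThdCreateStatic(waThread4, sizeof(waThread4), NORMALPRIO, Thread4, NULL);
}

// idle loop runs at NORMALPRIO
void loop() {
  uint32_t s1, s2, s3, s4;
  // (assumed) let the four counting threads run for ten seconds
  chThdSleepMilliseconds(10000);
  uint32_t t = millis();
  // (assumed) 32-bit reads are not atomic on AVR, so guard the snapshot
  noInterrupts();
  s1 = n1;
  s2 = n2;
  s3 = n3;
  s4 = n4;
  interrupts();
  Serial.print("millis: ");
  Serial.print(t);
  Serial.print(' ');
  Serial.print(s1 + s2 + s3 + s4);
  Serial.print('=');
  Serial.print(s1); Serial.print('+');
  Serial.print(s2); Serial.print('+');
  Serial.print(s3); Serial.print('+');
  Serial.println(s4);
  while (1) {}
}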

It prints:

millis: 10055 7180551=1798316+1798959+1798966+1784310

I am sure SCoop will be great for most Arduino users.

Still, I could give you endless examples that are not “topmost performances” where an RTOS is a better solution than SCoop.

RTOSes are now in billions of common products. Every embedded software tools company provides an RTOS.

Here is Keil’s RTX offering

Hi fat16lib, how are you ?


Why would anyone put a yield in a loop with a preemptive RTOS? That's what round robin is for.

Well, but my topic is just all about cooperative mode! Also, you recently published an example in coop mode, which is why I thought it would be interesting to provide a basic out-of-the-box comparison in coop mode. In such a case you might agree that yield() is, and must be, used as much as possible to provide smooth time sharing.

With proper design ChibiOS does over seven million increments, seven times better than SCoop.

You are right, you reach 7 million counts in preemptive mode. I cannot do such a test as I do not support this mode. But if we put a simple chThdYield() in each loop, your example falls to 592876, and that was and still is my point: we have to provide users with a strong framework and a couple of examples, otherwise they will misuse the OS and get the wrong results.

So at this point we could propose a deal: SCoop for cooperative and ChibiOS for preemptive!

But if I were a user looking for a simple and efficient multitasking solution, I would love to see a user-oriented ChibiOS wrapper that would combine the best of both worlds!

well, but my topic is just all about cooperative mode !

Sorry, I didn't know the rules.

if we put a simple chThdYield() in each loop, your example falls to 592876, and that was and still is my point: we have to provide users with a strong framework and a couple of examples, otherwise they will misuse the OS and get the wrong results.

But I would never do that, I would use other features of the RTOS to handle high priority requirements. But that would break your rule that we can only use features of SCoop.

Salting yield calls at various places in code to make embedded applications work is a real pain and makes the code fragile and unreliable. You get horrible interactions between unrelated tasks.

I wrote my last cooperative scheduler in 1972. I don't plan to go back there.

I'll stick with technology I helped develop thirty years ago, the predecessor of VxWorks. VxWorks is the RTOS used in many NASA projects including all JPL Mars Rovers.

Good luck with SCoop.

Edit: here is a going-away challenge: implement the ChibiOS_ARM chFastLogger example with SCoop. This example logs four ADC values at a rate of 1000 Hz. Note that ChibiOS achieves a time jitter of about one microsecond between samples. To get a high SNR in a signal you need jitter this low at 1000 Hz. See any good reference on the theory of ADC signal measurements.

I ran the example with Teensy 3.0.

Here is a bit of typical data; the first column is the time in micros():

micros,data0,data1,data2,data3,over
6178008,290,279,231,262,0
6179008,253,246,217,241,0
6180008,258,244,220,241,0
6181008,260,243,222,239,0
6182008,260,242,222,238,0
6183008,261,242,220,240,0
6184008,262,241,220,239,0
6185008,261,241,220,239,0
6186008,261,240,220,238,0
6187008,262,240,219,240,0
6188008,262,241,220,238,0
6189008,263,241,220,237,0
6190008,263,241,220,239,0
6191008,263,241,220,239,0
6192008,265,241,220,238,0
6193008,266,242,221,239,0
6194008,266,242,222,239,0
6195008,265,243,223,239,0
6196008,266,242,223,238,0
6197008,266,242,223,240,0
6198008,260,233,219,239,0
6199008,271,251,224,239,0
6200008,264,243,220,239,0
6201008,265,241,219,239,0
6202008,266,240,221,239,0
6203008,266,242,221,239,0
6204008,266,241,221,237,0
6205008,265,241,222,238,0
6206008,266,241,222,238,0
6207008,266,242,222,240,0


I'm pleased to release a new SCoop library, the V1.1 XMass Pack, with some updates and goodies.

  • updated user guide, now 17 pages
  • SCoop.h: yield() routines optimized; new object SCoopFifo for easy handling of FIFO buffers; some preprocessing parameters to optimize speed; should compile for Arduino DUE but NOT TESTED ...
  • TimerUp & Down: cosmetic changes, code moved from .h to .cpp
  • IOFilter: not changed
  • some new examples, including 1 kHz sampling with a FIFO, to answer fat16lib's post above :) and a new performance measurement sketch...

OK, it is time for a Christmas break and I wish you all a * * * merry Christmas * * * and a happy new year! Cheers

here :

I looked at example4, it satisfied my curiosity. It was about what I expected.


Also, after a good night's sleep, I have slightly updated the user guide (now V1.1.1) to correct some typos and bugs in the example code... Please update your copy by downloading it from the Google Code project repository.

Feel free to provide feedback or updates to it; I'm sure there are lots of grammar mistakes and typos as I'm not a native English speaker :)

Guess I should have said more about example4 and SCoop in general. My request for the data logger was a test to see if you know anything about embedded tools.

Of course you couldn't solve the problem with SCoop and needed to introduce an ISR since coop schedulers just can't do deterministic scheduling of a task.

SCoop is bug-ridden as soon as it is used in an environment with interrupts. Take your queue for example. Store and fetch of pointers is not atomic on AVR, so you must not access a pointer from both an ISR and non-interrupt code unless you use proper methods.

Here is an example of the code for a store:

 18:   f0 93 00 00     sts     0x0000, r31
 1c:   e0 93 00 00     sts     0x0000, r30

An interrupt could happen between the store of the two bytes and the interrupt routine would fetch trash.
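The torn write can be reproduced off-target with a small simulation. The sketch below is an illustration, not SCoop or ChibiOS code: Shared16, tornStore and atomicStore are hypothetical names, and the "interrupt" is simply a callback invoked between the two byte stores. On a real AVR the usual fix is to wrap the access in `ATOMIC_BLOCK(ATOMIC_RESTORESTATE)` from avr-libc's `<util/atomic.h>`, or to disable and restore interrupts by hand.

```cpp
#include <assert.h>
#include <stdint.h>

// A 16-bit value that an 8-bit CPU can only write one byte at a time.
struct Shared16 {
  volatile uint8_t lo;
  volatile uint8_t hi;
};

// What an interrupt routine would see if it read the value right now.
uint16_t readShared(const Shared16& s) {
  return (uint16_t)((s.hi << 8) | s.lo);
}

// Unprotected store: high byte first, then low byte, as in the listing.
// isr() runs between the two stores to model the worst-case interrupt.
template <typename Isr>
void tornStore(Shared16& s, uint16_t v, Isr isr) {
  s.hi = (uint8_t)(v >> 8);
  isr();
  s.lo = (uint8_t)(v & 0xFF);
}

// Protected store: the "interrupt" is held off until both bytes are
// written, which is what disabling interrupts around the store achieves.
template <typename Isr>
void atomicStore(Shared16& s, uint16_t v, Isr isr) {
  s.hi = (uint8_t)(v >> 8);
  s.lo = (uint8_t)(v & 0xFF);
  isr();
}
```

Going from 0x01FF to 0x0200, the unprotected version lets the callback observe 0x02FF, a value that was never stored. For a pointer, that is an address into unrelated memory.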

In addition you need to type many of your variables as volatile or the compiler will kill you as soon as something is shared in an interrupt context. This is even true in much simpler multitasking environments.

Notice that I typed counts as volatile in many of my examples. If you remove volatile from the counts in the ChibiOS example above like this:

uint32_t n1 = 0;
uint32_t n2 = 0;
uint32_t n3 = 0;
uint32_t n4 = 0;

It will print

millis: 10055 0=0+0+0+0

instead of

millis: 10055 7180551=1798316+1798959+1798966+1784310

My advice is to avoid SCoop until it is fixed.

Thank you for your investigation, fat16lib.

Yep, I missed that :) We know this type of bug is not the first of its kind, as there was one like it in HardwareSerial for a long time :) Anyway, I shouldn't have missed it. Sorry for any inconvenience.

Let's be clear: using an instance of SCoopFifo in an interrupt (ISR) might break the FIFO methods.

I have just modified the code so that it is now atomic where needed on the AVR platform. I will publish version 1.1.1 in a couple of hours, just enough time to run regression tests... Any other feedback or input is greatly appreciated, as it seems you have a lot of experience in this area.

stay tuned.

Why do you need an ISR? Any simple RTOS can read an ADC in a thread at low speeds like one point every tick.

All the systems I have tested do it with low jitter, a few microseconds at the most and about one microsecond on ARM.

My view is that as soon as you need an ISR the advantages of coop schedulers vanish.

I spent my career at several large physics research labs and we gave up coop schedulers forty years ago. NASA scrapped them after Apollo.

I worked for a while at CERN on LHC. CERN uses LynxOS which I didn't appreciate at first. It is very Unix/Linux like and allows scientists to do many embedded programming tasks on their own.

Here is the pitch for LynxOS:

Because the LynxOS RTOS is designed from the ground up for conformance to open system interfaces, OEMs are able to leverage existing Linux, UNIX and POSIX programming talent for embedded real-time projects. Real-time system development time is saved and programmers are able to be more productive using familiar methodologies as opposed to learning proprietary methods.

I am beginning to think the coop scheduler thing is always a poor choice for ARM Arduino. Even RTOSes like ChibiOS are not right for most users. A real OS that is more like LynxOS may be a better choice. I guess I will reexamine the options to see what is available if I write AVR off.

To paraphrase a well known quote, you can put object-oriented lipstick on a coop scheduler but it still is a coop scheduler.

here we go:

Google Code updated with SCoop library V1.1.1 XMass Pack, which enhances the SCoopFifo object with atomic code sections. This now enables using a FIFO in an ISR, as in example 4 of the pack. Thanks to fat16lib for finding the bug. Please download this version if you plan to use a FIFO in an ISR.

It also includes a new example 5 demonstrating 500 Hz analog sampling with FIFO logging and processing by a task, without interrupts.

Time for the weekend. I might do some follow-up from the 25th/26th of December, as I'm spending the XMass weekend with family.

// BTW // Regarding the post above, a Linux type of OS is probably a nice alternative, especially for the Raspberry Pi or a PC-on-a-stick with Android, for example. But for Teensy3 and Arduino DUE, the size of the program memory might not be OK, considering the large size of the code generated by the ARM compiler... I still believe that we have nice days ahead of us with schedulers :)


I am not interested in sampling. The problem is that the libraries required to log data cannot be salted with enough yield() calls to avoid high jitter or, worse, missed data points. You don’t need a scheduler to sample at 500 Hz, it’s trivial.

You must learn something about the theory of SNR for ADC sampling. Data is worthless if there is substantial jitter in the time between data points.

Your toy examples don’t prove anything. You must do real examples with popular Arduino libraries. That’s why I include a real data logging example.

but for Teensy3 and Arduino DUE, the size of the program memory might not be ok

I didn’t mean running a true Linux/Unix I meant one of the many small kernels that are Linux like. These kernels run on very small processors. There are an amazing number of kernels out there.

I still believe that we have nice days ahead of us with schedulers

You’re right, there are still lots of people writing coop schedulers, so you are not the last diehard. People still write apps in assembler too.


Thanks for your advice about jitter and SNR. I'm quite aware of these theories and their effects, typically in the audio world, as I have invested in a FIFO board for my Oppo player just to dejitter the digital signal :) and it makes an incredible difference. FYI here is the product and I highly recommend it : (of course it is worthless if you do not put a DAC with a clean clock after it)

That said, the examples are just written to show how to use the macros and objects of the library. I will not pretend to offer state-of-the-art coding techniques for the whole sampling or digital processing area. Therefore, if you want to suggest changing the prefix of the example names to the word "toy", feel free to log an issue in the Google Code project, but I suggest you flag it as low importance, as I would not change it before the next release planned for early 2013. 8)

I hope we can get the full benefit of our mutual experience and collaboration in the coming posts, for the benefit of the community. Merry Christmas to all!

Yes, digital audio is truly amazing. In another thread I cited audio ADC performance as an example. The person I replied to wrote me off. He said he was interested in “high quality ADCs”. If a 124dB, 384kHz audio ADC isn’t impressive, what is?

Here is a jitter test sketch. It only captures a counter since the test is how precise scheduling is. It does not use an ISR since many devices/sensors can not be accessed in an ISR using popular Arduino libraries. The I2C Wire library can’t be used in an ISR and I2C devices are very common in the Arduino world.

It records the results to an SD card. I use my SD library but you can use the “Official Arduino SD library”; it’s an ancient version of my library with a wrapper to, as you say, “make it user friendly”.

The only other library I use is ChibiOS. I did a simple fifo with two semaphores and an array. The data rate is slow, a point every 10,240 usec, since it is common for Arduino users to record things like accelerometers at about 100 Hz.

I ran the test for a number of minutes and there was no jitter between points. micros() in AVR ticks every 4 usec so that limits the accuracy of the test.

Here is a calculation of what 4 usec of jitter means in this case. The formula for the SNR due to clock jitter is:

SNR(dB) = -20 log10(6.28 * f * t)

f is the measurement frequency

t is the time jitter in seconds

For t = 4 usec and f about 98 Hz you get about 52 dB. The SNR for an ideal 10-bit ADC is about 62 dB so even this much jitter degrades the signal.
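The arithmetic is easy to check with a few lines of code. The sketch below evaluates the jitter formula as given and, for comparison, the standard ideal-ADC figure of 6.02 * N + 1.76 dB for an N-bit converter with a full-scale sine input. The function names are made up for the illustration.

```cpp
#include <assert.h>
#include <math.h>

// SNR limit in dB imposed by sampling-clock jitter:
// -20 * log10(2 * pi * f * t), f in Hz and t in seconds.
double jitterSnrDb(double freqHz, double jitterSec) {
  const double kTwoPi = 6.283185307179586;
  return -20.0 * log10(kTwoPi * freqHz * jitterSec);
}

// Quantization-limited SNR in dB of an ideal N-bit ADC
// for a full-scale sine wave: 6.02 * N + 1.76.
double idealAdcSnrDb(int bits) {
  return 6.02 * bits + 1.76;
}
```

With f = 98 Hz and t = 4e-6 s the first function gives roughly 52.2 dB, and the second gives 61.96 dB for 10 bits, so in this case the jitter rather than the converter sets the noise floor.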

I don’t know a lot about audio but I have read about ADC clocks with jitter well below 100 femtoseconds. Wow, not nano, not pico, but femto. I guess you can lease a Galaxy FemtoSecond 77 fsec clock for $233 a month on a six year term.

Here is the sketch for ChibiOS and it is followed by a bit of the file. I look forward to your sketch for this test so I can run it and produce a file.

#include <ChibiOS_AVR.h>
#include <SdFat.h>

// interval between points in units of 1024 usec
const uint16_t intervalTicks = 10;

// SD file definitions
SdFat sd;
SdFile file;

// Fifo definitions

// size of fifo
const size_t FIFO_SIZE = 20;

// count of data records in fifo
SEMAPHORE_DECL(fifoData, 0);

// count of free buffers in fifo
SEMAPHORE_DECL(fifoSpace, FIFO_SIZE);

// data type for fifo item
struct FifoItem_t {
  uint32_t usec;
  int value;
  int error;
};

// array of data items
FifoItem_t fifoArray[FIFO_SIZE];

// head and tail index for fifo
size_t fifoHead = 0;
size_t fifoTail = 0;

// 64 byte stack beyond task switch and interrupt needs
static WORKING_AREA(waThread1, 64);

static msg_t Thread1(void *arg) {
  int error = 0;
  int count = 0;
  // (assumed) absolute wake-up times so the period does not drift
  systime_t wakeTime = chTimeNow();
  while (1) {
    wakeTime += intervalTicks;
    chThdSleepUntil(wakeTime);
    // get a buffer
    if (chSemWaitTimeout(&fifoSpace, TIME_IMMEDIATE) != RDY_OK) {
      // fifo full indicate missed point
      error++;
      continue;
    }
    FifoItem_t* p = &fifoArray[fifoHead++];
    if (fifoHead >= FIFO_SIZE) fifoHead = 0;
    p->usec = micros();
    p->value = count++;
    p->error = error;
    error = 0;
    // signal new data
    chSemSignal(&fifoData);
  }
  return 0;
}

void setup() {
  Serial.begin(9600);  // (assumed) baud rate
  Serial.println(F("type any character to begin"));
  while (!Serial.available());
  // open file
  if (!sd.begin() || !file.open("DATA.CSV", O_CREAT | O_WRITE | O_TRUNC)) {
    Serial.println(F("SD problem"));
    sd.errorHalt();  // (assumed) stop here on SD failure
  }
  // throw away input
  while (Serial.read() >= 0);
  Serial.println(F("type any character to end"));
  // start kernel
  chBegin(chSetup);
  while (1);
}

void chSetup() {
  // start producer thread
  chThdCreateStatic(waThread1, sizeof(waThread1), NORMALPRIO + 2, Thread1, NULL);
}

// time in micros of last point
uint32_t last = 0;

void loop() {
  // wait for next data point
  chSemWait(&fifoData);
  FifoItem_t* p = &fifoArray[fifoTail++];
  if (fifoTail >= FIFO_SIZE) fifoTail = 0;
  file.print(p->usec - last);
  last = p->usec;
  file.print(',');
  file.print(p->value);
  file.print(',');
  file.println(p->error);
  // release space
  chSemSignal(&fifoSpace);
  if (Serial.available()) {
    // (assumed) flush the file and stop
    file.close();
    Serial.println(F("done"));
    while (1);
  }
}

Here is the file. The first column is the time between points in micros(), the second is the counter, and the third is the number of missed points due to no fifo space.


Hey good to see the SCoop - I haven’t been able to try it out yet - but fantastic to see something published.
I’ve put a simple multi-loop tasker together myself - based on the way of doing it. However I haven’t had time to be able to test it out or publish.
I look forward to checking out SCoop.
IMHO the value of a co-operative scheduler, is to have a simple move into multi-tasking.
It seems to me the value of pre-emptive schedulers is to be able to do system level heavy lifting - especially for buffered IO process tcp/ip interrupts, and to be able to meet other hard deadlines. The cost of the pre-emptive scheduler, IMHO, is high in terms of system complexity and maintaining stability.
In a previous life with 100 real-time software engineers, the Software VP owned the pre-emptive scheduler's priorities to stop just anyone from optimizing them for their own functional systems. In fact most of the processes ran at the same priority, and just the few processes that were managing the hardware redundancy and I/O had higher priorities.

My experience in software has been to simplify the design and to implement it with primitives that are as simple as possible. For this type of design, event-based design is extremely valuable: it encourages a discussion of what events are coming into the board and what the outputs are. So having the ability to collect user input in a buffer via an interrupt and then process/schedule a task::loop to parse the input on a terminating or ‘?’ is very simple. Very simple primitives: user input collection, and the ability to schedule a task::loop. It also supports an architecture for low-power processing.
Wishing everyone a nice seasonal holiday - Cheers Neil