SCoop - multitask and Simple COOPerative scheduler AVR & ARM

Your counting example is silly, but to continue being silly, here is my own counting example.

"just setup 3 tasks doing a single 32 bits count++ in a loop containing a yield"

Why would anyone put a yield in a loop with a preemptive RTOS? That's what round robin is for.
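To illustrate the point, here is a toy model of round-robin time slicing that runs on a PC. It is only a sketch under stated assumptions, not ChibiOS's actual scheduler: the names RoundRobinSim, kThreads and kWorkPerSlice are made up for this example, and the "timer tick" is simply the end of a fixed amount of work. The thread body never calls yield, yet every thread gets the same share.

```cpp
#include <stddef.h>
#include <stdint.h>

// Hypothetical values chosen for illustration only.
const size_t kThreads = 4;            // equal-priority threads
const uint32_t kWorkPerSlice = 1000;  // increments that fit in one time slice

struct RoundRobinSim {
  uint32_t counts[kThreads];  // one counter per simulated thread
  size_t current;             // index of the thread that owns the CPU

  RoundRobinSim() : current(0) {
    for (size_t i = 0; i < kThreads; i++) counts[i] = 0;
  }

  // One time slice: the running thread counts until the simulated tick,
  // then the scheduler moves on to the next thread. No yield is involved.
  void runSlice() {
    for (uint32_t i = 0; i < kWorkPerSlice; i++) counts[current]++;
    current = (current + 1) % kThreads;
  }

  // Run the given number of consecutive time slices.
  void run(uint32_t slices) {
    for (uint32_t s = 0; s < slices; s++) runSlice();
  }
};
```

Calling run(8) on a fresh RoundRobinSim leaves each of the four counters at 2000, because the scheduler, not the thread, decides when to switch.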

With proper design, ChibiOS does over seven million increments in the ten-second run, about seven times better than SCoop.

Here is the ChibiOS sketch with five tasks (four counting threads plus the main loop() thread):

// Simple counting demo
#include <ChibiOS_AVR.h>

volatile uint32_t n1 = 0;
volatile uint32_t n2 = 0;
volatile uint32_t n3 = 0;
volatile uint32_t n4 = 0;
//------------------------------------------------------------------------------
static WORKING_AREA(waThread1, 16);
static msg_t Thread1(void *arg) {
  while (1) {
    n1++;
  }
  return 0;
}
//------------------------------------------------------------------------------
static WORKING_AREA(waThread2, 16);
static msg_t Thread2(void *arg) {
  while (1) {
    n2++;
  }
  return 0;
}
//------------------------------------------------------------------------------
static WORKING_AREA(waThread3, 16);
static msg_t Thread3(void *arg) {
  while (1) {
    n3++;
  }
  return 0;
}
//------------------------------------------------------------------------------
static WORKING_AREA(waThread4, 16);
static msg_t Thread4(void *arg) {
  while (1) {
    n4++;
  }
  return 0;
}
//------------------------------------------------------------------------------
void setup() {
  Serial.begin(9600);
  // initialize ChibiOS
  chBegin(chSetup);
}
//------------------------------------------------------------------------------
void chSetup() {
  // start four counting threads at the same priority so round robin shares the CPU
  chThdCreateStatic(waThread1, sizeof(waThread1), NORMALPRIO, Thread1, NULL);
  chThdCreateStatic(waThread2, sizeof(waThread2), NORMALPRIO, Thread2, NULL);
  chThdCreateStatic(waThread3, sizeof(waThread3), NORMALPRIO, Thread3, NULL);
  chThdCreateStatic(waThread4, sizeof(waThread4), NORMALPRIO, Thread4, NULL);
}
//------------------------------------------------------------------------------
// idle loop runs at NORMALPRIO
void loop() {
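  uint32_t s1, s2, s3, s4;
  // let the counting threads run for ten seconds
  chThdSleepMilliseconds(10000);
  uint32_t t = millis();
  // copy the 32-bit counters with interrupts off so no copy is torn by a context switch
  noInterrupts();
  s1 = n1;
  s2 = n2;
  s3 = n3;
  s4 = n4;
  interrupts();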
  Serial.print("millis: ");
  Serial.println(t);
  Serial.print(s1+s2+s3+s4);
  Serial.write('=');
  Serial.print(s1);
  Serial.print('+');
  Serial.print(s2);
  Serial.print('+');
  Serial.print(s3);
  Serial.print('+');
  Serial.println(s4);
  // stop after one report
  while (1) {}
}

It prints:

millis: 10055
7180551=1798316+1798959+1798966+1784310
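For anyone who wants to check the arithmetic, here is a small helper sketch that works on the numbers printed above. The function names are made up for this example. The rate is only approximate, since the printed millis() value counts from boot rather than from the start of the ten-second sleep.

```cpp
#include <stdint.h>

// Counts and elapsed time copied from the output above.
static const uint32_t kCounts[4] = {1798316UL, 1798959UL, 1798966UL, 1784310UL};
static const uint32_t kElapsedMs = 10055UL;

// Sum of the four per-thread counters.
uint32_t totalCount() {
  uint32_t sum = 0;
  for (int i = 0; i < 4; i++) sum += kCounts[i];
  return sum;
}

// Approximate increments per second (integer division, rounded down).
uint32_t incrementsPerSecond() {
  return (uint32_t)((uint64_t)totalCount() * 1000UL / kElapsedMs);
}

// Difference between the busiest and the least busy thread.
uint32_t countSpread() {
  uint32_t lo = kCounts[0];
  uint32_t hi = kCounts[0];
  for (int i = 1; i < 4; i++) {
    if (kCounts[i] < lo) lo = kCounts[i];
    if (kCounts[i] > hi) hi = kCounts[i];
  }
  return hi - lo;
}
```

The total matches the printed 7180551, the rate comes out at roughly 714,000 increments per second, and the four counts differ by 14656, which is under 1% of any single count. Round robin shares the CPU almost evenly without a single yield in the thread code.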

I am sure SCoop will be great for most Arduino users.

Still, I could give you endless examples, none of them about "topmost performances", where an RTOS is a better solution than SCoop.

RTOSes are now in billions of common products. Every embedded software tools company provides an RTOS.

Here is Keil's RTX offering: Keil RTX5.