Nil RTOS update - Fast and Tiny!

Giovanni Di Sirio has posted an updated version of Nil RTOS and I have ported this version to AVR Arduinos.

Giovanni Di Sirio is also the author of ChibiOS/RT, and his current plan is for Nil RTOS to have "API and types reorganized in order to converge Nil with ChibiOS 3.0."

The download file is NilRTOS20130720.zip, available at http://code.google.com/p/rtoslibs/downloads/list.

Nil RTOS is an excellent fit for Uno. Here is an example of its small size. This simple standard Arduino sketch that blinks the pin 13 LED uses 1076 bytes of flash.

/*  Simple blink for size comparison with Nil RTOS blink */

// The LED is attached to pin 13 on Arduino.
const uint8_t LED_PIN = 13;

//------------------------------------------------------------------------------
void setup() {
  pinMode(LED_PIN, OUTPUT);
}
//------------------------------------------------------------------------------
void loop() {
  digitalWrite(LED_PIN, HIGH);
  delay(200);
  digitalWrite(LED_PIN, LOW);
  delay(200);
}

This Nil RTOS demo sketch, with the idle thread plus two higher priority threads, a semaphore, and sleep functionality, takes only 2116 bytes of flash. That is only 1040 additional bytes for a priority-based preemptive RTOS.

/*
 * Example to demonstrate thread definition, semaphores, and thread sleep.
 */
#include <NilRTOS.h>

// The LED is attached to pin 13 on Arduino.
const uint8_t LED_PIN = 13;

// Declare a semaphore with an initial counter value of zero.
SEMAPHORE_DECL(sem, 0);
//------------------------------------------------------------------------------
/*
 * Thread 1, turn the LED off when signalled by thread 2.
 */
// Declare a stack with 128 bytes beyond context switch and interrupt needs.
NIL_WORKING_AREA(waThread1, 128);

// Declare the thread function for thread 1.
NIL_THREAD(Thread1, arg) {
  while (TRUE) {
    
    // Wait for signal from thread 2.
    nilSemWait(&sem);
    
    // Turn LED off.
    digitalWrite(LED_PIN, LOW);
  }
}
//------------------------------------------------------------------------------
/*
 * Thread 2, turn the LED on and signal thread 1 to turn the LED off.
 */
// Declare a stack with 128 bytes beyond context switch and interrupt needs. 
NIL_WORKING_AREA(waThread2, 128);

// Declare the thread function for thread 2.
NIL_THREAD(Thread2, arg) {

  pinMode(LED_PIN, OUTPUT);
  
  while (TRUE) {
    // Turn LED on.
    digitalWrite(LED_PIN, HIGH);
    
    // Sleep for 200 milliseconds.
    nilThdSleepMilliseconds(200);
    
    // Signal thread 1 to turn LED off.
    nilSemSignal(&sem);
    
    // Sleep for 200 milliseconds.   
    nilThdSleepMilliseconds(200);
  }
}
//------------------------------------------------------------------------------
/*
 * Threads static table, one entry per thread.  A thread's priority is
 * determined by its position in the table with highest priority first.
 * 
 * These threads start with a null argument.  A thread's name may also
 * be null to save RAM since the name is currently not used.
 */
NIL_THREADS_TABLE_BEGIN()
NIL_THREADS_TABLE_ENTRY("thread1", Thread1, NULL, waThread1, sizeof(waThread1))
NIL_THREADS_TABLE_ENTRY("thread2", Thread2, NULL, waThread2, sizeof(waThread2))
NIL_THREADS_TABLE_END()
//------------------------------------------------------------------------------
void setup() {
  // Start Nil RTOS.
  nilSysBegin();
}
//------------------------------------------------------------------------------
// Loop is the idle thread.  The idle thread must not invoke any 
// kernel primitive able to change its state to not runnable.
void loop() {
  // Not used.
}

The nilContext example demonstrates the speed of the Nil RTOS scheduler. A semaphore signal plus a thread context switch takes just over 12 microseconds on an Uno.

I wrote an ASCII data-logging sketch as a more realistic example. This sketch is able to log four analog pins on an Uno at 2,000 Hz, a total of 8,000 ASCII analog values/second, to an SD card without dropping data points. This is very difficult to do without a small, fast RTOS.

Thanks again for all your work porting Nil to Arduino.

I read in the release notes that Nil now supports a tickless kernel. Any chance that this can be ported to the AVR/Arduino environment (I haven't checked your port yet), with or without additional hardware support?

Is there any hint on how to measure the actual amount of RAM a task needs?
PS: I have 5 tasks, each with a stack of 128 bytes beyond context switch and interrupt needs.
I get

Stack Sizes: 197 197 197 197 197 14581
Unused Stack: 183 185 99 183 99 14533

So my understanding is that the difference is the actual RAM requirement for each task (1..5), but I'm not sure.. :cold_sweat:

The first line is the size of the stack in bytes for each thread. So 197 is the total stack available to the first thread. The last number, 14581, is the amount of heap and stack available to the idle thread.

Nil RTOS fills all stacks and the heap with 0X55 bytes. The second line gives the "high water" size of each stack by looking at how much of the 0X55 pattern remains.

The stack grows down so the low 183 bytes of the stack for the first thread still have the 0X55 pattern.

14533 bytes of the heap/idle stack still have the 0X55 pattern.

After you run your threads in a normal way, print the unused stack size. You can reduce the size of each thread's stack so the unused stack size is small. I like to have at least 32 bytes spare.
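The fill-pattern check described above can be sketched on a PC as follows. This is only an illustration of the technique, not Nil's actual code: `paintStack`, `unusedStack`, and `simulateUse` are hypothetical helper names, and the 0xAA "used" marker is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fill pattern mentioned in this thread.
const uint8_t kFill = 0x55;

// Paint a working area with the fill pattern before the thread starts.
void paintStack(uint8_t* area, std::size_t size) {
  std::memset(area, kFill, size);
}

// Count untouched bytes from the low end of the area. The stack grows down,
// so the lowest addresses are the last ones to be overwritten.
std::size_t unusedStack(const uint8_t* area, std::size_t size) {
  std::size_t n = 0;
  while (n < size && area[n] == kFill) {
    n++;
  }
  return n;
}

// Hypothetical helper: pretend a thread used `used` bytes from the top.
void simulateUse(uint8_t* area, std::size_t size, std::size_t used) {
  assert(used <= size);
  std::memset(area + (size - used), 0xAA, used);
}
```

For a 197-byte area where 14 bytes were used, `unusedStack()` returns 183, which matches the first thread in the output above. Note that the count can read a little high if the deepest bytes the thread wrote happen to equal 0x55, another reason to keep some spare bytes.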

It appears that many of your threads hardly executed.

I've been trying to decrease the stacks. Here is the actual usage when the tasks are called (a "CLI" task signals a few other tasks to execute some commands: set blink h/l period, analog in/out, show stack usage):

PITO>
PITO> stack
STACK
Stack Sizes: 133 133 85 85 85 85 117 14795
Unused Stack: 35 94 71 73 71 71 103 14747
PITO> period 40 40
PERIOD 40 40
PITO> aout 14 200
AOUT 14 200
PITO> ain 3
AIN 3
190
PITO> stack
STACK
Stack Sizes: 133 133 85 85 85 85 117 14795
Unused Stack: 27 39 71 73 71 71 19 14747
PITO>

Some tasks are set to 16 bytes (the ones showing 85).

A thread that shows very little stack use has not yet been through a preemptive context switch.

If a thread executes very little and then sleeps, little stack is used. The kernel just saves the top of stack position and wakes the thread by returning from the sleep call.

At some point a thread like this will use some extra stack due to preemption or an interrupt and the ISR will use the thread's stack. That's why I like to allow about 32 spare locations.

In short, you need a long, realistic execution of your system to ensure you have the correct stack sizes. At least the old trick of filling stacks with known values tells you how much stack was used at the "high water mark".

@fat16lib

Had some time to have a look at this RTOS. I build RTOS/platforms/dev.environment for many-core/DSP (32-128 cores) in my day job and found this code nice and compact.

It might be time to introduce a context switching kernel layer in Cosa to tackle the next complexity level. This could also give even more structure to the device drivers. I have a lot of work to do to get the code thread safe, but the end result would be worth it. There are too many device libraries that do not work very well together due to the lack of semaphores/synchronization on SPI, TWI, etc., and even for basic functions such as ADC conversion in the background.

A "life-time ago" I implemented co-routines/context switching with longjmp/setjmp. Comparing the NilRTOS context switch with the AVR code for longjmp/setjmp, they are very similar. The next issue is the thread memory model and whether dynamic threads should be allowed. Most designs today use static thread pools (web-servers, protocol stacks, etc).

Appreciate that you push the usage of a RTOS for Arduino. It is needed for larger projects.

This is a totally different route than the event driven/object-oriented style I have introduced in Cosa. The main argument being memory requirements and initial complexity for users. But as the context can be as small as max stack depth plus processor state for a thread, given that ISRs are run on their own stack, I might need to reconsider this.

Cheers!

A small (almost Nil) documentation to the NilRtos would be nice to have.. :wink:

Have you clicked on the NilRTOS.html file after you unzipped NilRTOS20130720.zip?

Look at the various tabs. You will find things like this:

msg_t nilSemWaitTimeout(semaphore_t *sp, systime_t timeout)

Performs a wait operation on a semaphore with timeout specification.

Parameters:
[in] sp pointer to a semaphore_t structure
[in] timeout the number of ticks before the operation timeouts, the following special values are allowed:

TIME_IMMEDIATE immediate timeout.
TIME_INFINITE no timeout.

Returns:
A message specifying how the invoking thread has been released from the semaphore.

Return values:
NIL_MSG_OK if the thread has not stopped on the semaphore or the semaphore has been signaled.
NIL_MSG_RST if the semaphore has been reset using nilSemReset().
NIL_MSG_TMO if the semaphore has not been signaled or reset within the specified timeout.

Function Class:
Normal API, this function can be invoked by regular system threads but not from within a lock zone.

Definition at line 474 of file nil.c.

References nilSemWaitTimeoutS(), nilSysLock, and nilSysUnlock.

Referenced by nilTimer1Wait(), NilFIFO< Type, Size >::waitData(), NilStatsFIFO< Type, Size >::waitData(), NilFIFO< Type, Size >::waitFree(), and NilStatsFIFO< Type, Size >::waitFree().

Grrh, I have not.. :frowning:
Thanks for pointing me to that documentation! Great! :slight_smile:

Unfortunately there are a number of typos since this is the author's experimental code.

Giovanni Di Sirio warned that he is busy with ChibiOS and the Nil RTOS Doxygen documentation needs work.

@Bill: I'd like to understand the relationship:
nil.systime vs. NilTimeNow() vs. millis() vs. timer0 settings.

When changing the T0 compare value in boardInit() I do not see any change in NilTimeNow() values against external time. A: The timing does not change for any compare value because the match does not clear T0; the counter keeps running until it overflows.

Millis() shows proper time when called from within a NilRtos task (because of compensation within its OVF_vect isr).

NilTimeNow() is slower - shows lower values against millis() (because of 1.024??).

So I see no effect of boardInit() settings on Timer0. How does the whole T0 setup actually work, please?

PS1: nilSysTimerHandlerI() uses _COMPA_vect on T0 and millis() uses _OVF_vect on T0. Does that mean both work simultaneously in my sketch? (A: Yes) As far as I can see:
a) NilTimeNow() is hooked to T0 with 1.024ms period (but not set by boardInit() )
b) millis() is hooked to T0 with 1.024ms period, but its reading is compensated in sw (so it shows correct time)

PS2: Is it possible that the arduino.h (wiring.c) T0 setting overrides the boardInit() setting (the .lss shows yes)?

PS3: The idea is to use T0 with compare on 250 and a 64 prescaler to get an exact 1.000 ms tick for nil.systime and friends with a 16 MHz crystal.
Maybe:

  1. set compare to 125 (to fire NilSysTimerHandlerI())
  2. NilSysTimerHandlerI() sets the next compare to 250 (to fire millis),
  3. and compare w/ 250 isr (millis stuff) clears T0 and sets compare again to 125 (to fire NilSysTimerHandlerI())..
  4. or, get rid of millis stuff, use Nil stuff only, and set T0 to compare w/ 250 and clear T0
    crazy.. :slight_smile:
/*
 * Thread "PrTime", prints time in millisecs since power on
 */
// Declare a stack with 128 bytes beyond context switch and interrupt needs
NIL_WORKING_AREA(waPrTime, 128);
// Declare the thread function
NIL_THREAD(PrTime, arg) {
  unsigned long m, n;
  while (TRUE) {
    // Wait for signal from CLI
    nilSemWait(&prtime);
    m = millis();
    n = nilTimeNow();
    // Print time
    Serial.print(m);
    Serial.print(" ");
    Serial.print(n);
    Serial.println();
  }
}
2889567 2821839
2889567 / 2821839 = 1.0240013

with

void boardInit(void) {
  /*
   * Timer 0 setup.
   */
  OCR0A = 250;  //was 128  << it does not matter as it runs over with any value
  //TCCR0B = 0x03; // /64 << that is set by wiring.c
  TIMSK0  |= (1 << OCIE0A);  /* IRQ on compare.  */
}

@Bill: I'd like to understand the relationship:
nil.systime vs. NilTimeNow() vs. millis() vs. timer0 settings.

First, I would have preferred to not use the compare interrupt. This would be possible but it would require modifying the Arduino code for the overflow interrupt. Due has a systick hook so I just call the RTOS systick scheduling stuff from the same ISR that maintains millis() and micros().

I thought about changing the period from 1024 microseconds but I want both millis() and micros() to continue to work. micros() needs to tick every 4 microseconds and the timer 0 counter is the low byte of micros().

The purpose of the system tick in an RTOS is not to keep long term time. In Nil it is 16-bits so sleep and semaphore time-outs are limited to about a minute.
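As a quick check of the numbers, here is a small host-side sketch. It assumes a 16 MHz clock, the /64 timer 0 prescaler set by wiring.c, and an 8-bit counter; the function and constant names are made up for illustration.

```cpp
#include <cstdint>

// Assumptions: 16 MHz clock, timer 0 prescaler of 64, 256 counts per overflow.
const uint32_t kCpuHz = 16000000UL;
const uint32_t kPrescaler = 64;
const uint32_t kCountsPerOverflow = 256;

// Timer 0 overflow period in microseconds: 64 * 256 / 16 MHz = 1024 us.
uint32_t tickPeriodMicros() {
  return (kPrescaler * kCountsPerOverflow) / (kCpuHz / 1000000UL);
}

// Longest interval a 16-bit tick counter can express, in milliseconds.
uint32_t maxTimeoutMillis() {
  return (65535UL * tickPeriodMicros()) / 1000UL;
}
```

With these assumptions the tick period is 1024 microseconds, which is where the 1.024 ratio between millis() and nilTimeNow() measured earlier comes from, and the 16-bit limit works out to 67107 ms, a little over a minute.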

A big advantage of keeping the 1024 usec period is that there is little time jitter if the overflow interrupt is the only other interrupt. If you schedule the highest priority task to run every n ticks, the jitter will be very low.

There are macros in most RTOS systems to convert system time to seconds, milliseconds, and microseconds.

Nil has these:

/**
 * @name    Time conversion utilities
 * @{
 */
/**
 * @brief   Seconds to system ticks.
 * @details Converts from seconds to system ticks number.
 * @note    The result is rounded upward to the next tick boundary.
 *
 * @param[in] sec       number of seconds
 * @return              The number of ticks.
 *
 * @api
 */
#define S2ST(sec)                                                           \
  ((systime_t)((sec) * NIL_CFG_FREQUENCY))

/**
 * @brief   Milliseconds to system ticks.
 * @details Converts from milliseconds to system ticks number.
 * @note    The result is rounded upward to the next tick boundary.
 *
 * @param[in] msec      number of milliseconds
 * @return              The number of ticks.
 *
 * @api
 */
#define MS2ST(msec)                                                         \
  ((systime_t)(((((uint32_t)(msec)) * ((uint32_t)NIL_CFG_FREQUENCY) - 1UL) /\
                1000UL) + 1UL))

/**
 * @brief   Microseconds to system ticks.
 * @details Converts from microseconds to system ticks number.
 * @note    The result is rounded upward to the next tick boundary.
 *
 * @param[in] usec      number of microseconds
 * @return              The number of ticks.
 *
 * @api
 */
#define US2ST(usec)                                                         \
  ((systime_t)(((((uint32_t)(usec)) * ((uint32_t)NIL_CFG_FREQUENCY) - 1UL) /\
                1000000UL) + 1UL))
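To see the rounding in action, here is a worked example that can be compiled on a PC. It assumes a 16 MHz clock and a 16-bit systime_t; the `msToTicks` wrapper is a hypothetical name added only for this illustration, while MS2ST is copied from the header quoted above.

```cpp
#include <cstdint>

// Assumptions for this host-side sketch: 16 MHz clock, 16-bit tick type.
#define F_CPU 16000000L
typedef uint16_t systime_t;

// Default tick frequency from nilconf.h as quoted in this thread.
// Integer division gives 976 (the real rate is 976.5625 Hz).
#define NIL_CFG_FREQUENCY (F_CPU / 16384L)

// MS2ST copied from the header quoted above.
#define MS2ST(msec)                                                         \
  ((systime_t)(((((uint32_t)(msec)) * ((uint32_t)NIL_CFG_FREQUENCY) - 1UL) /\
                1000UL) + 1UL))

// Hypothetical wrapper so the macro can be called like a function.
systime_t msToTicks(uint32_t msec) {
  return MS2ST(msec);
}
```

`msToTicks(200)` evaluates to (200 * 976 - 1) / 1000 + 1 = 196 ticks, which at 1.024 ms per tick is about 200.7 ms. Any request shorter than one tick, such as `msToTicks(1)`, rounds up to 1 tick.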

There are also macros for sleeping seconds, milliseconds, and microseconds:

/**
 * @brief   Delays the invoking thread for the specified number of seconds.
 * @note    The specified time is rounded up to a value allowed by the real
 *          system clock.
 * @note    The maximum specified value is implementation dependent.
 *
 * @param[in] sec       time in seconds, must be different from zero
 *
 * @api
 */
#define nilThdSleepSeconds(sec) nilThdSleep(S2ST(sec))

/**
 * @brief   Delays the invoking thread for the specified number of
 *          milliseconds.
 * @note    The specified time is rounded up to a value allowed by the real
 *          system clock.
 * @note    The maximum specified value is implementation dependent.
 *
 * @param[in] msec      time in milliseconds, must be different from zero
 *
 * @api
 */
#define nilThdSleepMilliseconds(msec) nilThdSleep(MS2ST(msec))

/**
 * @brief   Delays the invoking thread for the specified number of
 *          microseconds.
 * @note    The specified time is rounded up to a value allowed by the real
 *          system clock.
 * @note    The maximum specified value is implementation dependent.
 *
 * @param[in] usec      time in microseconds, must be different from zero
 *
 * @api
 */
#define nilThdSleepMicroseconds(usec) nilThdSleep(US2ST(usec))

Many RTOS systems have a service for high resolution timers. Maybe Nil will have something more based on what is done for ChibiOS.

Nil's use of system tick is pretty standard. Nil is designed for a tick in the range of 1-10 ms. I don't intend to change that, I will live with what Giovanni Di Sirio does.

So the easiest way to get a tick = 1.000ms is to use 16.384MHz crystal :slight_smile:

If you are willing to give up PWM associated with timer 0 you can have a 1.000 ms tick. In PWM mode OCR0A is double buffered and it is updated at "BOTTOM" so changing it only gives jitter in the CMPA interrupt.

See the CPU datasheet for timer 0.

If you change timer 0 to "normal" mode you can have a 1.000 ms tick.

The following assumes a 16 MHz CPU. I think the 250 added to OCR0A is (F_CPU/64000) for general CPU frequency.

Here are changes to board.c:

/** System time ISR. */
static uint8_t cmp = 0;
NIL_IRQ_HANDLER(TIMER0_COMPA_vect) {

  NIL_IRQ_PROLOGUE();
  OCR0A += 250; // ****** ADD THIS LINE FOR 1.000 MS TICK************
  nilSysTimerHandlerI();

  NIL_IRQ_EPILOGUE();
}
/**
 * Board-specific initialization code for Arduino.
 * Use timer 0 compare A to generate an interrupt every 1024 usec.
 *
 */
void boardInit(void) {
  /*
   * Timer 0 setup.
   */
  OCR0A = 128;
  TIMSK0  |= (1 << OCIE0A);  /* IRQ on compare.  */
  TCCR0A &= ~((1 << WGM00) | (1 << WGM01)); // ****** ADD THIS LINE FOR 1.000 MS TICK************
}

Edit nilconf.h to change the tick frequency.

Change this:

/**
 * @brief   System tick frequency.
 */
#define NIL_CFG_FREQUENCY                   (F_CPU/16384L)

To this:

/**
 * @brief   System tick frequency.
 */
#define NIL_CFG_FREQUENCY                   1000

Now the NilTimeNow() and millis() are identical (I've changed the systime to uint32_t):

382938 382933
563120 563115
808810 808806
1160186 1160182
1770719 1770715
10808014 10808011

Q: what will happen when the compare value drifts closer to 255/0 (OVF isr for millis) because of "OCR0A += 250;" ?

The timer counter, TCNT0, just increments and the overflow ISR doesn't change anything. So the compare ISR is not affected by the overflow ISR.

Adding 250 really just subtracts 6 counts or 24 microseconds until the next compare match. As long as interrupts are not disabled for around a millisecond this will be reliable.
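The wrap-around can be checked with a small simulation on a PC. This is only a model of a free-running 8-bit counter in normal mode, not AVR code; `countsBetweenMatches` is a hypothetical name, and the real hardware sets the compare flag one timer clock after the match, which this model ignores.

```cpp
#include <cstdint>

// Simulate a free-running 8-bit timer with a compare register that the
// "ISR" advances by `step` on every match. Returns the number of timer
// counts between consecutive matches, or 0 if the interval is not constant
// (or if fewer than two matches were requested).
uint32_t countsBetweenMatches(uint8_t firstCompare, uint8_t step, int matches) {
  uint8_t tcnt = 0;            // stands in for TCNT0
  uint8_t ocr = firstCompare;  // stands in for OCR0A
  uint32_t sinceLast = 0;
  uint32_t interval = 0;
  bool synced = false;         // true once the startup match has passed
  for (int seen = 0; seen < matches;) {
    tcnt++;                    // uint8_t wraps from 255 to 0
    sinceLast++;
    if (tcnt != ocr) continue;
    ocr = static_cast<uint8_t>(ocr + step);  // wraps modulo 256
    if (synced) {
      if (interval == 0) {
        interval = sinceLast;
      } else if (sinceLast != interval) {
        return 0;
      }
    }
    synced = true;
    sinceLast = 0;
    seen++;
  }
  return interval;
}
```

`countsBetweenMatches(128, 250, 1000)` gives 250 counts, which is 1000 microseconds at 4 microseconds per count, no matter where the compare value lands relative to the overflow. With a step of 0 (the compare value never changes) the interval is 256 counts, or 1024 microseconds, which is why changing only the initial OCR0A value had no effect on the tick rate.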

So when compare value will be 0, the ovf and compare isr fire simultaneously, I think. Will be both isrs processed properly then?

It will work correctly but there will be a little jitter in the time. Interrupt priority on AVR is by vector location.

The interrupts have priority in accordance with their Interrupt Vector position. The lower the Interrupt Vector address, the higher the priority.

The TIMER0 COMPA interrupt is higher priority than TIMER0 OVF so the Nil systick should be processed first if both occur on the same CPU cycle.

I have several tasks, two of which handle I2C. The first reads an ADS1110 ADC (every 100 ms) and the second reads an external RTC (for example, every second). Running both tasks on a single I2C bus does not work well: the data gets corrupted.
What is the proper way to handle this situation under NilRtos?