Try FreeRTOS - compare with ChibiOS/RT

I have posted a port of FreeRTOS on the project's Google Code page (now in the Google Code Archive). The file is FreeRTOS20111031.zip.

There is also a port of ChibiOS/RT posted as ChibiOS20111027.zip. See http://arduino.cc/forum/index.php/topic,76932.0.html.

I have attempted to use the same options in both ports.

I used timer 0 for the system timer, so the OS tick period is 1024 usec, or about 976 Hz.
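The tick figures follow from the standard Arduino timer 0 setup. Below is a minimal C illustration of that arithmetic. It assumes a 16 MHz clock, the Arduino core's timer 0 prescaler of 64, and one OS tick per overflow of the 8-bit counter. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 16 MHz system clock, timer 0 prescaler of 64 (the Arduino
   core default), and one OS tick per 8-bit timer overflow (256 counts). */
#define F_CPU_HZ        16000000UL
#define TIMER0_PRESCALE 64UL
#define TIMER0_COUNTS   256UL

/* Tick period in microseconds: 64 * 256 / 16 = 1024 usec. */
uint32_t tick_period_usec(void) {
    return (uint32_t)(TIMER0_PRESCALE * TIMER0_COUNTS / (F_CPU_HZ / 1000000UL));
}

/* Tick rate in Hz: 16e6 / (64 * 256) = 976.5625 Hz. */
double tick_rate_hz(void) {
    return (double)F_CPU_HZ / (double)(TIMER0_PRESCALE * TIMER0_COUNTS);
}
```

The "976 Hz" in the text is this 976.5625 Hz rate, truncated.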

I provided the same five examples in each port. The FreeRTOS versions are in the FreeRTOS/examples folder:

frBlinkPrint - A simple example of three threads. A high priority thread blinks an LED, a medium priority thread prints a counter every second, and the low priority idle loop increments the counter.

frContextTime - Test to determine context switch time with a semaphore.

frFastLogger - Data logger optimized for ATmega328 Arduinos. Logs two analog pins every two ticks (1024 usec tick).

frJitter - Test of jitter in sleep for one tick.

frMegaLogger - Data logger optimized for the Mega. Logs four analog pins every tick (976 samples per second, four values per sample).

I have posted a port of FreeRTOS ...
There is also a port of ChibiOS/RT

So what are YOUR observations?
I'm looking at those new ARM chips (and similar) with 1 MB of flash, and also concluding that it doesn't make sense to start working on them without some sort of kernel underneath. If they were "real" computers (with MMUs), some form of Linux would be the obvious choice. These "in-between" chips (too big for a monolith, too small for Linux) are a puzzle. "Free" seems pretty important.

The logical conclusion is that the ATmega2560 on the MEGA is in a similar situation. A single 256 KB program is ... not desirable (though I'll concede that there are some smaller efforts with lots of data that could make use of that memory...)

Are there non-real-time alternatives as well?

I didn't comment much on FreeRTOS because I like to do things that are fast and there is this restriction:

FreeRTOS may not be used for any competitive or comparative purpose, including the publication of any form of run time or compile time metric, without the express permission of Real Time Engineers Ltd. (this is the norm within the industry and is intended to ensure information accuracy).

FreeRTOS is popular, runs on lots of chips/boards, and there are books and support. For several reasons it is not my favorite of the systems I have tried. I don't like the fact that it only uses dynamic memory.

Be sure to look at the semaphore context switch examples on a scope. There is an amazing difference.

I like ChibiOS/RT, it is extremely fast, small and uses RAM very efficiently. The author of ChibiOS/RT, Giovanni Di Sirio, has a design/coding style that I like.

A much better recommendation is here http://www.yagarto.de/projects/chibios/index.html. Michael Fischer is a true expert in ARM. I use his YAGARTO tool-chain on ARM. He discusses six popular RTOSes in this link.

Also here http://www.yagarto.de/projects/rtoscomp/index.html.

We did have a discussion thread on RTOSs a month back, and the conclusion was that they were a total waste of time, especially in the Arduino context.
They just suck the power out of a processor and move the debugging process into the region of fighting the RTOS.

I have no interest in discussion. Did anyone do a serious application that is difficult without an RTOS?

The ChibiOS/RT Mega data logging example has a 15 usec overhead for the context switch to the ADC task. Each analogRead() takes 110 usec, so the overhead is small.

The logger example can read four analog pins every 1.024 milliseconds, format them as CSV text, and write the data to an SD card without missing points.

SD cards can have occasional write delays of 200 milliseconds, so there can be well over 100 context switches to the ADC task while a block is being written to the SD card.
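To avoid losing points during such a stall, the ADC task has to buffer every sample that arrives while the SD card is busy. Below is a minimal sizing sketch in C. It assumes raw 10-bit readings are stored as uint16_t before formatting, which may not be how the posted logger actually buffers data, and a 200 ms worst-case stall. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not taken from the logger source): raw ADC values are
   buffered as uint16_t, and the worst-case SD write stall is 200 ms. */
#define TICK_USEC          1024UL    /* one sample per OS tick */
#define SD_STALL_USEC      200000UL  /* worst-case SD busy time */
#define VALUES_PER_SAMPLE  4UL       /* four analog pins on the Mega */

/* Samples that arrive while the SD card is busy, rounded up. */
uint32_t samples_during_stall(void) {
    return (uint32_t)((SD_STALL_USEC + TICK_USEC - 1) / TICK_USEC);
}

/* Bytes of RAM needed to hold those samples. */
uint32_t buffer_bytes(void) {
    return samples_during_stall() * VALUES_PER_SAMPLE * (uint32_t)sizeof(uint16_t);
}
```

Under these assumptions the buffer needs 196 samples, or 1568 bytes. That fits comfortably in the Mega's 8 KB of SRAM, but it would consume most of the 2 KB on an ATmega328.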

I have done special timer based sketches to do loggers like this but ChibiOS makes it easy. Context save/restore can be faster in a custom solution but that speed is not required when using analogRead().

This example can log data about 100 times faster than a simple loop that does four analogRead() calls and then writes the data to the SD using print.

In this example a context switch happens every 1024 usec and takes 15 usec so the overhead is about 1.5%.
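The overhead figure is simple arithmetic on the measurements above. Here is a small C sketch using the quoted 15 usec switch, the 1024 usec tick, and 110 usec per analogRead(). The function names are hypothetical.

```c
#include <assert.h>

/* Figures quoted in this post: 15 usec per context switch, one switch
   per 1024 usec tick, 110 usec per analogRead(), four reads per tick. */
#define SWITCH_USEC    15.0
#define TICK_USEC      1024.0
#define READ_USEC      110.0
#define READS_PER_TICK 4

/* Fraction of CPU time spent switching to the ADC task. */
double switch_overhead(void) {
    return SWITCH_USEC / TICK_USEC;
}

/* Fraction of CPU time spent inside analogRead(). */
double adc_load(void) {
    return READS_PER_TICK * READ_USEC / TICK_USEC;
}
```

That works out to about 1.5% for switching and about 43% for the four analogRead() calls. Roughly 55% of each tick is left for formatting and SD writes.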

I have no interest in discussion.

OK bye bye :stuck_out_tongue:

Sorry, by discussion, I meant without facts. Too much of the stuff in forums is just opinion.

I believe in the old saw sometimes attributed to Admiral Grace Hopper.

“One accurate measurement is worth a thousand expert opinions” - Adm. Grace Murray Hopper (Dec 9, 1906 to Jan 1, 1992)

She is also credited with popularizing the term "bug".

Here are a few more of her quotes: "Quotes from Grace Hopper, Computer Programming Pioneer". Not bad for someone who started programming in 1944.

Sorry, by discussion, I meant without facts.

Well, I don't know what you mean by facts, but I have been in electronics / microcontrollers professionally for over 40 years. As a manager I have had to run projects both with and without an RTOS, so I have seen what actually happens.
Normally it is the software engineers who initially push to use an RTOS, expecting an easy life. It very rarely happens like that, because the problems encountered in the project then turn out to be about exactly how the RTOS works and how to get it to do what they want. In the meantime the extra resource in memory and CPU cycles means you have to go to at least the next processor up, as the power and capacity are drained by the RTOS.
In the end the project works, although about four times the effort has gone into the project and the hardware is more expensive.
Not using an RTOS, on the other hand, forces the programmer into actually thinking and planning what they want to do, something that is not always popular with them. However, I have found that non-RTOS projects come in faster and have a less demanding CPU requirement. The actual system is often more responsive as well.

Of course some projects are so large, with so many programmers working on them, that an RTOS is almost mandatory. However for small projects I have found them to be totally useless.

Your specific question:-

Did anyone do a serious application that is difficult without an RTOS?

Yes I have managed and contributed to these. The largest project was an Enterprise scale access control system with up to 256 nodes spanning buildings covering an area of over three square miles. I think that ticks all your boxes for that one.

The logger example can read four analog pins every 1.024 milliseconds, format them as CSV text, and write the data to an SD card without missing points.

The point is that whatever you can do with an RTOS you can do without it, usually much faster.

In the meantime the extra resource in memory and CPU cycles means you have to go to at least the next processor up

Ah, but we already went to the "next processor up" when the m328 became standard. And then the MEGA, and the MEGA2560, and now we're looking at seeing the next stage beyond that.

The largest project was an Enterprise scale access control system

About how much object code? When you say you didn't use an RTOS, does that mean no multitasking at all, or just no well-known off-the-shelf (OTS) RTOS? There's so much of "modern computing" that is deeply dependent on "tasks" that I wouldn't want to try to do it without some OS. Most of networking, for instance. Cisco routers ran an OS back in the day when they had 512 KB of memory; I can't imagine having implemented what they did without the OS. And an STM32F4 eval board is a candidate for an Arduino environment, and has MORE memory than that...

non-RTOS projects come in faster and have a less demanding CPU requirement.

But we rapidly approach an era where memory and CPU cycles are SO cheap... An Arduino Uno is overpowered for many of the tasks it is put to. The hypothetical Arduino Due is ridiculous.

Ah, but we already went to the "next processor up" when the m328 became standard.

Well, we went up to another pin-compatible part with the same CPU horsepower. The problem on a commercial project is that when this happens you have to scrap all the existing prototypes and try to claw them back.

About how much object code?

It was just over 100 KB when compiled.

When you say you didn't use an RTOS, does that mean no multitasking at all,

You don't need an RTOS to do multitasking, you just need to design your code correctly.
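One common way to multitask without an RTOS is a cooperative superloop, where each task runs once its period has elapsed and then returns quickly. The C sketch below shows one such design. It is not necessarily the approach used in the projects described here. The now_ms argument stands in for Arduino's millis(), and the tasks and periods are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cooperative superloop sketch: each task runs when its period has
   elapsed and must return without blocking. Tasks are hypothetical. */
typedef struct {
    uint32_t period_ms;   /* how often the task should run */
    uint32_t last_ms;     /* time the task last ran */
    void (*run)(void);    /* task body; must not block */
} task_t;

static unsigned blink_count, print_count;
static void blink_led(void)     { blink_count++; }  /* would toggle a pin */
static void print_counter(void) { print_count++; }  /* would print a value */

static task_t tasks[] = {
    { 500,  0, blink_led },
    { 1000, 0, print_counter },
};
#define NUM_TASKS (sizeof tasks / sizeof tasks[0])

/* One pass of the loop: run every task whose period has elapsed.
   Unsigned subtraction keeps the comparison correct when the clock wraps. */
void superloop_step(uint32_t now_ms) {
    for (unsigned i = 0; i < NUM_TASKS; i++) {
        if ((uint32_t)(now_ms - tasks[i].last_ms) >= tasks[i].period_ms) {
            tasks[i].last_ms = now_ms;
            tasks[i].run();
        }
    }
}
```

On an Arduino, loop() would simply call superloop_step(millis()). The catch is that every task must return promptly. One slow task delays all the others, and that is the problem a preemptive RTOS is meant to address.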

But we rapidly approach an era where memory and CPU cycles are SO cheap.

That is an argument that I first came across 30 years ago, it is as true now as it was then. :wink:

The hypothetical Arduino Due is ridiculous.

So let's cripple it with an RTOS?

Grumpy_Mike,

Please share some facts with us. Please provide numbers from any measurements that prove these statements.

In the end the project works, although about four times the effort has gone into the project and the hardware is more expensive.

Which processor and RTOS/version does the above refer to? Why did you allow an RTOS to be used? Didn't you do prototyping as part of the decision process?

They just suck the power out of a processor and move the debugging process into the region of fighting the RTOS.

What percentage of cycles goes to the RTOS, and where in the RTOS? Which processor and RTOS does this refer to? What is the nature of the application that causes so much overhead in the RTOS?

However for small projects I have found them to be totally useless.

How do you decide when an RTOS is useless?

You don't need an RTOS to do multitasking, you just need to design your code correctly.

What techniques do you use that can't be done with an RTOS? Which RTOSes did you benchmark to prove that a roll-your-own solution was required?

So let's cripple it with an RTOS?

The processor in the Due was designed for use with an RTOS. Why do you think it will be crippled?

Well, on Mike's side, I've seen RTOS vendors recommend things like using a UART ISR merely to send a message that wakes up the actual driver, which would then do the actual reading of the data from the UART. This can make sense from a "strict real-time" point of view, and might be OK for an OS with a complex tty driver, but for the typical Arduino-class serial driver it would be insane (IMNSHO). If you use an RTOS, you can spend a lot of time making this sort of decision, and bending code in "unnatural" and ultimately inefficient ways, just to fit the capabilities and style of the RTOS in question. It all ... depends.
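For comparison, a typical Arduino-class driver has the receive ISR store each byte directly into a ring buffer, and the main code drains it later. There is no message and no task wake-up. Below is a minimal host-testable sketch in C. The names and the 64-byte size are assumptions. The volatile indexes are safe only because each index has a single writer and 8-bit loads and stores are atomic on an AVR. A multi-core host would need C11 atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Single-producer/single-consumer ring buffer: the UART receive ISR is
   the only writer of rx_head, the main loop the only writer of rx_tail. */
#define RX_SIZE 64u                      /* must be a power of two */
static volatile uint8_t rx_buf[RX_SIZE];
static volatile uint8_t rx_head;         /* written only by the ISR */
static volatile uint8_t rx_tail;         /* written only by the reader */

/* Called from the UART receive ISR with the byte just read. */
void rx_isr_put(uint8_t c) {
    uint8_t next = (uint8_t)((rx_head + 1u) & (RX_SIZE - 1u));
    if (next != rx_tail) {               /* drop the byte if the buffer is full */
        rx_buf[rx_head] = c;
        rx_head = next;
    }
}

/* Called from the main loop; returns false if no data is waiting. */
bool rx_get(uint8_t *c) {
    if (rx_tail == rx_head) return false;
    *c = rx_buf[rx_tail];
    rx_tail = (uint8_t)((rx_tail + 1u) & (RX_SIZE - 1u));
    return true;
}
```

One slot is always left empty to tell "full" from "empty", so this buffer holds at most 63 bytes.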

It's sorta like C++ and Java programming for the desktop being more about learning the existence and capabilities of a multitude of classes and libraries than learning the language itself. (Sometimes I worry that Arduino is also headed that way.)

In what way is the ARM CM3 "designed for use with an RTOS"? I'd rate most RISC microcontrollers, with their large register sets and relatively small amounts of RAM, as generally poor choices for an RTOS or even multitasking. Too much context and not enough place to save it...

westfw,

Here is an example that shows why the ARM CM3 architecture is very RTOS-friendly. The example is a bit long. I will use the internals of ChibiOS to illustrate.

The context for the ARM Cortex-M3 is divided into two parts. ChibiOS refers to the first part as the interrupt-saved context. This part of the stack frame is eight 32-bit words and is saved by the NVIC (Nested Vectored Interrupt Controller) when an interrupt occurs.

Here is the structure in ChibiOS:

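```c
/* Part 1: stacked automatically by the hardware on exception entry. */
struct extctx {
  regarm_t      r0;      /* r0-r3: argument/scratch registers */
  regarm_t      r1;
  regarm_t      r2;
  regarm_t      r3;
  regarm_t      r12;     /* intra-procedure scratch register */
  regarm_t      lr_thd;  /* the thread's link register */
  regarm_t      pc;      /* address to resume at */
  regarm_t      xpsr;    /* program status register */
};
```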

The remainder of the context can be saved by a single instruction if a context switch is required. Here is the structure for that part of the stack frame in ChibiOS:

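```c
/* Part 2: callee-saved registers, pushed by the RTOS only on a switch. */
struct intctx {
  regarm_t      r4;      /* r4-r11: callee-saved registers */
  regarm_t      r5;
  regarm_t      r6;
  regarm_t      r7;
  regarm_t      r8;
  regarm_t      r9;
  regarm_t      r10;
  regarm_t      r11;
  regarm_t      lr;      /* popped into pc to resume the thread */
};
```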

Here is the context switch function that ChibiOS calls at the ISR level when a context switch is required.

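```c
/* Save r4-r11 and lr on the current (old) thread's stack. */
#define PUSH_CONTEXT() {                                                    \
  asm volatile ("push    {r4, r5, r6, r7, r8, r9, r10, r11, lr}"            \
                : : : "memory");                                            \
}
/* Restore r4-r11 from the new thread's stack; popping into pc resumes it. */
#define POP_CONTEXT() {                                                     \
  asm volatile ("pop     {r4, r5, r6, r7, r8, r9, r10, r11, pc}"            \
                : : : "memory");                                            \
}
/* naked: no compiler prologue/epilogue, the asm does all the work. */
__attribute__((naked))
void port_switch(Thread *ntp, Thread *otp) {
  PUSH_CONTEXT();
  /* Store the old sp in otp's context field (byte offset 12 of Thread),
     then load the new thread's saved sp from ntp. */
  asm volatile ("str     sp, [%1, #12]                          \n\t"
                "ldr     sp, [%0, #12]" : : "r" (ntp), "r" (otp));
  POP_CONTEXT();
}
```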

When you write an ISR, you decide whether an event has happened that requires a task to be triggered. In a USART handler this could be an end-of-message character. If you have not received this character, there is no RTOS overhead.

If you receive the end-of-message character, you call a ChibiOS routine that checks whether a context switch is required and does the above stack switch if necessary. When you return from the interrupt, the NVIC completes the rest of the context switch.
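The pattern above can be simulated on a host in C. The handler stores every byte cheaply and only requests a wake-up when the end-of-message character arrives. wake_consumer() is a hypothetical stand-in for the ChibiOS call made from the ISR; in a real port this would be an I-class function such as a semaphore signal. The buffer size and names are also assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host simulation: the receive handler buffers every byte, but only
   requests a wake-up (and so a possible context switch) on '\n'. */
#define MSG_MAX 32u
#define EOM     '\n'

static char     msg_buf[MSG_MAX];
static size_t   msg_len;
static unsigned wake_count;          /* number of wake-up requests */

/* Hypothetical stand-in for the RTOS signal issued from the ISR. */
static void wake_consumer(void) { wake_count++; }

/* Body of the USART receive interrupt handler. */
void usart_rx_isr(char c) {
    if (msg_len < MSG_MAX) {
        msg_buf[msg_len++] = c;      /* cheap path: just store the byte */
    }
    if (c == EOM) {
        wake_consumer();             /* only now can a switch be needed */
    }
}

/* Called by the consumer task after waking: copy the message out and
   reset. On real hardware this would run with interrupts disabled. */
size_t take_message(char *out, size_t out_size) {
    size_t n = msg_len < out_size ? msg_len : out_size;
    memcpy(out, msg_buf, n);
    msg_len = 0;
    return n;
}
```

For the common case, where a byte is not the terminator, the ISR does nothing beyond a store and a compare.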

On a 72 MHz STM32 the overhead for this call with a context switch is about 2 microseconds. Most of this time is in determining which task should run, not the context switch.

I have made this as simple as possible so I am sure you will have a "but what if?".

..I've seen an interesting thread on a different forum (AVR, ARM, PIC etc.). A guy was asking for help because he needed a debugger (for the STM32VL) that steps at the asm instruction level, as on the AVR. Another guy answered that the STM32 is in a different league and nobody actually needs that, and that if the application is demanding he should simply take an STM32 with a higher clock frequency..
An interesting point. I think the debate on RTOS vs. superloop is similar: it is about productivity..

pito,

There are lots of ways to solve a multitasking real time problem. The solution must satisfy this condition:

Real-time means it reacts in a timely fashion, i.e. fast enough for the task at hand.

But this must also be true of the tools you use.

You can however only use it productively and safely if you understand how it works and what the limitations and pitfalls are.

Super-loops are simple, so they are great for most programmers if they satisfy the real-time requirement.

Preemptive RTOSs have pitfalls that trip up the inexperienced and are probably not the best choice for most hobbyists. This may not be for you if you don't know what priority inversion is or the difference between a semaphore and a mutex.
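The semaphore/mutex distinction can be illustrated on a desktop with POSIX threads. This is an analogy, not any RTOS's API. Here a semaphore is a signal that one thread posts and another waits on. A mutex guards shared data and must be unlocked by the thread that locked it. Names and counts are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* Semaphore = signalling between threads (no owner).
   Mutex = mutual exclusion (the locking thread must unlock).
   RTOS mutexes often add priority inheritance to limit priority
   inversion; a POSIX mutex only does so if created with the
   PTHREAD_PRIO_INHERIT protocol attribute, which is not used here. */
static sem_t data_ready;             /* producer -> consumer signal */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_total;            /* protected by lock */

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&lock);   /* exclusive access to shared_total */
        shared_total += 1;
        pthread_mutex_unlock(&lock); /* same thread unlocks */
        sem_post(&data_ready);       /* wake the consumer */
    }
    return NULL;
}

/* Starts a producer, consumes its 1000 signals, returns the total. */
long run_demo(void) {
    pthread_t t;
    sem_init(&data_ready, 0, 0);
    pthread_create(&t, NULL, producer, NULL);
    for (int i = 0; i < 1000; i++) {
        sem_wait(&data_ready);       /* block until signalled */
    }
    pthread_join(t, NULL);
    sem_destroy(&data_ready);
    pthread_mutex_lock(&lock);
    long total = shared_total;
    pthread_mutex_unlock(&lock);
    return total;
}
```

Using a semaphore where a mutex belongs, or the reverse, compiles fine. That is exactly the kind of mistake the warning above is about.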

Here is a nice article about multitasking alternatives: "SPLat Controls - Embedded microcontroller RTOS cooperative and preemptive multitasking in electronic controls".

.. ChibiOS: 390 ns context switch on the STM32F4 Discovery kit..

ChibiOS: 390 ns context switch on the STM32F4 Discovery kit..

Counting exactly what? fat16lib's quoted code shows 18 words of context saved and restored (including the stack pointer swap and the registers stacked by the NVIC). If the RAM is single-cycle 80 MHz memory, that should take at least 450 ns...

The STM32F4 used in this test has a 168 MHz clock, and:

RAM memory is accessed (read/write) at CPU clock speed with 0 wait states.

Push and pop run in 1 + N cycles where N is the number of registers.

The NVIC takes 12 cycles to respond or return from an interrupt.

The actual register swap takes a lot less than 390 ns.
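A rough cycle budget shows why. It uses only the figures quoted above, plus an assumed 2 cycles each for the str/ldr of the stack pointer. This C sketch ignores pipeline refills, flash wait states, and the scheduler's own decision time.

```c
#include <assert.h>

/* Cycle figures quoted above; SP_SWAP is an assumption. */
#define PUSH_POP_REGS  9        /* r4-r11 plus lr (push) or pc (pop) */
#define NVIC_ENTRY     12       /* cycles to respond to an interrupt */
#define NVIC_EXIT      12       /* cycles to return from an interrupt */
#define SP_SWAP        (2 + 2)  /* str sp + ldr sp, assumed 2 each */

/* Push and pop each take 1 + N cycles. */
int switch_cycles(void) {
    int push = 1 + PUSH_POP_REGS;
    int pop  = 1 + PUSH_POP_REGS;
    return push + pop + SP_SWAP + NVIC_ENTRY + NVIC_EXIT;
}

/* Convert a cycle count to nanoseconds at a clock given in MHz. */
double cycles_to_ns(int cycles, double mhz) {
    return cycles * 1000.0 / mhz;
}
```

That gives 48 cycles, or about 286 ns at 168 MHz. By the same arithmetic, 36 single-cycle word accesses (18 saved, 18 restored) take 450 ns at 80 MHz but only about 214 ns at 168 MHz.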

168 MHz

oops. I missed that part, thinking it was still in the 80MHz class. Never mind. :frowning:
(in that case, 390ns sounds about right...)
(damn; I went searching through the datasheet to see if RAM was wider than 32bits or other optimization, and I forgot to check the clock rate!)

:slight_smile: It depends on the compiler and its settings, however. I tried another compiler and got a slower 440 ns (11.4, 2.268Meg/sec) context switch, but a higher 11.11 (semaphores..) 3.01Meg/sec.