Slotted, non-blocking delays against millis()

Hi, i will try to make a "short" introduction first and I will try to explain what is all about (my native language is Romanian).

My journey in the microcontroller world started with PICAXE08M microcontroller, thanks to an australian friend and, obviously, I migrated later to true Microchip microcontrollers, where I learned to program them using the JAL language. We have there, inside jallib library collection, a library named timer0_isr_interval where you can define your own non-blocking delays.

Some how, is working like millis() function but in a different way. First, you must set the size of an array corresponding to the number of total non-blocking delays you need in your application. Then, in every cell of that array, you put a number representing the required delay in milliseconds. An interrupt will trigger at every millisecond and instead of a variable increment as in millis, it will decrement the values of every cell until those will equal zero. Periodically, in your program, you will check if your delay has become zero and that means that that delay has just expired and you do the task assigned to that delay. Then, you set the delay again at the initial value.

It is useful in simulating multitasking.

After a while, I decided that is the time to see what are those AVRs, what are they doing, and if their parents knows that! Obviously, as a good PIC programmer, I bricked my first ATmega32 and returned for awhile back to PICs. But I don't like to lose battles so I tried again with an ATmega644P (util I will make an AVR Doctor, my ATmega32 remains bricked). This time I was careful and I succeeded in transforming it into a Sanguino microcontroller. I started exploring Wiring language and I can recognize the educational value and the project completion speed it offers. But i don't like how it waste the microcontroller resources. Because of this, I moved to avr-gcc toolchain and a big help was the project of Mike, libarduino, hosted on googlecode.com.

I don't know yet too much about the interrupts and timers of AVRs but I succeeded in porting the slotted, non-blocking delays from JAL to avr-gcc looking at the millis() example. I decided to compare them and for that I took a project theme where you must simultaneously blink 3 LEDs at different speeds: 250, 500 and 750 milliseconds. So here are the two avr-gcc C examples:

Using millis():

#ifndef F_CPU
#define F_CPU 16000000U // required by Atmel Studio 6
#endif

#include <avr/io.h>
#include <avr/interrupt.h>
//#include <util/delay.h>

#define sbi(PORT, BIT) (_SFR_BYTE(PORT) |= _BV(BIT)) // set bit
#define tbi(PORT, BIT) (_SFR_BYTE(PORT) ^= _BV(BIT)) // toggle bit

#define FALSE 0
#define TRUE  1

#define clockCyclesPerMicrosecond() ( F_CPU / 1000000L )
#define clockCyclesToMicroseconds(a) ( (a) / clockCyclesPerMicrosecond() )
#define microsecondsToClockCycles(a) ( (a) * clockCyclesPerMicrosecond() )
// the prescaler is set so that timer0 ticks every 64 clock cycles, and the
// the overflow handler is called every 256 ticks.
#define MICROSECONDS_PER_TIMER0_OVERFLOW (clockCyclesToMicroseconds(64 * 256))

// the whole number of milliseconds per timer0 overflow
#define MILLIS_INC (MICROSECONDS_PER_TIMER0_OVERFLOW / 1000)

// the fractional number of milliseconds per timer0 overflow. we shift right
// by three to fit these numbers into a byte. (for the clock speeds we care
// about - 8 and 16 MHz - this doesn't lose precision.)
#define FRACT_INC ((MICROSECONDS_PER_TIMER0_OVERFLOW % 1000) >> 3)
#define FRACT_MAX (1000 >> 3)

volatile uint32_t timer0_overflow_count = 0;
volatile uint32_t timer0_millis = 0;
static uint8_t timer0_fract = 0;

//#if defined(__AVR_ATtiny24__) || defined(__AVR_ATtiny44__) || defined(__AVR_ATtiny84__)
//ISR(TIM0_OVF_vect)
//#else
ISR(TIMER0_OVF_vect)
//#endif
{
	// copy these to local variables so they can be stored in registers
	// (volatile variables must be read from memory on every access)
	uint32_t m = timer0_millis;
	uint8_t f = timer0_fract;

	m += MILLIS_INC;
	f += FRACT_INC;
	if (f >= FRACT_MAX) {
		f -= FRACT_MAX;
		m += 1;
	}

	timer0_fract = f;
	timer0_millis = m;
	timer0_overflow_count++;
}

void millis_init(void) {
#if defined(TCCR0A) && defined(WGM01)
	sbi(TCCR0A, WGM01);
	sbi(TCCR0A, WGM00);
#endif
	// set timer 0 prescale factor to 64
#if defined(__AVR_ATmega128__)
	// CPU specific: different values for the ATmega128
	sbi(TCCR0, CS02);
#elif defined(TCCR0) && defined(CS01) && defined(CS00)
	// this combination is for the standard atmega8
	sbi(TCCR0, CS01);
	sbi(TCCR0, CS00);
#elif defined(TCCR0B) && defined(CS01) && defined(CS00)
	// this combination is for the standard 168/328/1280/2560
	sbi(TCCR0B, CS01);
	sbi(TCCR0B, CS00);
#elif defined(TCCR0A) && defined(CS01) && defined(CS00)
	// this combination is for the __AVR_ATmega645__ series
	sbi(TCCR0A, CS01);
	sbi(TCCR0A, CS00);
#else
#error Timer 0 prescale factor 64 not set correctly
#endif
	// enable timer 0 overflow interrupt
#if defined(TIMSK) && defined(TOIE0)
	sbi(TIMSK, TOIE0);
#elif defined(TIMSK0) && defined(TOIE0)
	sbi(TIMSK0, TOIE0);
#else
#error	Timer 0 overflow interrupt not set correctly
#endif
}

uint32_t millis()
{
	uint32_t m;
	uint8_t oldSREG = SREG;

	// disable interrupts while we read timer0_millis or we might get an
	// inconsistent value (e.g. in the middle of a write to timer0_millis)
	cli();
	m = timer0_millis;
	SREG = oldSREG;
	return m;
}

uint32_t currentMillis, pLED1Millis = 0, pLED2Millis = 0, pLED3Millis = 0;

void main(void) __attribute__((noreturn));
void main(void) {
	sbi(DDRD, 5);
	sbi(DDRC, 6);
	sbi(DDRC, 7);
	millis_init();
	sei();
	while (1) {
		//
		currentMillis = millis();
		if (currentMillis - pLED1Millis > 250) {
			pLED1Millis = currentMillis;
			tbi(PORTD, 5);
		}
		if (currentMillis - pLED2Millis > 500) {
			pLED2Millis = currentMillis;
			tbi(PORTC, 6);
		}
		if (currentMillis - pLED3Millis > 750) {
			pLED3Millis = currentMillis;
			tbi(PORTC, 7);
		}
	}
}

Using slotted, non-blocking delays:

#ifndef F_CPU
#define F_CPU 16000000U // required by Atmel Studio 6
#endif

#include <avr/io.h>
#include <avr/interrupt.h>

#define sbi(PORT, BIT) (_SFR_BYTE(PORT) |= _BV(BIT)) // set bit
#define tbi(PORT, BIT) (_SFR_BYTE(PORT) ^= _BV(BIT)) // toggle bit

#define FALSE 0
#define TRUE  1

#define DELAY_SLOTS  3 // the number of non-blocking delays we need
int16_t isr_countdowns[DELAY_SLOTS]; // the array where the "delays" are stored

#define clockCyclesPerMicrosecond() ( F_CPU / 1000000L )
#define clockCyclesToMicroseconds(a) ( (a) / clockCyclesPerMicrosecond() )
#define microsecondsToClockCycles(a) ( (a) * clockCyclesPerMicrosecond() )
// the prescaler is set so that timer0 ticks every 64 clock cycles, and the
// the overflow handler is called every 256 ticks.
#define MICROSECONDS_PER_TIMER0_OVERFLOW (clockCyclesToMicroseconds(64 * 256))

// the whole number of milliseconds per timer0 overflow
//#define MILLIS_INC (MICROSECONDS_PER_TIMER0_OVERFLOW / 1000)

// the fractional number of milliseconds per timer0 overflow. we shift right
// by three to fit these numbers into a byte. (for the clock speeds we care
// about - 8 and 16 MHz - this doesn't lose precision.)
#define FRACT_INC ((MICROSECONDS_PER_TIMER0_OVERFLOW % 1000) >> 3)
#define FRACT_MAX (1000 >> 3)

volatile uint32_t timer0_overflow_count = 0;
//volatile uint32_t timer0_millis = 0;
static uint8_t timer0_fract = 0;

#if defined(__AVR_ATtiny24__) || defined(__AVR_ATtiny44__) || defined(__AVR_ATtiny84__)
ISR(TIM0_OVF_vect)
#else
ISR(TIMER0_OVF_vect)
#endif
{
	// copy these to local variables so they can be stored in registers
	// (volatile variables must be read from memory on every access)
	//uint32_t m = timer0_millis;
	uint8_t f = timer0_fract, i, j;
	i = 1;
	//m += MILLIS_INC;
	f += FRACT_INC;
	if (f >= FRACT_MAX) {
		f -= FRACT_MAX;
		//m += 1;
		i = 2;
	}

	timer0_fract = f;
	//timer0_millis = m;
	timer0_overflow_count++;
	for (j=0; j<DELAY_SLOTS;j++){
		if(isr_countdowns[j] > 0){
			//
			isr_countdowns[j] = isr_countdowns[j] - i;
		}
	}
}

void timer0_isr_init(void) {
#if defined(TCCR0A) && defined(WGM01)
	sbi(TCCR0A, WGM01);
	sbi(TCCR0A, WGM00);
#endif
	// set timer 0 prescale factor to 64
#if defined(__AVR_ATmega128__)
	// CPU specific: different values for the ATmega128
	sbi(TCCR0, CS02);
#elif defined(TCCR0) && defined(CS01) && defined(CS00)
	// this combination is for the standard atmega8
	sbi(TCCR0, CS01);
	sbi(TCCR0, CS00);
#elif defined(TCCR0B) && defined(CS01) && defined(CS00)
	// this combination is for the standard 168/328/1280/2560
	sbi(TCCR0B, CS01);
	sbi(TCCR0B, CS00);
#elif defined(TCCR0A) && defined(CS01) && defined(CS00)
	// this combination is for the __AVR_ATmega645__ series
	sbi(TCCR0A, CS01);
	sbi(TCCR0A, CS00);
#else
#error Timer 0 prescale factor 64 not set correctly
#endif
	// enable timer 0 overflow interrupt
#if defined(TIMSK) && defined(TOIE0)
	sbi(TIMSK, TOIE0);
#elif defined(TIMSK0) && defined(TOIE0)
	sbi(TIMSK0, TOIE0);
#else
#error	Timer 0 overflow interrupt not set correctly
#endif
}

// check if the time in a specific slot expired
uint8_t check_delay(uint8_t slot)
{
	if (slot >= DELAY_SLOTS) return TRUE; //protection against user input error
	uint8_t oldSREG = SREG;
	cli();
	if (isr_countdowns[slot]<=0){
		SREG = oldSREG;
		return TRUE;
	}
	SREG = oldSREG;
	return FALSE;
}

//set the duration of a delay (in milliseconds) in a specific slot
void set_delay(uint8_t slot, uint16_t ms_time){
	if(slot >= DELAY_SLOTS) return; //protection against user input error
	uint8_t oldSREG = SREG;

	cli();
	isr_countdowns[slot] = ms_time;
	SREG = oldSREG;
}

void main(void) __attribute__((noreturn));
void main(void) {
	// setting (LED) pins as outputs
	sbi(DDRD, 5);
	sbi(DDRC, 6);
	sbi(DDRC, 7);
	// init the non_blocking delays
	timer0_isr_init();
	sei();
	// set delays in milliseconds for every LED
	set_delay(0, 250);
	set_delay(1, 500);
	set_delay(2, 750);
	while(1){
		if(check_delay(0)){
			tbi(PORTD, 5);
			set_delay(0, 250);
		}
		if(check_delay(1)){
			tbi(PORTC, 6);
			set_delay(1, 500);
		}
		if(check_delay(2)){
			tbi(PORTC, 7);
			set_delay(2, 750);
		}
	}
}

Both programs do the same. The code-size results, using the latest ATMEL toolchain, are as follows (I think that the majority of JAL users are maniacs regarding their code size, me included):

Using millis():

Device: atmega644p

Program:     606 bytes (0.9% Full)
(.text + .data + .bootloader)

Data:         25 bytes (0.6% Full)
(.data + .bss + .noinit)

Using slotted, non-blocking delays:

Device: atmega644p

Program:     516 bytes (0.8% Full)
(.text + .data + .bootloader)

Data:         11 bytes (0.3% Full)
(.data + .bss + .noinit)

But i don't like how it waste the microcontroller resources.

There's one main reason for this when it comes to the Arduino:

Portability.

Most of the waste of microcontroller resources is there because the Arduino system was designed so that you could (in general) write a single program, and expect it to work the same across a variety of different microcontrollers in the same "family" - without needing to re-write the code. At least, that's the idea; it probably has problems in practice depending on what pins you are using, which microcontroller you are using, and what the code was originally written for. I would imagine though, that for the most part, it works better than the alternative (which would necessitate a re-write and a higher knowledge of the hardware than what the userbase the Arduino is aimed at would have).

Fortunately, you found a way around this (so many pooh-pooh the Arduino as being a toy, when in fact it isn't and can be used with a regular AVR development toolchain if you are willing to put in the work of setting it up). Just don't expect your code to be very portable (the code you presented may or may not be - I don't know this for certain - but in general, when not using the Arduino libraries, you have to be more careful and mindful to make your code portable).

Well, there are solutions regarding to portability. I mean, you can use a C library designed to support many more microcontrollers, as is ASF from Atmel. I forked libarduino project of Mike and I can use same code (not the one I presented, which is made for Sanguino, without portability in mind) on many ATmega devices and some Arduino boards: http://atmega-clib.googlecode.com. Is possible and Arduino team should consider preparing the people to advance on the next step (the users are evolving anyway). libarduino is a good example. Unfortunately, no updates from a long time.

But again, nothing against Wiring language and what Arduino means.

P.S. No problem for me to make the presented code portable... just inserting some #if 's . millis() is already written by the Arduino team with portability in mind, and slotted, non-blocking delays is based on millis() code - already portable. The non-portable part of the code was the "pinout" - there comes the conditional compiling...

But i don't like how it waste the microcontroller resources.

The compiler itself usually does an excellent job of generating tight code, and the linker throws away almost everything that isn't actually used.

However it is true that the default libraries, being nice and general, generate more code than they might. However you don't have to use them. You can still make a small program inside the Arduino IDE. For example:

int main ()  { }

Program size (compiled for Uno):

Binary sketch size: 176 bytes (of a 32,256 byte maximum)

In any case it's only a waste if you need the program space for something else.

138 bytes if the IDE is using avr-gcc 4.7.3 as is on Linux Fedora19 (I told you, I am maniac) :smiley:

In any case it's only a waste if you need the program space for something else.

True.

Looks like most of the extra bytes are to do with initializing instances of classes. You would probably find that with a larger project the differences would be minimal.

Excepting classes, printf function is another culprit in increasing the code size - alot. Some time ago I did another test between Arduino, Wiring and ATmegaCLib C library, and I avoided to use printf function, compiling an application with PCF8583.

In the following image, we have Arduino on the left, Wiring on the right, and Eclipse in background. For both Arduino and Eclipse I used gcc 4.7 and Wiring came with 4.3.5 version. You can see the resulted code size on each window.

This kind of analysis helps you chose the right tools, libraries, methods when you shrinkify your Arduino projects or go commercial.

comparison-wiring-language-and-c by funlw65, on Flickr

https://code.google.com/p/atmega-clib/source/browse/trunk/ATmega_PCF8583_Serial/main.c

First thing:

From what I can see of the code the Serial handling is wrong. You check for Serial.available being greater than 0 but then do multiple reads. Only one read is safe.


Second thing:

Your code snippets appear to be doing the same idea in different ways. If the Arduino version did not use sprintf then it would be smaller too. Example:

volatile int day = 1, month = 2, year = 1998;

void setup ()
  {
  Serial.begin (115200);
  char time [20];
  sprintf (time, "%d/%d/%d", day, month, year);
  Serial.println (time);
  }  // end of setup

void loop () { }

Code size (compiled for Sanguino):

Binary sketch size: 3,910 bytes (of a 63,488 byte maximum)

Doing away with sprintf (and looking neater as well):

#include <Streaming.h>

volatile int day = 1, month = 2, year = 1998;

void setup ()
  {
  Serial.begin (115200);
  Serial << day << "/" << month << "/" << year << endl;
  }  // end of setup

void loop () { }

Code size:

Binary sketch size: 2,774 bytes (of a 63,488 byte maximum)

I saved over 1100 bytes by not using sprintf. But you can do that in the Arduino IDE.

I know, the read_write_time sketch example it comes with the Arduino Dallas library if I correctly recall... it suppose the configuration string was correctly introduced. I made my example after that, trying to reproduce it.

BTW, interesting forum software!

funlw65:
BTW, interesting forum software!

Mine? Thanks. :slight_smile:

I don't see a real advantage of the slotted delays over what is done with Arduino now.

funlw65:
Then, in every cell of that array, you put a number representing the required delay in milliseconds. An interrupt will trigger at every millisecond and instead of a variable increment as in millis, it will decrement the values of every cell until those will equal zero. Periodically, in your program, you will check if your delay has become zero and that means that that delay has just expired and you do the task assigned to that delay. Then, you set the delay again at the initial value.

It is useful in simulating multitasking.

I think GoForSmoke is right. This doesn't achieve a huge amount. For one thing, if the interrupt that fires every millisecond tries to do anything useful (like "do the task") then the task is "done" in the context of an ISR which means there are lots of things it can't do, like serial printing. Plus, interrupts are supposed to be short.

If the interrupt just sets a flag to show that a task needs to be done, you may as well just test for elapsed time in loop, what has the flag achieved?

I did a small class a while back that encapsulates that idea. It just stores a start time in a class instance and you use a function call to find how much time has elapsed between then, and now, and if enough time has elapsed you can do something. You test for this in "loop".

Example:

#include <Elapsed.h>

void setup() 
  {                
  pinMode (2, OUTPUT);     
  pinMode (3, OUTPUT);     
  }  // end of setup

static Elapsed t2, t3;

void loop() {

  if (t2.intervalMs () > 200)
    {
    digitalWrite(2, !digitalRead (2)); 
    t2.reset ();
    }

 if (t3.intervalMs () > 500)
    {
    digitalWrite(3, !digitalRead (3)); 
    t3.reset ();
    }
 
}  // end of loop

I have gone as far as checking to see if millis() has changed before doing the whole elapsed time check because of how many times execution went through loop() in a msec, which was many (10 to 20) in one sketch.
It didn't make a lot of difference per loop but hey, it let my sketch waste time even faster!

GoForSmoke:
I don't see a real advantage of the slotted delays over what is done with Arduino now.

I don't see slotted delays replacing millis() in Wiring language. I don't even know how to implement it in Wiring as a library, as it needs the array size specified by the user in a header of a library. But in a C language has his advantages over millis(). For every millis() timer (delay), you need a 32bit variable... if, lets say, we use 10 non-blocking delays (well, I don't see how that is needed in a real application but lets pretend that we need them), millis() is using even more Flash and RAM than slotted delays. Even more, at every check, you need to do a subtraction operation. In Wiring language it does not matter too much - at least, not for me, as I use it only for rapid prototyping. But in the industrial and commercial world it matters (but only if you need to move forward).

I already saw how slotted timers can work on Arduino and I don't need any library! But... I think the cleanest way would be to modify the Arduino millis() code which gives an opportunity to clean that up a bit --- there was a thread about that earlier this year.

For short intervals I can use unsigned int or byte (unsigned 8 bit) variables instead of unsigned long with either method. But if you want to time longer than 65.535 seconds then you will need 32 bit unsigned or a 16-bit and an 8-bit and more convoluted code.

And there is a small savings in cycles using the slotted method that has the down side of requiring an interrupt that may run longer than I would like and can be delayed by serial print. Consider that in addition to decrementing slot counters there needs to be a zero check first as you don't want a counter to roll back, do you?

When I wrote that I don't think there will be much savings, I did a bit of thinking first.

GoForSmoke:
...
And there is a small savings in cycles using the slotted method that has the down side of requiring an interrupt that may run longer than I would like and can be delayed by serial print.

Yeah, it may be practical for a small number of delays...

GoForSmoke:
Consider that in addition to decrementing slot counters there needs to be a zero check first as you don't want a counter to roll back, do you?

Not needed, look at the type of array and at the logic.

funlw65:

GoForSmoke:
...
And there is a small savings in cycles using the slotted method that has the down side of requiring an interrupt that may run longer than I would like and can be delayed by serial print.

Yeah, it may be practical for a small number of delays...

I have nothing against the slotted method. I just don't see any significant advantage over using count-up (what's behind the BWD method) by using count-down (what's behind the slotted method) when I take -everything- required into account.

GoForSmoke:
Consider that in addition to decrementing slot counters there needs to be a zero check first as you don't want a counter to roll back, do you?

Not needed, look at the type of array and at the logic.

Then you trade off speed for measurable duration, in this case you halve the maximum interval to 32.767 seconds. For a whole lot of things that is acceptable and you can go to 32 bit signed values and get a bit less than 24 days if need be.

Both ways have applications where they fit better so IMO it's good to know both.

The slotted method is an array-adapted version of single-timer ISR timer solutions I have seen here already so if I don't act like it's big news then please forgive me.

GoForSmoke:
...
Both ways have applications where they fit better so IMO it's good to know both.

The slotted method is an array-adapted version of single-timer ISR timer solutions I have seen here already so if I don't act like it's big news then please forgive me.

Perfectly agree with you. I didn't presented like a big news, just the joy of a personal accomplishment and, somehow is a "gypsy wagon" thing: you carry with you your own tools and knowledge to the new territories. Thanks for the discussion, code snippet and technical details!