Exploring millis()

I am looking for a very fast unsigned integer multiplication by 1.024
It seems 128/125 would be the solution here.
So

 unsigned long a;  //or, better, unsigned long long
 a = (a << 7);
 a = a / 5;  // this could be a fast /5 routine
 a = a / 5;
 a = a / 5;  // this is a = 1.024 * a

Any better idea here?

128/125 is good, but...

1024/1000 is better.

Bit shift by 10 (gets optimised to a shift by 2 followed by adding a register below with value 0). Call DivMod10 3 times.

Granted that would only work on max 22bit unsigned numbers.

as its integer, and its a power of 2 multiply, just 'move the point' !

or have I miss understood

drjiohnsmith: just 'move the point' !

And how do you do that? You divide by 10.

oops : Before coffee

I read it as multiply by 1024, i.e shift 10,

[quote author=Tom Carpenter link=topic=167414.msg1357724#msg1357724 date=1376855013] 128/125 is good, but...

1024/1000 is better. [/quote] Would be exactly the same as both enumerator and denominator differ a factor 8 exactly. That makes the first (128/125) preferred as it can be used for a larger range of integers.

To make it more exact I would add rounding to the formula: x = (x * 128 + 63)/125;

I’d suggest looking at 256/250:

256/250 == 256/(256-6) == 1/(1-6/256) == 1 + (6/256) + (6/256)2 + (6/256)3 + (6/256)4 + …

This can be factored as (1+6/256)(1+36/65536)(1+1296/232)*…

Which, ignoring the least significant bits for the moment, suggests code of the form:

                b+=(b*6)>>8;
                b+=(b*36)>>16;
                b+=(b*1296)>>32;

Practically for 32 bits this is:

                b+=((b>>8)*6)+(((b&255)*6)>>8);
                b+=((b>>16)*36)+(((b&65535)*36)>>16);
                b+=((b>>8)*81)>>20;                               // 81 == 1296>>4

This should be within 2 of the correct value. Getting the precise value is left as an exercise :grin:

well done! stimmer +1

Cool.. There is a calculus for converting the timer0_overflow_ticks into millis() within the timer0 OVF ISR. I've been thinking to move that calculation into millis() function, but still as I see it is a lot of calcs..

I'm trying to optimise the assembler, but I don't like all those multiplications (I want to avoid the mul instruction as its not possible on attiny's).

pito: Cool.. There is a calculus for converting the timer0_overflow_ticks into millis() within the timer0 OVF ISR. I've been thinking to move that calculation into millis() function, but still as I see it is a lot of calcs..

Which code/function/file do you refer to exactly? wiring.c?

stimmer: Practically for 32 bits this is:

                b+=((b>>8)*6)+(((b&255)*6)>>8);
                b+=((b>>16)*36)+(((b&65535)*36)>>16);
                b+=((b>>8)*81)>>20;                               // 81 == 1296>>4

Yeah, not quite the right value (out by +/- 1) as you expected.

At any rate, I have created a function which does this calculation in 141 clock cycles, which is quite a lot - mostly because it doesn't use the mul instruction. Though I haven't tested it for numbers larger than 20 bits.

@rob: yes wiring.c, there is a calc for millis()

..
m += MILLIS_INC;
    f += FRACT_INC;
    if (f >= FRACT_MAX) {
        f -= FRACT_MAX;
        m += 1;
    }
..

This does the conversion from ticks to millis, millis = 1.024*ticks.

It sounds like you want to reduce the code in the timer0 interrupt to increment only a single counter. That’s a noble goal.

One other issue to consider is Arduino programs expect millis() to have proper 32 bit rollover. If you leave out the fractional stuff and extra millis increment, a 32 bit count will roll over too soon. You’ll probably need to expand it to 40 bits, and add code inside the interrupt to check for that magical point where it needs to roll back to zero so the computed 32 bit value present to the user has proper rollover.

The math in the interrupt is only addition and subtraction, that are operators that are not very time consuming.

A first order test shows the interrupt takes about 3.3 micros (defines are calculated compile time)

// renamed the wiring.c #defines to prevent clashes

#define X (clockCyclesToMicroseconds(64 * 256))
#define XX (X / 1000)
#define YY ((X % 1000) >> 3)
#define ZZ (1000 >> 3)

volatile unsigned long a = 0;
volatile unsigned long b = 0;
static unsigned char c = 0;

void f()
{
  // copy these to local variables so they can be stored in registers
  // (volatile variables must be read from memory on every access)
  unsigned long m = b;
  unsigned char f = c;

  m += XX;
  f += YY;
  if (f >= ZZ) {
    f -= ZZ;
    m += 1;
  }

  c = f;
  b = m;
  a++;
}

void setup()
{
  Serial.begin(9600);

  uint32_t start = millis();
  for(int i=0; i<10000; i++)
  {
    f();
  }
  Serial.println((millis() - start)/10.0, 4);
}

void loop()
{
}

The current implementation keeps track of the error every millisecond, it restricts the size of the error to fit into a byte (char).

If you want to move this math to millis() you just do not know how often millis() is called. If between the calls an overflow would occur (OK small chance) , it would become impossible to reconstruct right value of millis().

OK that said, assuming millis() is called often enough one could keep adding the fractional error in the interrupt, and do additional math in millis()

volatile unsigned long timer0_overflow_count = 0;
volatile unsigned long timer0_millis = 0;
static unsigned long timer0_fract = 0;   //<<<<<<<<<<<<<<<<<<<<<< notice type changed as we keep on adding fractional parts until millis() is called

SIGNAL(TIMER0_OVF_vect)
{
	timer0_millis += MILLIS_INC;    // raw millis();
	timer0_fract += FRACT_INC;   // track the error
	timer0_overflow_count++;      // for micros()
}

unsigned long millis()
{
	unsigned long m;
	uint8_t oldSREG = SREG;

	// disable interrupts while we read timer0_millis or we might get an
	// inconsistent value (e.g. in the middle of a write to timer0_millis)
	cli();
	m = timer0_millis;
        unsigned long mf = timer0_fract / FRACT_MAX;
        timer0_fract -= mf * FRACT_MAX;
        m += mf;
        timer0_millis = m;
	SREG = oldSREG;
	return m;
}

A quick test with the shorter ISR results in 2.9 usec so 0.4 uSec less. (13% speedup)
Drawback is that the call to millis contains a huge lot extra math including a multiplication and a (expensive) division by 125.

Original call to millis() → ~1.7 uSec
“Improved” call to millis() → ~47 uSec (25-30 times slower!) and note that it does CLI() for that whole time. so damage to low level timing might be is substantial…

Another option is to calculate the missed fractional part when millis() is called. For this one should remember the previous call to millis().

volatile unsigned long timer0_overflow_count = 0;
volatile unsigned long timer0_millis = 0;
static unsigned long prev_timer0 = 0;

SIGNAL(TIMER0_OVF_vect)
{
	timer0_millis += MILLIS_INC;
	timer0_overflow_count++;
}

unsigned long millis()
{
	unsigned long m;
	uint8_t oldSREG = SREG;

	// disable interrupts while we read timer0_millis or we might get an
	// inconsistent value (e.g. in the middle of a write to timer0_millis)
	cli();
        unsigned long p = prev_timer0;
	m = timer0_millis; 
        p = (m-p) * FRACT_INC;  // fractional ticks missed... 
        m += p / FRACT_MAX;  // magic number == 125/3 yes there is a rounding error
        prev_timer0 = m;
        SREG = oldSREG;
 	return m;
}

“Improved” call to millis() → ~41 uSec better but still worse as it still is in CLI() mode

Next try is to base millis() on timer0_overflow_count only, should at least speed up the ISR() part

volatile unsigned long timer0_overflow_count = 0;
// volatile unsigned long timer0_millis = 0;
static unsigned long prev_timer0 = 0;

SIGNAL(TIMER0_OVF_vect)
{
	// timer0_millis += MILLIS_INC;
	timer0_overflow_count++;  // for micros()
}

unsigned long millis()
{
	unsigned long m;
	uint8_t oldSREG = SREG;

	// disable interrupts while we read timer0_millis or we might get an
	// inconsistent value (e.g. in the middle of a write to timer0_millis)
	cli();
        unsigned long p = prev_timer0;
	m = timer0_overflow_count * MILLIS_INC; 
        p = (m-p) * FRACT_INC;  // fractional ticks missed... 
        m += p / FRACT_MAX;  // magic number == 125/3 yes there is a rounding error
        prev_timer0 = m;
        SREG = oldSREG;
 	return m;
}

The ISR is down to 1.7 uSec (looks promising)
Millis() it self still around 41 uSec and still much of that in CLI() mode

Final try is to do the math without remembering the previous time,
and then the code suddenly becomes simpler! (however overflow gets corrupted I think - for short running sketches acceptable?)

volatile unsigned long timer0_overflow_count = 0;
// volatile unsigned long timer0_millis = 0;
// static unsigned long prev_timer0 = 0;

SIGNAL(TIMER0_OVF_vect)
{
	timer0_overflow_count++;  // for micros()
}

unsigned long millis()
{
        // update to prevent reading incorrect value
      	uint8_t oldSREG = SREG;
        cli();
	unsigned long m = timer0_overflow_count * MILLIS_INC;  
        SREG = OLDREG
        unsigned long tm = (m * FRACT_INC)/ FRACT_MAX;  // ticks missed since beginning of time 
 	return m + tm;
}

The ISR is still 1.7 uSec (good)

Millis() is still about 40 uSec BUT it does not spend much time in CLI() anymore !

So if millis() is called less than once every ~25 milliseconds we now gained clockcycli.

Note: we can simplify tm = (m * FRACT_INC)/ FRACT_MAX; to tm = m / FRACT_XXX; which is marginally faster

However as it is a fixed division we might replace it with a number of shifts and bitmath . BTW FRACT_XXX = 42 (of course it is ;).

volatile unsigned long timer0_overflow_count = 0;

SIGNAL(TIMER0_OVF_vect)
{
	timer0_overflow_count++;
}

unsigned long millis()
{
        // update to prevent reading incorrect value
      	uint8_t oldSREG = SREG;
        cli();
	unsigned long m = timer0_overflow_count * MILLIS_INC;  
        SREG = OLDREG
        //unsigned long tm = m /42;  // ticks missed... 
        unsigned long tm = (m + m>>1);
        tm = tm >> 6 + tm >> 12;
        tm += tm >> 12;
 	return m + tm;
}

The ISR is still 1.7 uSec
Millis() is now about 22 uSec (still more than 10x original) so if it is called less then every ~15 millis() we gain clock cycli.
Furthermore it reduced the time in CLI()

Disclaimer, not verified the above experimental code in every aspect, so do not use this unless you know what you are dealing with (I’m not :wink:

@PITO
Can you check the above if it makes sense for you?

updated the last code samples as they had serious bugs …

@rob: Great analysis!! My primary idea has been to have uint64_t timer0_overflow_count, and any calcs done in millis(). The cli() only for reading the timer0_overflow_count variable (as that is the capture point). And, may be a faster T0 interrupt freq (ie 256us)..

volatile unsigned long long timer0_overflow_count = 0;

SIGNAL(TIMER0_OVF_vect)
{
    timer0_overflow_count++;  
}

unsigned long millis()
{
        // update to prevent reading incorrect value
         uint8_t oldSREG = SREG;
        cli();
    unsigned long long m = timer0_overflow_count;  
        SREG = OLDREG

        Any calculus here...
}

But that might be too much processing for an 8bitter :)

thx,

pito: ... to have uint64_t timer0_overflow_count,...

64 bit ints are rather slow on arduino

small patch - remove the multiply from the CLI() mode to outside the CLI() mode in millis()

volatile unsigned long timer0_overflow_count = 0;

SIGNAL(TIMER0_OVF_vect)
{
    timer0_overflow_count++;
}

unsigned long millis()
{
         uint8_t oldSREG = SREG;
        cli();
    unsigned long m = timer0_overflow_count;  
        SREG = OLDREG

        m *= MILLIS_INC;
        unsigned long tm = (m + m>>1);
        tm = tm >> 6 + tm >> 12;
        tm += tm >> 12;
    return m + tm;
}

careful we dont dilute this thread from its title.

however the millis and ISR talk caught my eye. the slower millis call, for most programs that use the call, surely is a bad thing ?

one example, i’ve got a mega running ethernet shield listening and sending data on sensor changes ( 12 of them ) also for human interface its using keypad entry and outputs to LCD. some fairly simple calcs every loops,and some FIFO buffering, this is running at over 6000 loops per second,and I call millis() every loop 5 times ( to get 5 parallel tasks ) that’s a low loop count and call for most programs I would say.

another program not as many millis() calls ( two per loop ) runs at over 60,000 loops per second. doing some running averaging. I think slowing millis() is a bad idear.

does timer 0 allow triggering on reaching a value ? ie. 250. then both ISR and millis() could be simpler ?

does timer 0 allow triggering on reaching a value ? ie. 250. then both ISR and millis() could be simpler ?

Yes, that is possible, nilrtos does it now (with fat16lib's patch, even in parallel with millis (isr overflow) which still work, nilTick isr is TIMER0_COMPA with OCR0A += 250; ). So I have got a clean 1.000ms nilTick for rtos and old millis() for arduino compatibility. You will loose a pwm channel, however. I did a 24h long measurement of millis() vs. niltick - no drift visible. There could be a small jitter in millis when both overflow and compa isrs fire at same time, but no tick lost in 24hours (against millis).

http://forum.arduino.cc/index.php?topic=178532.msg1356437#msg1356437