faster dec2bcd routine, especial for RTC libraries

Been playing with dec2bcd () code lately and came to an performance optimized version, esp. for use with a RTC lib.

The traditional code for dec2bcd()

uint8_t dec2bcd(uint8_t n)
{
  return (n / 10 * 16 +  n % 10);
}

The optimized version

uint8_t dec2bcd(uint8_t n)
{
  byte b = (n * 103) >> 10;
  return (b * 16 + n-(b*10));  
}

Drawback is it eats 18 bytes (if counted right), so it is a size versus performance discussion.

Some testcode to show it works

//
//    FILE: dec2bcd.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.00
// PURPOSE: test faster dec2bcd()
//     URL:
//
// Released to the public domain
//

#include <stdlib.h>

volatile int dec;
volatile int bcd = 0x72;

void setup() 
{
  Serial.begin(115200);

  Serial.print("bcd2dec:\t");
  uint32_t start = micros();
  for(int i=0; i< 1000; i++)
  {
    dec = bcd2dec(bcd);
  }
  Serial.println(micros() - start);

  Serial.print("dec2bcd:\t");
  start = micros();
  for(int i=0; i< 1000; i++)
  {
    bcd = dec2bcd(dec);
  }
  Serial.println(micros() - start);

  Serial.print("bcd2dec2:\t");
  start = micros();
  for(int i=0; i< 1000; i++)
  {
    dec = bcd2dec2(bcd);
  }
  Serial.println(micros() - start);

  Serial.print("dec2bcd2:\t");
  start = micros();
  for(int i=0; i< 1000; i++)
  {
    bcd = dec2bcd2(dec);
  }
  Serial.println(micros() - start);

  for(int i=0; i<100; i++)
  {
    bcd = dec2bcd2(i);
    Serial.print(i, DEC);
    Serial.print("\t");
    Serial.print(bcd, HEX);
    Serial.print("\t");
    Serial.println(bcd, DEC);
  }

}

void loop() 
{
}

uint8_t bcd2dec(uint8_t n)
{
  return (n / 16 * 10 + n % 16);
}

uint8_t dec2bcd(uint8_t n)
{
  return (n / 10 * 16 + n % 10);
}


uint8_t bcd2dec2(uint8_t n)
{
  return ((n >> 4) * 10 + (n & 0x0F));
}

// some test version in comments
uint8_t dec2bcd2(uint8_t n)
{
  // 13340 (reference
  // return (n / 10 * 16 + n % 10);

  // 11320 (use library function;  #include <stdlib.h>
  //  div_t t = div(n,10);
  //  return (t.quot * 16 + t.rem);

  // 6612  (remove modulo)
  //  byte b = n/10;
  //  return (b * 16 + n-(b*10));

  // 1936 (remove divide
  byte b = (n * 103) >> 10;
  return (b * 16 + n-(b*10));  
}

Output of the test sketch

bcd2dec:	1604
dec2bcd:	13340
bcd2dec2:	3904
dec2bcd2:	1964

so a factor 6.7 in speed gained.

As the dec2bcd() is used when writing to the RTC the gained time is only substantial when one writes alarm times often.
If one need to write the time to the RTC often it is time to replace the backup battery (or the RTC altogether)

Note this dec2bcd() works only on single bytes!.

As always remarks are welcome,

For those interested, the magic N*103) >> 10; came from - Optimizing integer divisions with Multiply Shift in C# - CodeProject -

First seen by me in the work of Jeffrey Sax

The JeeLabs RTClib has a fast simple BCD to binary. It seems to be slightly faster than your bcd2dec2.

uint8_t bcd2bin (uint8_t val) { return val - 6 * (val >> 4); }

The JeeLabs binary to BCD is only about twice as fast as the "traditional code". It is nicely symmetrical with bcd2bin() above.

uint8_t bin2bcd (uint8_t val) { return val + 6 * (val / 10); }

Edit: here is a version of bin2bcd that is about the same speed as dec2bcd2().

uint8_t bin2bcd(uint8_t n) {
  // divide by 10
  uint8_t q = (0XCD*n) >> 11; 
  // n + 6*(n/10)
  return n + 6 * q;
}

Here is a fast version of bcd2bin().

uint8_t bcd2bin(uint8_t n) {
  uint8_t h = n >> 4;
  // n - 6 * (n >> 4)
  return n - (((h << 1) + h) << 1);
}

looks good!
tomorrow I will test them

I have an RTC library that has

uint8_t DS3232RTC::dec2bcd(uint8_t n)
{ return (n / 10 * 16) + (n % 10); }

uint8_t DS3232RTC::bcd2dec(uint8_t n)
{ return (n / 16 * 10) + (n % 16); }

Using the JeeLabs algorithms, a test sketch increased by about 150 bytes, due to the compiler inlining bcd2dec. Adding the noinline attribute as below resulted in only a 4 byte increase.

uint8_t DS3232RTC::dec2bcd(uint8_t n)
{ return n + 6 * (n / 10); }

uint8_t __attribute__ ((noinline)) DS3232RTC::bcd2dec(uint8_t n)
{ return n - 6 * (n >> 4); }

some performance numbers

uint8_t bcd2dec(uint8_t n)
{
  return ((n / 16 * 10) + (n % 16));
}

uint8_t bcd2dec2(uint8_t n)
{
  return n - 6 * (n/16); 
}

uint8_t dec2bcd(uint8_t n)
{
  return ((n / 10 * 16) + (n % 10));
}

uint8_t dec2bcd2(uint8_t n)
{
  uint16_t a = n;
  byte b = (a*103) >> 10;
  return  n + b*6;
}

uint8_t dec2bcd3(uint8_t n)
{
  return n + 6*(n/10);
}

timing results (UNO)

bcd2dec: 1672
bcd2dec2: 1580

dec2bcd: 13576
dec2bcd2: 1896
dec2bcd3: 6612

robtillaart:
some performance numbers

Thanks, Rob!

timing results (UNO)

bcd2dec: 1672
bcd2dec2: 1580

dec2bcd: 13576
dec2bcd2: 1896 ... the weird 103 calculation still wins handily, 86% reduction.
dec2bcd3: 6612 ... but this is no slouch, still a significant improvement, 51% reduction.

... the weird 103 calculation still wins handily

(n *103) >> 10 <==> n * 103 / 1024
103 / 1024 <==> 1/10 for [0..99]

Edit: you just beat me with the explanation of the 103.

The 103 is just part of a fast divide by 10 that works for 8-bit values less than 179.

  uint8_t q;
  // divide by 10 for n < 179
  q  = (103*n)/1024;
  // Same as
  q = (103*n) >> 10;

A version of bin2bcd with a fast 8-bit divide by 10 gives about the same result as dec2bcd2.

uint8_t bin2bcd(uint8_t n) {
  // divide by 10
  uint8_t q = (103*n) >> 10;
  return n + 6 * q;
}

bin2bcd: 1948
dec2bcd2: 1964

Replacing the multiply by 6 with shifts gives the same result so the compiler must optimize.

  return n + (((q << 1) + q) << 1);

The following divide by 10 works for all 8-bit values but requires an 11 bit shift so it takes an extra cycle.

  q = (205*n) >> 11;

The following divide by 10 works for all 8-bit values but requires an 11 bit shift so it takes an extra cycle.

 q = (205*n) >> 11;

Very true, however the BCD values will never be higher than 99(HEX) representing 99(DEC).
Given the fact that for a RTC the highest number need to be converted is 59 (OK we need to reimplement after 2059 - 45 years from now)
seconds, minutes = ..59; hours = ..23; days= ..31; months ..12 ; weekdays = ..7; weeks=0..53; years=13..

So we can have an even simpler mul/shift, which has an even better time

uint8_t dec2bcd4(uint8_t n)
{
  uint16_t a = n;
  byte b = (a*26) >> 8;  // b = (a*13) >> 7  was OK but slower!
  return  n + b*6;
}

bcd2dec: 1668
bcd2dec2: 1612

dec2bcd: 13552
dec2bcd2: 1900
dec2bcd3: 6612
dec2bcd4: 1648

So the timing is now (almost) on par with the fastest bcd2dec2.

Bonus: it does convert correct from 0..68 (so we only need to adapt the code in dec 2068 :wink:

The aspect of RTC libraries I have tried to optimize is conversion of RTC chip register values to Unix epoch time. This value is often used in data logging since the Unix definition is in Coordinated Universal Time (UTC/GMT).

I often use the year 2000 instead of 1970. Epoch time is the number of seconds since midnight 1 Jan of the epoch year in UTC.

My code takes about 31.5 microseconds per conversion. Here is a test sketch. It does not read the RTC.

// Time starts 2000-1-1  For Linux 1970-1-1 is used.
//const uint16_t EPOCH_YEAR = 1970;  // Linux
const uint16_t EPOCH_YEAR = 2000;
//------------------------------------------------------------------------------
/** Count of days since Epoch.
 * 1900 < EPOCH_YEAR, MAX_YEAR < 2100, (MAX_YEAR - EPOCH_YEAR) < 178.
 * \param[in] y year (EPOCH_YEAR <= y <= MAX_YEAR)
 * \param[in] m month 1 <= m <= 12
 * \param[in] d day 1 <= d <= 31
 * \return Count of days since epoch
 *
 * Derived from Zeller's congruence
 */
uint16_t daysSinceEpoch(uint16_t y, uint8_t m, uint8_t d) {
  if (m < 3) {
    m += 12;
    y--;
  }
  return 365 * (y + 1 - EPOCH_YEAR)  + y / 4 - (EPOCH_YEAR - 1) / 4
    + (153 * m - 2) / 5 + d - 398;
}
//------------------------------------------------------------------------------
/** Seconds since 1 Jan EPOCH_YEAR */
uint32_t secondsSinceEpoch(uint16_t year, uint8_t month, uint8_t day,
                           uint8_t hour, uint8_t minute, uint8_t second) {
  uint16_t days = daysSinceEpoch(year, month, day);
  return second + 60L * (minute + 60L * (hour + 24L * days));
}
//------------------------------------------------------------------------------
void setup() {
  uint32_t t[32];
  Serial.begin(9600);
  uint32_t m = micros();
  for (uint8_t d = 1; d < 32; d++) {
    t[d] = secondsSinceEpoch(2013, 8, d, 13, 50, 15);
  }
  m = micros() - m;
  Serial.print("micros: ");
  Serial.println(m);
  for (uint8_t d  = 1; d < 32; d++) {
    Serial.println(t[d]);
  }
}
//------------------------------------------------------------------------------
void loop() {}

Here is the first bit of output.

micros: 976
428680215
428766615

FYI: The Time.h lib does it in 312usecs:

  usecs = micros();
  setTime(10, 10, 33, 30, 8, 2013);
  usecs = micros() - usecs;
  time_t t = now();
  Serial.println(usecs);
  Serial.println(t);
312
1377857433

And Time.h lib with your calculus above for seconds since epoch takes 40usecs:

40
1377857433

..

void  setTime(int hr,int min,int sec,int dy, int mnth, int yr){
 // year can be given as full four digit year or two digits (2010 or 10 for 2010);  
 // it is converted to years since 1970 
  if( yr > 99)
      yr = yr - 1970; 
  else
      yr += 30;
  tm.Year = yr;
  tm.Month = mnth;
  tm.Day = dy;
  tm.Hour = hr;
  tm.Minute = min;
  tm.Second = sec;
  //setTime(makeTime(tm));
  setTime(secondsSinceEpoch(tm.Year + 1970, tm.Month, tm.Day, tm.Hour, tm.Minute, tm.Second));
}

PS: the conversion from RTC format to unix time must not be too frequent, as normally you run an internal systime (in unix format), and you do a sync with an external RTC from time to time only (Time.h does it in that way). An another conversion would be time consuming - when timestamping data with something like YYYY/MM/DD HH:MM:SS (a coversion from ux time to the human readable format).

@fat16lib
I gave the conversion of RTC chip register values to Unix epoch time a try:

  • replaced division by 5 by a faster multiply shift
  • replaced a 32 bit multiply with a 16 bit one
uint16_t daysSinceEpoch1(uint16_t y, uint8_t m, uint8_t d) {
  if (m < 3) {
    m += 12;
    y--;
  }
  uint16_t s = 153 * m - 2;
  uint16_t t = (s * 13112UL ) >> 16;  // t = s/5;
  return 365 * (y + 1 - EPOCH_YEAR)  + y / 4 - (EPOCH_YEAR - 1) / 4
    + t + d - 398;
}

//------------------------------------------------------------------------------
/** Seconds since 1 Jan EPOCH_YEAR */
uint32_t secondsSinceEpoch1(uint16_t year, uint8_t month, uint8_t day,
uint8_t hour, uint8_t minute, uint8_t second) {
  uint16_t days = daysSinceEpoch1(year, month, day);
  uint16_t s = second + 60 * minute;
  return s + 3600UL * (hour + 24UL * days);
}

(modified code)
micros: 588
micros: 18.37

(timing original code for reference)
micros: 976
micros: 30.50

Please verify

robtillaart,

Nice improvement.

Pito,

normally you run an internal systime (in unix format), and you do a sync with an external RTC from time to time only (Time.h does it in that way).

Not in real science apps. This will cause jumps in time. Time jitter is a killer in measurement.

I just use SQW from a high quality RTC chip to generate the time interrupt. Use a DS3231SN with a 1024 Hz SQW to replace the timer 0 interrupt.

If you want to calibrate the RTC chip use the 1PPS pulse from a GPS. That's good to about a microsecond of UTC. Short term jitter is around 100 ns.

Then adjust the aging offset register in the DS3231SN to adjust the capacitance applied to the crystal. The low bit changes the frequency 0.1 ppm.

robtillaart:
@fat16lib
I gave the conversion of RTC chip register values to Unix epoch time a try:

  • replaced division by 5 by a faster multiply shift
  • replaced a 32 bit multiply with a 16 bit one
uint16_t daysSinceEpoch1(uint16_t y, uint8_t m, uint8_t d) {

if (m < 3) {
    m += 12;
    y--;
  }
  uint16_t s = 153 * m - 2;
  uint16_t t = (s * 13112UL ) >> 16;  // t = s/5;
  return 365 * (y + 1 - EPOCH_YEAR)  + y / 4 - (EPOCH_YEAR - 1) / 4
    + t + d - 398;
}

//------------------------------------------------------------------------------
/** Seconds since 1 Jan EPOCH_YEAR */
uint32_t secondsSinceEpoch1(uint16_t year, uint8_t month, uint8_t day,
uint8_t hour, uint8_t minute, uint8_t second) {
  uint16_t days = daysSinceEpoch1(year, month, day);
  uint16_t s = second + 60 * minute;
  return s + 3600UL * (hour + 24UL * days);
}




(modified code)
micros: 588
micros: 18.37

(timing original code for reference)
micros: 976
micros: 30.50

Please verify

fat16lib test of 976 is for 32 conversions if i read his sample code right....

so even including the loop overhead thats 30.5 per conversion.....

yep

robtillaart:
yep

argh, i must be too tired ! as missed these times above it

(modified code)
micros: 588
micros: 18.37

sorry.