getRandomBits(n), getRandomByte() and flip() - faster generation of random bits

Last week I did a simulation where I needed a lot of random true/false (1 random bit) values and it took quite some time as for every random bit the random() function is called. It suddenly became clear that I threw away 30 random bits when I asked for 1. That smelled like inefficiency so I dived into it and I implemented three functions that generate random bits more efficient by caching the unused bits. A test-sketch shows a serious performance improvement, so I like to share these results.

Disclaimer: the sketch is only tested on an UNO so I do not know if other boards (esp ARM) show similar performance gains (but I expect there are).

The functions are:
getRandomBits(n) fetches n random bits per iteration. This is a generic function.
getRandomByte() is an faster version for 8 bits/1 byte.
flip() is a faster version for 1 bit (simulates the flipping of a coin, head tail, true false, 0, 1)

Note that these functions do not replace random(n), except when n is a power of 2.
So expect random( 8 ) to be as random as getRandomBits(3);

bool flip()
{
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  if (idx)
  {
    buf >>= 1;
    idx--;
  }
  else
  {
    buf = random();  // refill
    idx = 30;
  }
  return buf & 0x01;
}

// note: parameter n is not checked
uint32_t getRandomBits(uint8_t n) // n = 1..31
{
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  uint32_t t = 0;
  if (n > idx)
  {
    n -= idx;
    t = buf << n;
    buf = random();  // refill 31 bits
    idx = 31;
  }
  // for performance (10-20%)
  if (t == 0) t = buf & ((1 << n) - 1);
  else t |= buf & ((1 << n) - 1);
  buf >>= n;
  idx -= n;
  return t;
}

// has room for improvement.
uint8_t getRandomByte()
{
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  if (idx > 2)
  {
    buf = random();  // refill 31 bits
    idx = 0;
  }
  uint8_t t = buf & 0xFF;
  buf >>= 8;
  idx++;
  return t;
}
//
//    FILE: getRandomBits.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.00
// PURPOSE: faster fetching of random bits
//    DATE: 2015-08-10
//     URL:
//
// Released to the public domain
//

uint32_t start;
uint32_t d[3];

uint16_t count = 0;

bool flip()
{
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  if (idx)
  {
    buf >>= 1;
    idx--;
  }
  else
  {
    buf = random();  // refill
    idx = 30;
  }
  return buf & 0x01;
}

uint32_t getRandomBits(uint8_t n) // n = 1..31
{
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  uint32_t t = 0;
  if (n > idx)
  {
    n -= idx;
    t = buf << n;
    buf = random();  // refill 31 bits
    idx = 31;
  }
  // for performance (10-20%)
  if (t == 0) t = buf & ((1 << n) - 1);
  else t |= buf & ((1 << n) - 1);
  buf >>= n;
  idx -= n;
  return t;
}

// room for improvement.
uint8_t getRandomByte()
{
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  if (idx > 2)
  {
    buf = random();  // refill 31 bits
    idx = 0;
  }
  uint8_t t = buf & 0xFF;
  buf >>= 8;
  idx++;
  return t;
}


void setup()
{
  Serial.begin(115200);
  Serial.println("Start getRandomBits 0.1.00");
  Serial.println("Timing in micros for 1000 calls\n");

  Serial.println("Flipping coin - random(2)");
  count = 0;
  start = micros();
  for (int i = 0; i < 1000; i++) count += random(2);
  d[0] = micros() - start;
  Serial.print("time:\t");
  Serial.println(d[0]);
  Serial.print("count:\t");
  Serial.println(count);

  Serial.println("\nFlipping coin - flip()");
  count = 0;
  start = micros();
  for (int i = 0; i < 1000; i++) count += flip();
  d[2] = micros() - start;
  Serial.print("time:\t");
  Serial.println(d[2]);
  Serial.print("count:\t");
  Serial.println(count);

  Serial.print("factor:\t");
  Serial.println(1.0 * d[0] / d[2], 2);

  Serial.println("\ngetRandomBits() - 1..8");

  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomBits(1);
  d[1] = micros() - start;
  Serial.println(d[1]);

  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomBits(2);
  d[1] = micros() - start;
  Serial.println(d[1]);

  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomBits(3);
  d[1] = micros() - start;
  Serial.println(d[1]);

  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomBits(4);
  d[1] = micros() - start;
  Serial.println(d[1]);

  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomBits(5);
  d[1] = micros() - start;
  Serial.println(d[1]);

  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomBits(6);
  d[1] = micros() - start;
  Serial.println(d[1]);

  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomBits(7);
  d[1] = micros() - start;
  Serial.println(d[1]);

  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomBits(8);
  d[1] = micros() - start;
  Serial.println(d[1]);

  Serial.println("\ngetRandomByte()");
  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomByte();
  d[1] = micros() - start;
  Serial.println(d[1]);
}

void loop()
{
}
Start getRandomBits 0.1.00
Timing in micros for 1000 calls

Flipping coin - random(2)
time:	96788
count:	522

Flipping coin - flip()
time:	5372
count:	497
factor:	18.02

getRandomBits() - 1..8
10000
12088
14600
17040
19468
21932
24348
26740

getRandomByte()
20760

prelimenary conclusions:

  • These first results show that for flipping a coin the performance improvement is a factor 18, when comparing a call to random(2) and the optimized flip(). Different test-runs gave numbers between 15 and 20, so substantially faster on average. Note that when the cache is refilled the call takes more time!
  • The flip() implementation is also almost twice as fast as the generic call to getRandomBits(1).
  • the getRandomByte() implementation is ~20% faster than getRandomBits( 8 ), enough to keep it.

TODO:

  • make a decent library for these functions.
  • squeeze some more uSecs from it
  • test ...

As always remarks are welcome

bug fix for getRandomByte(), first three values flawed

uint8_t getRandomByte()
{
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  if (idx == 0)
  {
    buf = random();
    idx = 3;
  }
  uint8_t t = buf & 0xFF;
  buf >>= 8;
  idx--;
  return t;
}

Bug fix for getRandomBits(p), for values of p > 15 - explicitly use 1UL for creating the right mask

uint32_t getRandomBits(uint8_t n) // n = 1..31
{
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  uint32_t t = 0;
  if (n > idx)
  {
    n -= idx;
    t = buf << n;
    buf = random();  // refill 31 bits
    idx = 31;
  }
  // for performance (10-20%)
  if (t == 0) t = buf & ((1UL << n) - 1);
  else t |= buf & ((1UL << n) - 1);
  buf >>= n;
  idx -= n;
  return t;
}

some ideas are still under scrutiny ...

final version for today 0.1.02, all previous versions above are obsolete

//
//    FILE: getRandomBits.ino
//  AUTHOR: Rob Tillaart
// VERSION: 0.1.02
// PURPOSE: faster fetching of random bits
//    DATE: 2015-08-10
//     URL:
//
// Released to the public domain
//

uint32_t start;
uint32_t d[3];

uint16_t count = 0;

bool flip()
{
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  if (idx == 0)
  {
    buf = random();  // refill
    idx = 30;
  }
  else
  {
    buf >>= 1;
    idx--;
  }
  return buf & 0x01;
}

uint32_t getRandomBits(uint8_t n) // n = 1..31
{
  // for large values of n the more straightforward approach is faster.
  if (n > 24) return random() & ((1UL << n) - 1);
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  uint32_t t = 0;
  if (n > idx)
  {
    n -= idx;
    t = buf << n;
    buf = random();  // refill 31 bits
    idx = 31;
  }
  if (t == 0) t = buf & ((1UL << n) - 1);
  else t |= buf & ((1UL << n) - 1);
  buf >>= n;
  idx -= n;
  return t;
}

// 20388
uint8_t getRandomByte()
{
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  if (idx == 0)
  {
    buf = random();  // refill 31 bits
    idx = 2;
  }
  else
  {
    idx--;
    buf >>= 8;
  }
  return buf;  // implicit conversion to 8 bits;
}

void setup()
{
  Serial.begin(115200);
  Serial.println("Start getRandomBits 0.1.02");
  Serial.println("Timing in micros for 1000 calls\n");

  Serial.println("Flipping coin - random(2)");
  count = 0;
  start = micros();
  for (int i = 0; i < 1000; i++) count += random(2);
  d[0] = micros() - start;
  Serial.print("time:\t");
  Serial.println(d[0]);
  Serial.print("count:\t");
  Serial.println(count);

  Serial.println("\nFlipping coin - flip()");
  count = 0;
  start = micros();
  for (int i = 0; i < 1000; i++) count += flip();
  d[2] = micros() - start;
  Serial.print("time:\t");
  Serial.println(d[2]);
  Serial.print("count:\t");
  Serial.println(count);

  Serial.print("factor:\t");
  Serial.println(1.0 * d[0] / d[2], 2);

  Serial.println("\ngetRandomBits() - 1..31");
  for (int p = 1; p < 32; p++)
  {
    start = micros();
    for (int i = 0; i < 1000; i++) count += getRandomBits(p);
    d[1] = micros() - start;
    Serial.print(p);
    Serial.print("\t");
    Serial.println(d[1]);
  }

  Serial.println("\ngetRandomBits() - 8");
  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomBits(8);
  d[1] = micros() - start;
  Serial.println(d[1]);

  Serial.println("getRandomByte()");
  start = micros();
  for (int i = 0; i < 1000; i++) count += getRandomByte();
  d[1] = micros() - start;
  Serial.println(d[1]);
}

void loop()
{
}

generates on UNO

Start getRandomBits 0.1.02
Timing in micros for 1000 calls

Flipping coin - random(2)
time:	96792
count:	522

Flipping coin - flip()
time:	5316
count:	497
factor:	18.21

getRandomBits() - 1..31
1	11456
2	13652
3	16272
4	18820
5	21348
6	23916
7	26436
8	28920
9	31392
10	33904
11	36364
12	38800
13	41220
14	43676
15	46012
16	48488
17	50808
18	53212
19	55544
20	57868
21	60188
22	62504
23	64788
24	67060
25	68208
26	68624
27	69088
28	69520
29	69972
30	70396
31	70840

getRandomBits() - 8
29076
getRandomByte()
20408

Note: performance of getRandomBits() dropped for lower values of bits it got an extra test in the code to speed up the higher values.

Note: if an application needs a constant number of random bits, it is worth to write a dedicated function for it if one is in a need for speed. (see above code for 1 and 8 bits)

After a good night sleep one performance improvement popped up

uint32_t getRandomBits(uint8_t n) // n = 1..31
{
  // for large values of n the more straightforward approach is faster (UNO).
  if (n > 21) return random() >> (31 - n);
  static uint32_t buf = 0;
  static uint8_t idx = 0;
  uint32_t t = 0;
  if (n > idx)
  {
    n -= idx;
    t = buf << n;
    buf = random();  // refill 31 bits
    idx = 31;
  }
  if (t == 0) t = buf & ((1UL << n) - 1);
  else t |= buf & ((1UL << n) - 1);
  buf >>= n;
  idx -= n;
  return t;
}
getRandomBits() - 1..31
1	11456
2	13652
3	16272
4	18820
5	21348
6	23916
7	26436
8	28920
9	31392
10	33904
11	36364
12	38800
13	41220
14	43676
15	46012
16	48488
17	50808
18	53212
19	55544
20	57868
21	60188
22	60660
23	60192
24	59776
25	59328
26	58884
27	58456
28	58024
29	57572
30	57120
31	56700

Are you interested in a faster / smaller / better quality generator?

yes,
always interested to learn

replacing the call to random() with proprietary random generator e.g. JKISS or "Marsaglia" generator is already on my todo list.

While working on Tiny Core 2, I decided to identify a better random number generator. This is an overview of the testing...

http://zygomorphic.com/arduino-tiny/?p=78

Park-Miller is the Libc (Arduino) random number generator. By today's standards it is essentially junk.

Xorshift is the one included with Tiny Core 2. It is very fast, significantly smaller, and gives good results for most uses.

JKISS32 is a great choice for ATmega processors. It is also very fast, a bit smaller, and gives phenomenal results compared to the other generators. The only downside is the additional storage (20 bytes vs 4 bytes).

I considered several others but they were typically slower, needed significantly more memory, performed worse than Xorshift, and/or required extensive initialization so they were not included.

For simulations, JKISS32 is, by far, the best choice of the five. If I remember correctly, it was developed for the purpose of biological simulations.

In previous testing, Marsaglia's minimal standard generator had a bit better quality but was a bit slower than Park-Miller. It just did not stand out enough to include.

The Xorshift generator I use is "Marsaglia's favourite" of the Xorshift generators. His hunch was spot-on. It performed significantly better than the other ones in his list.

People who research random number generators simply stopped working on "small" generators. All the interesting work is on things like Mersenne Twister which will typically cripple an AVR processor. I am confident you will be hard-pressed to find anything worth adding to this list.

It's bedtime. I will post source code for the one(s) you want sometime tomorrow.

Packaged as a library. Please let me know if you have problems. Good luck.

Thanks