How random is Arduino's random()?

There are plenty of articles on this subject, but now that we have the Serial Plotter I would like to add this one.
When programming a digital die you would write something like this: byte r = random(1,7);
In the long run you would expect the numbers 1 to 6 to be distributed fairly equally.
Try this little sketch:

void setup() {
  Serial.begin(115200);
  Serial.println(__FILE__);
}

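float est = 100.0 / 6;  // expected share of each face, in percent
long count = 0;         // total number of throws so far
long freq[7];           // freq[1]..freq[6] count the faces; freq[0] is unused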

void loop() {
  byte r = random(1,7);  // upper bound is exclusive, so r is 1..6
  count++;
  freq[r]++;
  for (byte i = 1; i < 7; i++) {
    Serial.print(100.0 * freq[i] / count - est, 4);
    Serial.print("\t");
  }
  Serial.println();
}

Then open the Serial Plotter (IDE version 1.6.7).
If you expect the values to approach the zero line after a while, you will be disappointed.
See attached screenshots.
Even after thousands of random calls the lines run very differently from each other.
(Donald Knuth taught us how to do it properly.)
For most purposes the random function will do but it surely could be improved.

[Three screenshots, added inline for ease of reading.]

It's pseudo random and you never change the seed....

Klausj: Even after thousands of random calls the lines run very differently from each other. (Donald Knuth taught us how to do it properly.) For most purposes the random function will do but it surely could be improved.

So what are your findings?

After throwing the dice 6 million times you'd expect that each side showed up one million times exactly? But actually the random function generates a discrepancy of let's say 0.01% or something like that?

How much is that?

1% of 1 million is 10000. 0.01% of 1 million is 100.

So throwing the dice 6 million times shows each number in the range of 999900 to 1000100 instead of exactly 1000000 times.

Or what do you want to say?
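To put numbers on the question above: even a perfectly fair die does not land on exactly one million per face. Here is a minimal sketch of the arithmetic in plain C++ (not an Arduino sketch), assuming independent throws so that the count of one face follows a binomial distribution. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Expected count of one face after `throws` throws of a fair die.
double expectedCount(double throws, int faces) {
  assert(faces > 1);
  return throws / faces;
}

// Standard deviation of that count (binomial: sqrt(n * p * (1 - p))).
double countStdDev(double throws, int faces) {
  assert(faces > 1);
  const double p = 1.0 / faces;
  return std::sqrt(throws * p * (1.0 - p));
}

// Standard deviation of one face's relative frequency, in percentage
// points. This is the quantity the plotter sketch draws around zero.
double percentStdDev(double throws, int faces) {
  assert(faces > 1 && throws > 0);
  const double p = 1.0 / faces;
  return 100.0 * std::sqrt(p * (1.0 - p) / throws);
}
```

For 6 million throws, countStdDev(6e6, 6) comes out at roughly 913, so a difference of 100 from one million is well inside ordinary chance. After 10000 throws, percentStdDev(10000, 6) is about 0.37 percentage points, and it only shrinks with the square root of the number of throws.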

As septillion stated, it is a pseudo random number generator and, if you don't seed it before using it, it will produce a repeatable sequence of numbers. Actually, not seeding during code development can aid in debugging since the first step in debugging is making the error repeatable. Having a repeatable sequence without the seed can help. Anyway, that's the great thing about Open Source software: If you're unhappy with the random number generator, you can write a replacement for it.
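The repeatability is easy to see on a PC as well. This is only a sketch in plain C++: std::minstd_rand0 is used as a stand-in for the Arduino generator (an assumption, it is not the Arduino code), and rollDice is a made-up helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Roll `n` dice from a generator started at `seed`.
std::vector<int> rollDice(std::uint32_t seed, int n) {
  assert(n >= 0);
  std::minstd_rand0 gen(seed);  // stand-in generator, not the Arduino one
  std::vector<int> rolls;
  rolls.reserve(n);
  for (int i = 0; i < n; i++) {
    // Map the raw value to 1..6, the same idea as random(1, 7).
    rolls.push_back(static_cast<int>(gen() % 6) + 1);
  }
  return rolls;
}
```

Calling rollDice(1, 20) twice gives the same 20 rolls both times, which is the behaviour you want while debugging. A different seed, for example rollDice(2, 20), starts somewhere else in the sequence.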

econjack:
Actually, not seeding during code development can aid in debugging since the first step in debugging is making the error repeatable. Having a repeatable sequence without the seed can help.

I tried to explain that to someone about 15 years ago, fell on deaf ears. He was adamant that random needs to be random…

@JimboZa: been there, done that! Try teaching Intro C to 150 Freshmen, half of whom think they already know everything because they taught themselves Basic while they were in high school. In many cases, I think teachers are underpaid!

econjack: @JimboZa: been there, done that! Try teaching Intro C to 150 Freshmen, half of whom think they already know everything because they taught themselves Basic while they were in high school. In many cases, I think teachers are underpaid!

Ain't that the truth!! We should pay teachers what our "representatives" get paid and pay the "representatives" what teachers get paid. Be far more appropriate all around!!

Actually, the problem might not be in the PRNG but in the bracketing to 1-6.

In this case changing the seed does not matter, that will just start you in a different place in the sequence.

Thank you for all your postings. Of course: software random generators are all pseudo random generators. And changing the seed does not matter, even if you take it from A0. In theory, according to the Law of large numbers, the observed relative frequencies should converge.

Well, gpsmikey's comment was somewhat off-topic, but I agree completely. And KeithRB eventually got the point: yes, I could track it down.

The C library that comes with the gcc toolchain (avr-libc) uses the linear congruential generator (LCG) published by Park and Miller in Communications of the ACM, vol. 31, in 1988, and since then proven correct many times. (https://github.com/vancegroup-mirrors/avr-libc/blob/master/avr-libc/libc/stdlib/rand.c) The random function returns a 31-bit value (the sign bit is always cleared), and the period is the maximum possible period.

So, when you call random(a,b), some multiply and divide operations have to be performed. And that is where the loss of precision happens. I managed to find a work-around by successively comparing the 31-bit random values to RAND_MAX/2, RAND_MAX/3, RAND_MAX/4, RAND_MAX/5, RAND_MAX/6, getting much better results. But the code looks so ugly that I do not dare to publish it.
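For readers who have not met the Park and Miller generator: it is a single multiply and modulo step. Below is a minimal sketch in plain C++, using 64-bit arithmetic for clarity. It is not a copy of the avr-libc source, and the names parkMillerNext and parkMillerAfter are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

const std::int64_t kMultiplier = 16807;    // 7^5
const std::int64_t kModulus = 2147483647;  // 2^31 - 1, a prime

// One step of the "minimal standard" generator:
// next = (16807 * state) mod (2^31 - 1).
std::int32_t parkMillerNext(std::int32_t state) {
  assert(state > 0 && state < kModulus);
  return static_cast<std::int32_t>((kMultiplier * state) % kModulus);
}

// Advance the generator `steps` times, starting from `seed`.
std::int32_t parkMillerAfter(std::int32_t seed, int steps) {
  std::int32_t state = seed;
  for (int i = 0; i < steps; i++) {
    state = parkMillerNext(state);
  }
  return state;
}
```

Park and Miller give a check value for implementations: starting from seed 1, the state after 10000 steps should be 1043618065, so parkMillerAfter(1, 10000) is a quick self-test.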

Not me, Steve Summit: http://c-faq.com/lib/randrange.html
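On the bracketing question, one common way to map raw values to 1-6 without favouring any face is rejection sampling, which is in the spirit of the c-faq page linked above. This is only a sketch in plain C++ and not the code from that page. It assumes a generator with static min() and max() like the standard library engines, and the names unbiasedRoll and rollD6 are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Return a value in 1..sides with every face backed by the same number
// of raw generator values. Assumes the generator's range fits in 64 bits,
// which holds for 32-bit engines such as std::minstd_rand0.
template <typename Gen>
int unbiasedRoll(Gen& gen, int sides) {
  assert(sides > 0);
  typedef std::uint64_t U;
  const U range = U(Gen::max()) - U(Gen::min()) + 1;  // distinct raw values
  assert(U(sides) <= range);
  const U bucket = range / U(sides);  // raw values assigned to each face
  const U limit = bucket * U(sides);  // leftover values start here
  U r;
  do {
    r = U(gen()) - U(Gen::min());
  } while (r >= limit);               // throw the leftover values away
  return static_cast<int>(r / bucket) + 1;
}

// Convenience wrapper for a six-sided die with a stand-in generator.
int rollD6(std::minstd_rand0& gen) {
  return unbiasedRoll(gen, 6);
}
```

For scale: if a raw 31-bit value (0 to 2^31 - 1) is simply reduced modulo 6, two faces get one extra raw value out of roughly 358 million each, so that kind of bracketing bias is far too small to see in the plots.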

jurs: So what are your findings? After throwing the dice 6 million times you'd expect that each side showed up one million times exactly? But actually the random function generates a discrepancy of let's say 0.01% or something like that?

+1 What you have there is known as a "Gee Whiz" graph (the axis is magnified to suggest a discrepancy much larger than is really significant).

I use (include) the Arduino “Entropy” library in wiring.cpp:

…and then in wiring.cpp I call it automatically to set the random seed each time:

    // initialize C++ random with a truly random seed
    if (!Entropy.available()) {
        Entropy.initialize();
        while (!Entropy.available());
        randomSeed (Entropy.random());
    }

Now random starts from a TRULY random seed every time, although the sequence that follows is still pseudo-random.

Beta test of my noise engine:

// random seed 1.02
//
// produces a generally random number based on processing
// error voltages from the readings taken from the analog to digital
// conversion port
// on the Arduino

// 2016-02-17 test/demo version

const byte analogPort = A1;

const int BUCKET_SIZE = 64;
int buckets[BUCKET_SIZE];

void setup() {
  Serial.begin(9600);
}

void loop() {
  int temp = getRandomSeed();
  buckets[temp]++;

  for (int i = 0; i < BUCKET_SIZE; i++)
  {
    if (buckets[i] == 0)
    {
      Serial.print(' ');
    }
    else
    {
      Serial.print(buckets[i]);
    }
    Serial.print(' ');
  }
  Serial.println('*');
}

int getRandomSeed()
{
  // magic numbers tested 2016-02-17
  const int baseIntervalMs = 250;
  const byte sampleSignificant = 13;
  const byte sampleMultiplier = 50;

  const byte hashIterations = 6;
  int intervalMs = 0;
  
  unsigned long reading;
  int result = 0;

  Serial.print("randomizing...");

  for (int i = 0; i < hashIterations; i++)
  {
    Serial.print(' ');
    Serial.print( hashIterations - i );

    // put a "kick pulse" on the pin
    //
    pinMode(analogPort, INPUT_PULLUP);
    pinMode(analogPort, INPUT);

    // Now there will be a slow decay of the voltage,
    // about 8 seconds
    // so pick a point on the curve
    // offset by the processed previous sample:
    //
    delay(baseIntervalMs + intervalMs);

    // take a sample
    reading = analogRead(analogPort);
    result |= (reading & 1) << i;

    // take the low "digits" of the reading
    // and multiply it to scale it to
    // map a new point on the decay curve:
    intervalMs = (reading % sampleSignificant) * sampleMultiplier;
  }
  Serial.println();
  return result;
}

aarg: Beta test of my noise engine:

// random seed 1.02
//
// produces a generally random number based on processing
// error voltages from the readings taken from the analog to digital
// conversion port
// on the Arduino

If I understand correctly, you are generating a random number by pulling an analog pin high, then reading it as it discharges to whatever equilibrium point it wants, correct?

Krupski: If I understand correctly, you are generating a random number by pulling an analog pin high, then reading it as it discharges to whatever equilibrium point it wants, correct?

That is one part of it, yes. There is already some randomness in it; doing that is just a way to increase it. But an important element is to repeat the process so that different parts of the discharge are sampled. "Shaking the box" if you like. Of course, the discharge is nearly identical each time. I attempt to amplify the difference.

I ran it for about an hour, using the supplied magic numbers. The distribution looked really excellent. The next job is to tweak the constants to try to speed it up without losing randomness.

It's been tested only on a 2560 and a 328p.

A generator better than Park Miller in every way…

(Read through to the end of the thread.)