Probable randomSeed() random() bug

So a project of mine was not exhibiting the expected random behavior upon start up. I used randomSeed() in the suggested way. I tracked this down to the following:

This code gives (at least by visual inspection) random numbers:

int nRules = 36;

void setup(){
Serial.begin(9600);
}

void loop(){
randomSeed(analogRead(0));
Serial.println( random( 0, ( nRules ) ) );
delay(400);
}

But subtracting one from nRules in the call like this:

int nRules = 36;

void setup(){
Serial.begin(9600);
}

void loop(){
randomSeed(analogRead(0));
Serial.println( random( 0, ( nRules - 1 ) ) );
delay(400);
}

yields only 0, 7, 14, 21, or 28.

I've tested this with release 0017 and 0018 and both behave the same. I'm using a stock Duemilanova.

Unfortunately, not subtracting one does not avoid the multiples of 7 behavior in my larger program. But I suspect that if the above small program can be made to work it will also cure the original problem.

thanks,

Phil

Usually, you only want to seed the random number generator once per program execution - e.g. inside setup(), certainly not in loop().

-j

I understand that.

The above code is written that way just to exercise the randomSeed() function and to demonstrate that in some cases after seeding the first returned value from random() is in fact not random.

In the original app I noticed start up behavior that was not random despite being written as you noted it usually is.

The problem lies in using analogRead to seed the random number generator. Ideally, the seed value should cover the entire range of the random number generator (0x7FFFFFFF). In the very best case, analogRead covers a tiny fraction of that range. In the worst case, analogRead will consistently return a single value.

It would help to know more about your application. With more details, someone may be able to suggest something that works better for you.

Run the two little programs above and if your machine runs like mine you will see an obvious problem. One program will print random numbers and the other only prints multiples of seven. The programs are simple and the only difference between the two is the subtraction "- 1".

This should be just the thing for confirming a bug and (partially) testing a fix.

I have at least two workarounds for my app. One is to store new random seeds across power cycles. Another is writing my own RNG.

I'm mostly trying to report this bug to the project people. Is there a better way to do this?

Phil

OK, I've also posted this at code.google.com.

I'm mostly trying to report this bug to the project people.

If there is a bug, it's in the avr-libc library. You'll have to work with them on a solution.

The problem appears to be a deficiency in the random generator. The returned value for low seeds is always evenly divisible by 7.

Is there a better way to do this?

Use rand instead of random. It seems to give better results for low seeds.

Actually when I asked "is there a better way to do this" I meant a better way to report bugs. Anyway I have also posted a report at code.google.com.

It turns out that the first (pseudo) random number after seeding may or may not appear to be random depending on the arguments given to random().

Here are 2 programs that run 1000 trials for each random(0, max) where max goes from 2 to 40. One seeds only once, the other seeds prior to every use of random(). The difference is quite clear.

(Note: to create nice big noisy seeds I called analogRead 32 times and then used the 32 low bits to construct the seed).

Phil

// program to test whether the numbers generated by random()
// are reasonably (pseudo) random. It turns out that regardless
// of the maximum value being used the results appear to be
// reasonably random.

int bins[40];

void setup(){
Serial.begin(9600);
long tempBits = 0; // create a long of random bits to use as seed
for (int i=1; i<=32 ; i++){
tempBits = ( tempBits | ( analogRead( 0 ) & 1 ) ) << 1;
}
randomSeed(tempBits); // seed
}

void loop(){
for (int max=2 ; max<=40 ; max++){ // test 2 through 40 as random maximums
for (int i=0 ; i<40 ; i++) bins *= 0; // zero out counting bins *

  • for( int trial = 1 ; trial <= 1000 ; trial++ ){ // for each maximum run 1000 trials*
  • bins[random( 0, max )]++; // and count the random number generated*
  • }*
  • Serial.print( max ); // display the results (counts for each random number)*
    _ Serial.print( " *** " );_
  • for (int i=0 ; i<max ; i++){ // display the counts*
    _ Serial.print( bins ); _
    _
    Serial.print( " " );
    _
    * }*
    * Serial.println(" ");*
    * delay(400);*
    * }*
    }
    // program to test whether the first number generated by random()
    // after being seeded is really (pseudo) random. It turns out that depending
    // on the maximum value being used the result may or may not be
    // reasonably random.
    int bins[40];
    void setup(){
    * Serial.begin(9600);*
    }
    void loop(){
    * for (int max=2 ; max<=40 ; max++){ // test 2 through 40 as random maximums*
    _ for (int i=0 ; i<40 ; i++) bins = 0; // zero out counting bins
    for( int trial = 1 ; trial <= 1000 ; trial++ ){ // for each maximum run 1000 trials

    * long tempBits = 0; // create a long of random bits to use as seed*
    * for (int i=1; i<=32 ; i++){
    tempBits = ( tempBits | ( analogRead( 0 ) & 1 ) ) << 1;
    }
    randomSeed(tempBits); // seed*
    * bins[random( 0, max )]++; // and count the random number generated*
    * }
    Serial.print( max ); // display the results (counts for each random number)
    Serial.print( " *** " );
    for (int i=0 ; i<max ; i++){ // display the counts*
    Serial.print( bins );
    Serial.print( " " );

    * }
    Serial.println(" ");
    }
    }*_

Follow-up to my previous post...

As far as I can tell, the random (and rand) implementation in avr-libc follows the description provided by Park/Miller. I coded both the "less than 32 bit processor" and "32 bit processor" versions in Pascal. I also built an Excel workbook from the equations. Values from all three match.

rand is simply the lower 15 bits of random. My suggestion for using rand to solve this problem may be a bad idea.

The Arduino code deviates from the Park/Miller algorithm. In the original algorithm, the most-significant bits remain most-significant in the final result. The opposite is true for the Arduino random function; the least-significant bits become the final result.

From my analysis, I'd have to strongly agree with Marsaglia in his reply to Park/Miller, "The congruential generator x, = 16807xn_1mod 2:31 - 1 is a good generator, but not a great generator".

(Note: to create nice big noisy seeds I called analogRead 32 times and then used the 32 low bits to construct the seed).

Good idea!

There's a bug in your implementation. The bottom bit is always zero. You need to shift first then or.

oops! nice catch...thanks, Phil

I don't understand why you think that reseeding a PRNG before each call with a poorer RNG should produce a statistically good random sequence.

The PRNG is designed to produce a statistically random set of values when called repeatedly, only calling it once doesn't allow it do so. I suspect theoretically a PRNG could be created that passes Knuth's statistical test and yet guarantee to produce a "2" as the first output after being seeded with any value.

It reminds me of an old joke, a great PRNG if you only call it once:

int random()
{
  return 4; // Random number chosen by a fair die roll
}

Your results reflect primarily the lack of randomness in your seed selection. I'd be interested to see what the distribution of the seed values are, I'm not sure how random they are.

Your results reflect primarily the lack of randomness in your seed selection

The first-value-after-seeding results are from a deficiency in the generator combined with how the generator is used in the Arduino library. Seeds 1 - 32767 always return a value evenly divisible by 7. I suspect Park/Miller didn't intend the returned value to be the result of a modulus.

I definately agree with everything you had to say. There is almost never a good reason to reseed.

The simple solution is to toss out the first N values after seeding. The best solution is to use an algorithm that has known good low-bit properties.

Is there a change to the code that would help? For example, modifying the way random() gets its return value into the requested range?

The way I see there are three problems...

  1. Seeding needs to cover the entire range of the generator ( 1 through 2^31-1 )

  2. Generating the return value should be closer to the way Park-Miller did it AND the new method should be tested with diehard and/or dieharder

  3. The Park-Miller algorithm is OK. I suspect there are better choices available.

I'm willing to help with with any or all of these problems.

I've tried building dieharder on Windows. It's clear the effort is more than I can take on right now. I've tried unsuccessfully to find a ready to run Windows binary. I do not have access to a *nix computer nor do I have the patience to set one up. So, I can provide help in the form of experience and research but I'm not able to provide testing.

Do you agree with my list?

Which problem(s) would you like to address?

#2 is the one I had in mind.

the new method should be tested with diehard and/or dieharder

Is any PRNG expected to do well with the diehard tests?

#2 is the one I had in mind.

Got it.

Is any PRNG expected to do well with the diehard tests?

No random number generator passes all statistical tests all the time. The cartoon here says it all...

The very best random generators will occasionally produce an output that appears to be very non-random.

The tests (like diehard(er)) give an idea of how good a generator is. Pass no tests? The generator is trash. Pass a few tests? The generator is probably OK for games. Pass most tests? The generator may actually be OK for gambling.

My plan is to use it (or something like it) to gauge the effect of a change. For example: is using the high-byte of the raw value instead of the low-byte a good idea?

To answer your question, yes, some do very well. They generally fall into the category of "cryptographically secure". But they typically consume too much CPU time on a modern 32 bit processor to be broadly useful.

What about a hardware solution like this one?

http://robseward.com/misc/RNG2/