The reliable but not very sexy way to seed random

How will they be programmed? By you? By a fabrication shop?

That's the rub.

If it's just me programming I could presumably have a running counter on my PC and burn the next number into EEPROM with avrdude.

But (in theory at least) other people will be programming, because the code will be open source. And (once again in theory, and assuming anyone else will be interested :)) my chip and yours should have different serial numbers.


Rob

Graynomad:
If it’s just me programming I could presumably have a running counter on my PC and burn the next number into EEPROM with avrdude.

That was where I was headed.

I suggest treating the problem as a documentation issue. Provide a written description of the serial numbers and a recommended way to generate them.

But that too has some serious drawbacks, not the least of which is that, under certain conditions, the value produced is a constant.

My original idea was to make an array of say 10 bytes, built from the LSB of a lot of (8 x 10) analogReads from a pin that is not connected to anything. Then I dedupe the bits to reduce bit repetition, and finally use the first 4 bytes of the array as a 32-bit number.

This starts off OK with analogRead values in the order of 300; what seems to happen, though, is that after a few reads the values start going down until you are getting very low numbers. As I'm only interested in the lowest bit I'm not sure that matters, but here is a graph of the result.

As you can see the distribution is not that great, only 600 samples but it's obvious where it's going.

So then I thought I'd build a single "random" byte and use that as an offset to index into the program space in a similar fashion to CB's code above. I haven't done the indexing into the code yet but I thought I'd graph the offset values.

With just over 14000 samples distribution is pretty good but check out the weird banding. I suspect this has something to do with the way I dedupe the bits because I don't treat the array as a bit stream, I just dedupe the bits in each byte without taking the neighboring byte into account.

Now there will of course be duplication here and indexing into the code space won't help that because all processors will have the same code. So now I'm thinking that if I do this 4 times and use the result as a long that should be unique enough for my purposes.

EDIT: That ORB looks interesting, I'll have a closer look.

ANOTHER EDIT: The EEPROM write time is supposed to be controlled by the internal RC clock, as this is tunable maybe that would work.


Rob

analogReads from a pin that is not connected to anything

The problem is that anything near the unconnected pin can influence it: a neighboring pin set HIGH or LOW, a human hand, an OttLight coming on, a nearby ground or power line (even low voltage). Even my air-conditioner blowing cold air across the board seems to have an effect. All the unforeseeable influences make the process unreliable. Sometimes the results are great; sometimes the results are bad.

Then I dedupe the bits

Use von Neumann whitening.

Use von Neumann whitening.

That was my plan but I was too lazy to write a proper algorithm so I just did the bytes separately.

Do you have some code?


Rob

Graynomad: Do you have some code?

I do. Give me a few minutes ... Um, how do you feel about ObjectPascal?

Some ATmegas (e.g. the 328) have a temperature sensor on the chip - what about using the temperature value, or the noise it produces? P/

For an earlier discussion see - http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1294239019 -

how do you feel about ObjectPascal?

It's been a few years, but if it will run on an AVR that's OK by me :)


Rob

This was discussed recently here (can’t find the link) - I summarized the best (to me) option on my blog:

http://www.utopiamechanicus.com/77/better-arduino-random-numbers/

Basically, the last bit of an unused ADC pin has a lot of fluctuation - using von Neumann debiasing we avoid the skew of too many 1s or 0s.

unsigned long seedOut(unsigned int noOfBits)
{
  // return value with 'noOfBits' random bits set
  unsigned long seed = 0;
  const int limit = 99;   // give up on a "stuck" pair after 99 tries
  int bit0 = 0, bit1 = 0;
  while (noOfBits--)
  {
    for (int i = 0; i < limit; ++i)
    {
      // von Neumann: sample two LSBs; keep a bit only when they differ
      bit0 = analogRead(0) & 1;
      bit1 = analogRead(0) & 1;
      if (bit1 != bit0)
        break;
    }
    seed = (seed << 1) | bit1;
  }
  return seed;
}

(note the limit=99 - prevents problem values from creating an endless loop)

Or a quick and dirty option that is still random(ish) yet non-deterministic:

unsigned long seed = 0, count = 32;
while (count--)   // count-- (not --count) so all 32 bits are gathered
  seed = (seed << 1) | (analogRead(0) & 1);
randomSeed(seed);

I believe that any ADC pin, as long as not at full (1023) or zero, fluctuates constantly in the last bit - debiasing the bits should be enough to create quite random numbers, but I’d be very interested in seeing your graphs on these outputs…

Here’s a graph of the QnD version (after a small mod - as posted, it only returned values < 128)

serious banding every 64 values.

and the improved version with whitening

some banding but also an interesting concentration below 128.

I ran both to produce an 8-bit number to match the last graph.


Rob

@David

unsigned long seedOut(unsigned int noOfBits) will only work if you have a free ADC pin. But for many purposes it will work quite well. See my posting above.

@Graynomad
Looking at the upper graph

serious banding every 64 values.

If I see correctly there is also “less serious banding” at 32, 96, 160, 224 - can you make a frequency plot?

I was wondering: if the frequency diagram is known and stable, one could “straighten/flatten” the distribution by using an array of “anti-frequency” weights.
If 64 happens 5 times as often, only one in 5 may pass …

A simple way to use the banding could be

if (val == 64 ) seed = seed << 1;
if (val == 128 ) seed = seed << 1 | 1;
// other bands can be used too
// else ignore

OK, it would take a zillion iterations and a big footprint to come to 32 random bits, but I'm just sharing the idea.

Question: common sense says the LSB is the most random bit of the floating ADC, but maybe other bits are more random - ever investigated?

Looking at the lower graph

  • how about masking the lower 7 bits as the lower half seems quite random (to be checked)

Fetching 7 random bits 5 times and you have 32 bits (+3 spare that can be kept in a static byte for next time?)

32, 96, 160, 224

Yes there's certainly something going on there.

can you make a frequency plot?

No idea, I'll have a play. I'm just sucking the values into Excel. If you know how to I can send you the text file.

I'll have a play with your other ideas; meanwhile here's the same algorithm as the last graph, but with delay(100) between each analogRead(). It's got a better distribution - as I noted before, the timing of reads has an effect.


Rob

Rob - I suspect the second graph’s banding is because of the limit=99 I had to put in there - the pair loop could run for quite a while for some values. It might disappear as you go higher (like limit=1000 or so). Likewise, your delay() must have introduced enough distance between reads that the values changed a lot. Since an analogRead only takes about 100 usec, a short delay might be just as good, or even better if lower delay values work well.

As for unused ADC/bits, I was playing around with a pot on an ADC over the weekend - the lower bits still jump around, so I’m thinking just about any ADC not at zero or full will exhibit some fluctuation at the last bit - gather them, debias, and you might have a more general solution.

so I'm thinking just about any ADC not at zero or full will exhibit some fluctuation at the last bit

In my testing, a stable reference eliminated even the least-significant bit fluctuations. Try putting a 0.1 uF capacitor on AREF or using the internal band-gap.

Rob,

If you're willing to add a capacitor and a resistor this may work... http://home.comcast.net/~orb/details.html

I'm a bit suspicious but the developer is convinced the technique is sound. I can say that it absolutely does not work with an ATmega328 processor unless an external capacitor is added. I think the internal sample-and-hold capacitor is too weak.

I've played a little bit more with this.

I noticed that (as Coding Badly said) the floating input is very flaky - turning on/off just about anything changes the nature of the bit stream, and it's very easy to get looooong strings of 0s and 1s. However the transitions are still pretty random, and David's whitening algorithm (with "limit" upped to 1000) seems to sort it out (largely, see below). It can however get very slow if the transitions are few and far between.

For my purposes this doesn't matter as I only want a single number, but when you are trying to collect 10,000 samples it gets a bit tedious.

I mostly worked with 8-bit results because it's easier to get a feel for what's going on, although I did a few 32-bit runs. The results however are much the same.

Still got some banding; while watching the numbers roll by I noticed a very high proportion of values around 170 and 85.

170, 85? Quick, what are these magic numbers? 0xAA and 0x55 - or, put another way, values with alternate bits set. Other, less prominent bands are in areas like 0x42 etc., once again with many alternate bits set.

Conversely the sparse areas have values like 128 with large runs of 0s or 1s.

Conclusion, the whitening favors alternate bit levels.

@robtilaart I did look at other bits (well bit 3) but that was quite stable by comparison with bit 0.

how about masking the lower 7 bits as the lower half seems quite random (to be checked)

Unfortunately I haven't been able to reproduce that skewed pattern.

If you're willing to add a capacitor and a resistor this may work...

Not worth adding components for me I don't think.

Anyway I'm a bit sick of looking at graphs for the time being. It's probably worth looking into the whitening though.

(Still can't figure out how to display the frequency of each value in Excel)


Rob

The bands are the result of harmonic patterns in the *original* data. In my testing, they were often produced by the processor's clock (external crystal; internal switching) or by the CPU (e.g. the millis ISR). Remember, the pin is floating both externally and internally. The pin collects "noise" from inside the processor as well as outside.

Applying the whitening algorithm more than once is apparently controversial (not statistically sound) but it does improve the results considerably. The problem is that generation can be several times slower.

(Still can't figure out how to display the frequency of each value in Excel)

Data / Data Analysis / Histogram

I dived into void randomSeed(unsigned int) this morning because of a remark in the cheat sheet thread ..

void randomSeed(unsigned int seed)
{
  if (seed != 0) {
    srandom(seed);
  }
}

Looking at the definition of srandom - http://www.nongnu.org/avr-libc/user-manual/group__avr__stdlib.html - I see

void srandom (unsigned long __seed);

imho: the Arduino randomSeed() misses 2^32 - 2^16 = 4294901760 (still approx 2^32) opportunities to seed the random function. Therefore I propose changing the implementation in WMath.cpp / WProgram.h to:

void randomSeed(unsigned long seed)
{
  if (seed != 0) {
    srandom(seed);
  }
}

This is backwards compatible so we should post this as an issue...

I think it sheds new light on the seeding problem :)

http://arduino.cc/forum/index.php/topic,66206.msg499437.html#msg499437 :stuck_out_tongue:

It looks like the zero check is unnecessary. So, randomSeed can be recoded like this...
#define randomSeed(s) srandom(s)

Missed that line in your post; alas, the conclusion is the same.

I have posted it as an issue - http://code.google.com/p/arduino/issues/detail?id=575 -