The reliable but not very sexy way to seed random

#include <EEPROM.h>

void reseedRandom( void )
{
  static const uint32_t HappyPrime = 937;
  union
  {
    uint32_t i;
    uint8_t b[4];
  }
  raw;
  int8_t i;
  unsigned int seed;
  
  for ( i=0; i < sizeof(raw.b); ++i )
  {
    raw.b[i] = EEPROM.read( i );
  }

  do
  {
    raw.i += HappyPrime;
    seed = raw.i & 0x7FFFFFFF;
  }
  while ( (seed < 1) || (seed > 2147483646) );

  randomSeed( seed );  

  for ( i=0; i < sizeof(raw.b); ++i )
  {
    EEPROM.write( i, raw.b[i] );
  }
}

void setup( void )
{
  reseedRandom();
}

Comments?

Comments?

Good effort, will work in many cases.

Remarks:

  • major -
    it misses a entropy generator so the seeding is still deterministic, two arduino's (even with different sketches) will get the same random() sequence and in case of the same sketch they will behave identical I think - unless eeprom is initially different, which normally isn't ...
    Just putting 1,2,3,4,5, etc as next seed is equally deterministic; only less obfuscated and easier to program no need for happyPrime...

  • major -
    it should contain a test that the value written to EEPROM is not the same as the one read from it at the start to prevent "seed-lock "

  • minor -
    the algorithm uses EEPROM to store the seedGenerator over restarts. This is a good idea, however, it uses the first 4 bytes which are the first to use if someone uses EEPROM in their sketch, might be better to use the last 4 bytes of the EEPROM, these are less likely to be used by a sketch.

Ideas

  • readout (part of) the flashmemory (progmem ?) and do a CRC-16 or so => to be used as part of the seed. Then at least two different sketches will get a different seed.

  • Every EEPROM might have slightly different timing (I'm thinking about the delay one needs after writing has a reason). Can it be used as source of entropy?

Rob

Good effort, will work in many cases.

Thanks.

it misses a entropy generator so the seeding is still deterministic, two arduino's (even with different sketches) will get the same random() sequence and in case of the same sketch they will behave identical I think - unless eeprom is initially different, which normally isn't ...

Good point. Some solutions... Initialize the EEPROM to a value from random.org. Use a different odd prime number for different sketches.

Just putting 1,2,3,4,5, etc as next seed is equally deterministic; only less obfuscated and easier to program no need for happyPrime...

Ah, but I like using a Happy Prime. 8)

it should contain a test that the value written to EEPROM is not the same as the one read from it at the start to prevent "seed-lock "

I don't understand.

the algorithm uses EEPROM to store the seedGenerator over restarts. This is a good idea, however, it uses the first 4 bytes which are the first to use if someone uses EEPROM in their sketch, might be better to use the last 4 bytes of the EEPROM, these are less likely to be used by a sketch.

Agree.

  • readout (part of) the flashmemory (progmem ?) and do a CRC-16 or so => to be used as part of the seed. Then at least two different sketches will get a different seed.

I like the idea but it may add too much start-up delay. I'll investigate when I have more time.

  • Every EEPROM might have slightly different timing (I'm thinking about the delay one needs after writing has a reason). Can it be used as source of entropy?

While the physical EEPROM may have slightly different timings, I don't think the CPU has a way to measure it. The datasheet hints that EEPROM reads and writes are driven solely by the processor clock. I vaguely recall considering EEPROM timing instead of watchdog jitter as an entropy source and rejecting the idea because of datasheet hints.

it should contain a test that the value written to EEPROM is not the same as the one read from it at the start to prevent "seed-lock "

I don't understand.

Simplified:
Suppose the algorithm reads the value 13 from EEPROM, and after calculating the seed it should not write 13 to the EEPROM back. Ideally all numbers from 0..maxint should have an equal chance.
A better look at your code made me clear that as long as the value read from EEPROM is positive, this will be guaranteed by the primeness of Happy Prime.

However if the value from EEPROM is negative, the Happy Prime will be added repeatedly until seed becomes > 0.
That means that the seed will allways be between 1 and HappyPrime. This gives a bias, which depends on the size of HappyPrime - values in the range of Maxint/2 might score better (did not analyse this)

I propose to add the following line : raw.i = raw.i & 0x7FFFFFFF; just after reading the value from EEPROM.

This changes the sign before the while loop and that means the number will be somewhere between 1 and maxint/2 instead of between 1 and HappyPrime. Bias gone. thats the idea.

  • readout (part of) the flashmemory (progmem ?) and do a CRC-16 or so => to be used as part of the seed. Then at least two different sketches will get a different seed.
    I like the idea but it may add too much start-up delay. I'll investigate when I have more time.

instead of complete flash you could take only 256 (512?) bytes of the places where the sketch starts as that part might have the most entropy. By only using 256 bytes it is substantially faster than the complete 32K memory.

While the physical EEPROM may have slightly different timings, I don't think the CPU has a way to measure it. The datasheet hints that EEPROM reads and writes are driven solely by the processor clock. I vaguely recall considering EEPROM timing instead of watchdog jitter as an entropy source and rejecting the idea because of datasheet hints.

OK

Ideally all numbers from 0..maxint should have an equal chance ... this will be guaranteed by the primeness of Happy Prime

Exactly. One and odd primes less than 2^32 - 1 are all candidates for incrementing the raw value.

However if the value from EEPROM is negative, the Happy Prime will be added repeatedly until seed becomes > 0.

Ah, but it doesn't. I'll use 0x8000019F (-2147483233) as an example...

  do
  {
    // raw.i = 0x8000019F (-2147483233)
    raw.i += HappyPrime;
    // raw.i = 0x8000019F + 0x000003A9 = 0x80000548 (-2147482296)
    seed = raw.i & 0x7FFFFFFF;
    // seed = 0x80000548 & 0x7FFFFFFF = 0x00000548 (1352)
  }
  // seed (1352) is greater than or equal to 1 and less than or equal to 2147483646 (0x7FFFFFFE) so the loop breaks
  while ( (seed < 1) || (seed > 2147483646) );

The four values that are rejected are...

raw.i = 0x00000000 (seed = 0)
raw.i = 0x7FFFFFFF (seed = 2147483647)
raw.i = 0x80000000 (seed = 0)
raw.i = 0xFFFFFFFF (seed = 2147483647)

...which is nicely symmetric (two "positive" raw values and two "negative" raw values). So, the raw sequence is full cycle minus the four rejected values with the 2,147,483,646 seeds each occurring twice before the whole thing repeats.

instead of complete flash you could take only 256 (512?) bytes of the places where the sketch starts as that part might have the most entropy

I'm thinking loop and setup are good candidates.

Got it , the loop will only be executed once for all but 4 scenario's these latter run twice.

The latest version...

#include <avr/eeprom.h>
#include <stdlib.h>


void setup( void );

void loop( void );


void reseedRandom( uint32_t* address )
{
  static const uint32_t HappyPrime = 937;
  uint32_t raw;
  unsigned long seed;
  uint32_t* code;
  uint8_t i;
  uint32_t offset;

  offset = 0;

  /* The following adds 1050 - 922 = 128 bytes of code.  Is it worth the 128 bytes? */
/* */
  code = (uint32_t*)(setup);
  i = 0;
  do
  {
    offset ^= pgm_read_dword( code );
    ++code;
    ++i;
  }
  while ( i != 0 );
/* */

  raw = eeprom_read_dword( address );

  do
  {
    raw += HappyPrime;
    seed = (raw + offset) & 0x7FFFFFFF;
  }
  while ( (seed < 1) || (seed > 2147483646) );

  srandom( seed );  

  eeprom_write_dword( address, raw );
}

inline void reseedRandom( unsigned short address )
{
  reseedRandom( (uint32_t*)(address) );
}


uint32_t reseedRandomSeed EEMEM = 0xFFFFFFFF;


void setup( void )
{
/* */
  reseedRandom( &reseedRandomSeed );
/* */
/*
  reseedRandom( (unsigned short) 0 );
*/
}


void loop( void )
{
}

Eliminating EEPROM.h made it considerably smaller.

The randomSeed declaration is a problem. The parameter is unsigned int (16 bits) instead of a 32 bit data-type so srandom is called directly.

Including the first 256 bytes of setup in the seed adds 128 bytes of code. The cost seems too high for the benefit.

Thoughts?

Thoughts?

I may have if I knew WTF it was doing :slight_smile:

Am I correct in saying that it

XORs 256 bytes of program flash starting at setup()
reads a value from EEPROM
uses that value and the result of the XORing plus HappyPrime to arrive at "seed"
stores the seed value back into EEPROM


Rob

Looks good, I like the improvement but there are still some points

The code fetching the code of the sketch to generate a part of the seed value will produce the same offset every time for the same sketch. But it will probably be very different for every different sketch. (or compiler option/version etc) so it is definitely a big step in the right direction.

code = (uint32_t*)(setup);
  i = 0;
  do
  {
    offset ^= pgm_read_dword( code );
    ++code;
    ++i;
  }
  while ( i != 0 );

I think it is better to use an additional offset to start at a different place...

code = (uint32_t*)(setup) + some_8bit_number;

This changes the value for offset if some_8bit_number is different.
The natural option is a byte from the last seed from EEPROM.


just a last thought..
Have you tried to use pgm_read_byte() 4 times iso pgm_read_dword() ? As it is more elementary it might result in a smaller footprint... (remove long/ dword math)

Graynomad:

Thoughts?

I may have if I knew WTF it was doing :slight_smile:

I'll include more comments (than one) next time. :blush:

Am I correct in saying that it ... XORs 256 bytes of program flash starting at setup()

Yes. In theory, this gives a different set of seeds for different sketches.

reads a value from EEPROM

Yes. The previous "raw" value is read. The "raw" value is almost the seed.

uses that value and the result of the XORing plus HappyPrime to arrive at "seed"

Yes.

stores the seed value back into EEPROM

Almost. The "raw" value is stored back into EEPROM for next time.

For not knowing what it was doing you did a good job describing what it does. :wink:

You could always stuff an EEPROM with values from hotbits (HotBits: Genuine Random Numbers) as an entropy source; at least the initial seed would be non-deterministic. Store an offset to the next unused byte in EEPROM and update it each time a byte is fetched. Having net access (via whatever mechanism) also opens up some interesting RNG possibilities with hotbits, and memory devices are cheap; a flash card could be filled via hotbits and one could eschew the use of an RNG altogether.

For those new to the topic, hotbits is a service ran out of Switzerland which uses a cesium-137 gamma source coupled with a commercial radiation detector to generate random bits. More detail here if interested: How HotBits Works

No affiliation with the site here, just a math/cryptography geek.

1 Like

buzzdavidson:
You could always stuff an EEPROM with values from hotbits (HotBits: Genuine Random Numbers) as an entropy source; at least the initial seed would be non-deterministic.

An excellent idea.

For those with an Ethernet Shield, a combination of this from robtillaart and a bit of EEPROM code can be used to automate the initialization.

Store an offset to the next unused byte in EEPROM and update it each time a byte is fetched.

You lost me.

Having net access (via whatever mechanism) also opens up some interesting RNG possibilities with hotbits, and memory devices are cheap; a flash card could be filled via hotbits and one could eschew the use of an RNG altogether.

Also both good ideas.

For those new to the topic, hotbits

http://www.random.org/ is also a good source of entropy and information.

Sorry, posting while distracted :stuck_out_tongue: It's a technique I've used in a couple of projects - to ensure that a given byte from the EEPROM is only used once, the first byte(s) of the EEPROM store an offset to the next unused byte of random data. When byte(s) are read from EEPROM, the offset is updated accordingly. On a slow microcontroller like the AVR you don't really have contention issues, so a simple approach like this works fine.

EEPROM is not particularly fast; if speed is a concern then FRAM or flash may be a more appropriate storage medium. For occasional access like refreshing a random seed, EEPROM is just fine (not to mention much cheaper)

robtillaart:
Looks good

Thanks.

The code fetching the code of the sketch to generate a part of the seed value will produce the same offset every time for the same sketch. But it will probably be very different for every different sketch. (or compiler option/version etc) so it is definitely a big step in the right direction.

I'm not convinced it is worth the cost. I'm also not convinced it is necessary. But I can't yet see far enough down each road to know for certain.

Someone who wants a unique random set for each board or sketch can simply initialize the "raw" value (the value stored in EEPROM) with a truly random value (or a constant). I'm leaning towards providing that as an example and removing the code / offset part.

I think it is better to use an additional offset to start at a different place...

Why?

Have you tried to use pgm_read_byte() 4 times iso pgm_read_dword() ? As it is more elementary it might result in a smaller footprint... (remove long/ dword math)

That does help (20 bytes trimmed). Thanks for the suggestion.

The majority of the code added is from random / srandom then the EEPROM stuff. The reseedRandom does not pull in any 32 bit library code. The 32 bit operations all get turned into four machine instructions (add, exclusive-or, load, store). As far as I can tell, the only thing of significant size that can optionally be removed from reseedRandom is the code / offset part.

..some atmels(e.g 328) have a temperature sensor on the chip - what about to use the temp value or the noise it produces? P/

My personal interest is to produce a unique "serial number" for a board (well chip really). As this is a one-off, first time powered up scenario where all chips will have exactly the same code and no connection to ethernet or anything else this is different to the above code and some other methods that have been discussed. Also execution time doesn't matter, code size does.

I've been playing with accumulating the LSB from an unconnected ADC pin into an 8-byte array then running an 8-bit CRC on the array. I do that 4 times to get a 32-bit number.

This seems to work OK but still not good enough I think and the CRC code blows the footprint out too much. (OK if CRC is being used elsewhere in the application though)

Another option I'm thinking about is to use a semi-random byte generated as I just mentioned to load the RC oscillator adjust register, then count the time it takes to do an EEPROM write and use one byte from the result. Once again do this four times to get 32 bits.

Maybe the temp sensor can be thrown into the mix here as well, say as the original entropy source to get the RC oscillator adjust value.

I guess this is similar to CBs watchdog arrangement, but with a semi-randomly adjustable timeout.


Rob

Graynomad:
My personal interest is to produce a unique "serial number" for a board (well chip really).

Dallas DS28CM00 is cheap (< 1 USD in single quantity), interfaces with i2c, and provides a globally-unique 64-bit id. Ditto for DS2401 if 1-wire is an option.

http://datasheets.maxim-ic.com/en/ds/DS28CM00.pdf

Just a thought...

My personal interest is to produce a unique "serial number" for a board (well chip really)

How will they be programmed? By you? By a fabrication shop?

pito:
..some atmels(e.g 328) have a temperature sensor on the chip - what about to use the temp value or the noise it produces? P/

Tried it. It is not a good entropy source. An unconnected analog input is a better choice. But that too has some serious drawbacks not the least of which is that, under certain conditions, the value produced is a constant.

The bottom line is that AVR processors are extremely deterministic. Which is wonderful when you're trying to control something but a pain in the butt when you're trying to generate a random number.

Graynomad:
I've been playing with accumulating the LSB from an unconnected ADC pin ... This seems to work OK but still not good enough

I had fun running the experiments but the results were not good. A four inch insulated wire gave the best results. Longer or shorter wires and the values turned to garbage (randomly speaking, of course). And anything near the wire had an effect. The method is not very reliable.

Another option I'm thinking about is to use a semi-random byte generated as I just mentioned to load the RC oscillator adjust register, then count the time it takes to do an EEPROM write and use one byte from the result. Once again do this four times to get 32 bits.

I had looked at that as well. It appears that EEPROM write times are driven entirely by the processor clock. In other words, an EEPROM write appears to always take X processor clock ticks. But, I didn't pursue it long enough to determine that for certain.

Maybe the temp sensor can be thrown into the mix here as well

In general, the internal temperature sensors seem to be too stable. Especially if the processor is flowing any current (processor is above ambient temperature).

For the ATtiny85s I have, the internal temperature sensor only produces even values. The granularity is too course.

I guess this is similar to CBs watchdog arrangement

That didn't go too well either. For some processors, the results were good. For SMD processors, the results were very very bad.

I even tried this technique...
http://home.comcast.net/~orb/details.html
...without much success. I believe the sample-and-hold capacitor in the AVR processor has too little capacitance for the technique to work. I can't remember the details but I just was not getting good results.