
Topic: Problem reading random noise generator

acegard

#30
Jun 02, 2014, 08:21 am Last Edit: Jun 02, 2014, 08:44 am by acegard
Okay, after a long day of testing, tweaking, and incorporating the suggestions given, I have some results of my own. I worked mainly from this design as a starting point, but changed the transistors to 2N2222s and the voltage divider resistors to 10K and 5.6K, to keep the bias at the same relative level from 9V as opposed to 12V. Using this circuit, I compared a sketch that I wrote, emulating the code on Rob Seward's page but mainly refactoring, against Rob Seward's own code. I noticed that as the sample size increased, the results tended to bunch toward the low end of the spectrum. I also have some specific questions, and some answers to yours:

I am limited to a voltage source of 9V for the final project...

9V battery?

Yes.

I have also made some suggested changes to your circuit.  First, since you are using a capacitor to isolate the noise voltage/current from the op-amp (AC coupling), you can use a single-sided rail-to-rail op-amp supplied by 5V to perform your later amplification and buffering.  I use the MCP629x family, but any similar op-amp should work.  BTW, you may want to try a larger capacitor, say 0.1 - 0.22 uF.

I have abandoned the amplification stage for now, as this appears to be working. Tomorrow morning, the lab opens again and I'm going to go in and put everything on the scope to see what it looks like and see if I can get better performance when adding an amplification stage. What characteristics should I look for in an op-amp that would be "good" for this application?


The 'best' current for maximum noise varies with the device; however, the ones I have tested have generally been 5-50 uA.

Further, several people have tested a variety of such transistors, and the 2N3904s tend to produce the most noise.

Is this current you speak of the base-emitter current of the "noisy" transistor?

Don't forget that the noise will have higher-frequency components than you can sample. Therefore there will be a degree of averaging going on, which will limit the readings at the top and bottom end. You need to add a low-pass filter to your noise source to take it below the Nyquist rate.

I tried this. I measured the frequency at which the Arduino was sampling the input (roughly 7.3 kHz), then designed a first-order low-pass filter (diagram attached) with unity gain and a cutoff frequency of 3.65 kHz, which, unless I am wrong, is the Nyquist frequency for that sample rate. When I connected it, however, it drove all my samples to either 0 or 1023 (reading from the analog input). Tomorrow I am going to put it on the scope and see what is happening, but until then, are there any ideas? I am aware that the tendency of the distribution to accumulate at the lower ranges is likely due to this, but I'm not sure how to remedy it.
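
(For reference, a first-order RC low-pass has cutoff fc = 1/(2*pi*R*C). With an assumed 10 nF capacitor, R = 1/(2*pi * 3650 * 10e-9) is roughly 4.4k ohms for the 3.65 kHz corner described here; these component values are illustrative, not necessarily the ones in the attached diagram.)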

Attached is my Excel spreadsheet with my results, as well as diagrams of my updated circuits. The op-amp is a TL082, powered by +9V and gnd.

Edit: forgot to post my code. Here it is! I'm having trouble getting the Von Neumann algorithm to work, so all it does right now is XOR whitening.
Code: [Select]
///A sketch to sample and display the reading at analog 0 to test the generation of random noise

#define inPin A0
#define led 13
#define CALIBRATE_SIZE 30000

unsigned long time = 0;
long times = 0;
int mini = 1023;
int maxi = 0;
int threshold = 0;
unsigned int Bins[256];
unsigned int mapped[20];
unsigned int mapped2[20];
unsigned long samples = 0;
unsigned long maxSamples = 500000;

void setup()
{
 pinMode(inPin, INPUT);
 pinMode(led,OUTPUT);
 Serial.begin(19200);
 for(int i = 0; i < 256; i++)
 {
   Bins[i] = 0;
 }
 
 for(int i = 0; i < 20; i++)
 {
   mapped[i] = 0;
   mapped2[i] = 0;
 }
   

}

void loop()
{
 Serial.println("Calibrating...");
 threshold = calibrate();
 Serial.print("Finished calibration. Threshold = ");Serial.println(threshold);
 delay(1000);
 //reset the bins
 for(int i = 0; i < 256; i++)
 {
   Bins[i] = 0;
 }
 while(!Serial.available() && samples < maxSamples)
 {
   byte result = collectData();
   Bins[result]++;
   int m = result % 20; //map into the 20 bins via modulus
   mapped[m]++;
   int m2 = map(result, 0, 255, 0,19);
   mapped2[m2]++;
   //Serial.println(result);
   samples++;
   Serial.println(samples);
 }

 Serial.println("Full distribution:");
 for(int i = 0; i < 256; i++)
 {
   Serial.print(i+1);Serial.print(": ");Serial.println(Bins[i]);
 }
 Serial.println("Divided:");
 for(int i = 0; i < 20; i++)
 {
   Serial.print(i+1);Serial.print(": ");Serial.println(mapped[i]);
 }
 
 Serial.println("Mapped:");
 for(int i = 0; i < 20; i++)
 {
   Serial.print(i+1);Serial.print(": ");Serial.println(mapped2[i]);
 }
 Serial.print("Samples: ");
 Serial.println(samples);
 while(true)
 {}

}

byte collectData()//returns an 8-bit number from the random bitstream
{
 byte out = 0;
 
 for(int i = 0; i < 8; i++)
 {
   static boolean flipFlop = 0;
   //unbias with XOR whitening
   int input = analogRead(inPin) >> 2; //scale 0-1023 down to 0-255 to match the threshold
   //int first = analogRead(inPin);
   //boolean one = 0;
   //boolean two = 0;
   //if(first >= threshold)
     //one = 1;
   //else if(first < threshold)
     //one = 0;
   //int second = analogRead(0);
   //if(second >= threshold)
     //two = 1;
   //else if(second < threshold)
     //two = 0;
     
     
   //if(first && !second)
     //out = (out <<1) | 0x01;
   //else if (!first && second)
     //out = (out <<1);
   flipFlop = !flipFlop; //toggle so alternate bits are XORed with 1
   boolean in = 0;
   if(input >= threshold)
     in = (1 ^ flipFlop);
     
   else if(input < threshold)
     in = (0 ^ flipFlop);
   
   if(in)
     out = (out <<1) | 0x01;
   else
     out = (out <<1);
 }
 
 return out;
}

unsigned int calibrate()
{
 digitalWrite(led,HIGH);
 for(int i = 0; i < CALIBRATE_SIZE; i++)
 {
   int adc_value = analogRead(inPin);
   byte analog = (byte)(adc_value >> 2); //scales 0-1023 down to 0-255 (a bare cast would wrap modulo 256)
   Bins[analog]++;
 }
 
 //find the median
 unsigned long half;
 unsigned long total = 0;
 int i;

 for(i=0; i < 256; i++){
   total += Bins[i];
 }

 half = total >> 1;
 total = 0;
 for(i=0; i < 256; i++){
   total += Bins[i];
   if(total > half){
     break;
   }
 }
 digitalWrite(led,LOW);
 return i;
}

wanderson

#31
Jun 02, 2014, 04:18 pm Last Edit: Jun 02, 2014, 05:06 pm by wanderson
1) Do not filter the data you are getting from the noise circuit.  It is not needed, and it can severely affect bias.

2)  Your basic mapping is flawed.  You are using a modulus operator, which by itself introduces additional bias.  To prove it to yourself, produce a fake data set with a perfectly uniform distribution, say 0-255, 100 times each.  Then apply your modulus mapping to that data set.  Since the input has a perfectly uniform distribution, you would expect the mapped distribution to be perfectly equal.  It will not be (see the sketch after this list).  If you look at the library code in my sig you can find code that maps the input data set to a smaller range while preserving the uniformity of the original.

3) Get down to the basics in your code.  First, just capture the pure analog readings after right-shifting the analog read twice (to turn the 0-1023 reading into an 8-bit value by dropping the two least significant bits); a minimal capture sketch follows this list.  Take say 50,000 samples and manually calculate the median.  For my circuits, it averages between 80-90.  YMMV; however, as long as it is above say 50 or so and you have a standard deviation of at least 10-20, you don't need any additional amplification to work with your circuit as is.

4) Only after 1-3 are done and working should you start adding code, one step at a time, to first turn analog readings into bits, then bytes.  Then start adding whitening as needed to get a uniform distribution.
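
To make point 2 concrete, here is a host-side check of the modulus bias (a standalone C program, not part of any sketch in this thread; it feeds a perfectly uniform 0-255 data set through % 20 and counts the bins):
Code: [Select]
#include <stdio.h>

int main(void)
{
  unsigned long bins[20] = {0};
  /* 100 copies of each byte value 0-255: a perfectly uniform input */
  for (int rep = 0; rep < 100; rep++)
    for (int v = 0; v < 256; v++)
      bins[v % 20]++;
  /* bins 0-15 come out at 1300 and bins 16-19 at 1200: bias from the modulus alone */
  for (int i = 0; i < 20; i++)
    printf("%2d: %lu\n", i, bins[i]);
  return 0;
}

And for point 3, a minimal capture sketch might look like this (the pin and baud rate are assumptions; readings are streamed over serial so the median and standard deviation can be computed on the PC):
Code: [Select]
const int ADC_PIN = A0;

void setup()
{
  Serial.begin(115200);
  //stream 50,000 readings, right-shifted twice to scale 0-1023 down to 0-255
  for (unsigned long i = 0; i < 50000UL; i++) {
    byte v = analogRead(ADC_PIN) >> 2;
    Serial.println(v);
  }
}

void loop() {}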

When testing how much whitening is needed, here is the order in which I add it:

First, Von Neumann.

Then XOR at the byte level, then at the 32-bit level.

If the XOR doesn't cut it, I replace the XOR step with a linear shift register. I like the Jenkins one-at-a-time hash.

P.S.  I posted working code for both the Von Neumann and the Jenkins hashing functions earlier in this thread: http://forum.arduino.cc/index.php?topic=244024.msg1748064#msg1748064
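
For reference, the standard Jenkins one-at-a-time hash has this shape (a generic C version; the code actually posted at the link above may differ):
Code: [Select]
#include <stdint.h>
#include <stddef.h>

//Jenkins one-at-a-time hash: feeding generator bytes through it and using
//the output bits is one way to layer whitening on top of XOR.
uint32_t jenkins_one_at_a_time(const uint8_t *key, size_t len)
{
  uint32_t hash = 0;
  for (size_t i = 0; i < len; i++) {
    hash += key[i];
    hash += hash << 10;
    hash ^= hash >> 6;
  }
  hash += hash << 3;
  hash ^= hash >> 11;
  hash += hash << 15;
  return hash;
}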

acegard

Thank you. My median is always roughly 54, so I should be fine.
As for the mapping, is this your code that does it?
Code: [Select]
uint32_t EntropyClass::random(uint32_t max)
{
  uint32_t slice;

  if (max < 2)
    retVal = 0;
  else
  {
    retVal = WDT_MAX_32INT;
    if (max <= WDT_MAX_8INT) //If only byte values are needed, make best use of entropy
    {                        //by dividing the long into four bytes and using them individually
      slice = WDT_MAX_8INT / max;
      while (retVal >= max)
        retVal = random8() / slice;
    }
    else if (max <= WDT_MAX_16INT) //If only word values are needed, make best use of entropy
    {                              //by dividing the long into two words and using them individually
      slice = WDT_MAX_16INT / max;
      while (retVal >= max)
        retVal = random16() / slice;
    }
    else
    {
      slice = WDT_MAX_32INT / max;
      while (retVal >= max)
        retVal = random() / slice;
    }
  }
  return (retVal);
}


I'm not sure how to adapt it, as I don't have a random "pool" of entropy as you do (basically, nothing to emulate the random8() etc. functions).
Thanks for the Von Neumann and Jenkins code. I saw it, but I haven't had time to incorporate it yet.

wanderson

Yes, that is the code.  In order to correctly 'slice' a random number into a smaller range, you need the ability to get an additional random number, so the first can be discarded if it doesn't 'slice' up properly, which occurs somewhat rarely depending upon the random size and the divisor.

So you don't need to have a pool; you just need to have your mapping calls call whatever function you want to generate random numbers.  That way, if a particular 8-bit value doesn't slice properly (a relatively rare condition), you can try again with the next 8-bit random value, as in the sketch below.  You can get away with a straight modulus operation, but only if the divisor is a power of two.
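
Something like this, built on the getRandomByte()/threshold from the code above (a sketch of the idea, not the library code):
Code: [Select]
byte getRandomByte(byte tHold); //from the generator sketch in this thread
extern byte threshold;          //calibrated median, also from that sketch

//Map one random byte into 0..maxVal-1 without modulus bias, drawing a
//fresh byte whenever the previous one lands in the unusable top slice.
byte randomRange(byte maxVal) //maxVal: 2-255
{
  byte slice = 255 / maxVal;
  byte r;
  do {
    r = getRandomByte(threshold) / slice;
  } while (r >= maxVal);
  return r;
}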


BTW, what is the variance or standard deviation on your raw analog reads (8-bit adjusted)?  A median of 56 isn't bad at all, but if the standard deviation is small you may not have enough range in your noise levels to work well.  I am concerned that may be the case since you're only running 9V.  Granted, my tests were with the 2N3904, but I needed at least 10V with that transistor to get a decent range of noise.  In my circuits I was getting a vmin of about 40mV and a vmax of about 2.3V (I adjusted the input voltage to get this range); I then used an Aref of 2.5V to maximize the initial resolution of the A2D.

If your max voltage is below 1.1V, you could use that internal reference and maximize the resolution of the A2D.  This by itself will help eliminate some of the bias your generator is experiencing.
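
On an ATmega328-based board that switch is a single call (a minimal sketch; the throwaway read is a common precaution, not something specified above):
Code: [Select]
void setup()
{
  analogReference(INTERNAL); //use the internal 1.1V bandgap as the ADC reference
  analogRead(A0);            //discard one reading while the reference settles
}

void loop() {}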

wanderson


Thank you. My median is always roughly 54, so I should be fine.


BTW, depending upon your definition of roughly, this could be another indicator of a problem.  Not counting the extremes my circuit has exhibited over time, the median (which I recalculate every 10,000,000 bytes) varies considerably, from about 80 up to about 120.  If you're not seeing a decent amount of variation in your median, you probably have either a circuit problem or a software one.

acegard

I'm just coming off a morning with the oscilloscope, and I'll post more detailed analysis later when I get off work.



Thank you. My median is always roughly 54, so I should be fine.


BTW, depending upon your definition of roughly, this could be another indicator of a problem.  Not counting the extremes my circuit has exhibited over time, the median (which I recalculate every 10,000,000 bytes) varies considerably, from about 80 up to about 120.  If you're not seeing a decent amount of variation in your median, you probably have either a circuit problem or a software one.


I haven't kept detailed records of my median, but having run it ~100 times over the past day or so, the calculated median is always between 50 and 60, inclusive. Is this a problem?

I haven't calculated the standard deviation yet, I will when I get off work. I did notice when I was working on the scope that the noise waveform actually seemed compressed at the lower end of the spectrum with my circuit - I feel like this might be what's causing my distribution to group as well. I changed some resistor values around and found a sweet spot where the waveform has not only a larger amplitude but is more randomly distributed as well. I haven't sampled it yet to see, but I have a good feeling about it!

Thank you for your patience and your exemplary assistance! You're really helping me to slowly understand every aspect of the problem. Thanks!

wanderson


I haven't kept detailed records of my median, but having run it ~100 times over the past day or so, the calculated median is always between 50 and 60, inclusive. Is this a problem?


You're seeing about a quarter of the typical variation I see in my circuits.  I suspect that this may be because you're not getting enough range in either your noise or perhaps in the A2D conversion.  The next step I would take is to run a program to capture about 50,000 raw A2D values.  With any luck you can either use the internal 1.1V AREF or pick an external AREF to maximize the range over which your A2D captures the noise.  Personally, I have found adjusting the AREF to be easier than small amplifications (say 1.2-2.6x) to get a full range (or nearly so).

BTW, good notes can help considerably with identifying and fixing problems.


Thank you for your patience and your exemplary assistance! You're really helping me to slowly understand every aspect of the problem. Thanks!


You're welcome!

I have posted a scope capture for a sample of my 2N3904 noise circuit which you can use as a basis of comparison.  You can see the output at https://sites.google.com/site/astudyofentropy/project-definition/avalanche-noise

Using all of the whitening I apply, that circuit can generate a little over 40MB of random bytes per day.  So far one particular implementation has generated about 18GB of random data.

acegard

A quick question as I map out tonight's plan of attack: I would like to let it run overnight to get a larger sample to test against. I have an Ethernet shield with a MicroSD slot that I want to put the data on (since the Arduino can't handle a sample size of more than ~500k). I know how to read and write data to the SD card, but I am curious what method you used to capture larger sample sizes.

wanderson

I use two methods, depending upon mood.  Most of the time I use my computer and capture the serial port output to a text file in a background process.  For the long term test device I created code that uses a micro-SD card to save each 'block' of data to the card.  The code reuses the same memory block I use for the calibration storage, thereby allowing it to compile and run on a 328 level chip.

You can find the complete code at: https://code.google.com/p/avr-hardware-random-number-generation/source/browse/Avalanche/TRNG_SD/TRNG_SD.ino
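
The general shape of that approach (a compressed sketch, not the actual TRNG_SD code; the chip-select pin, file name, and block size here are assumptions):
Code: [Select]
#include <SPI.h>
#include <SD.h>

const int SD_CS = 4;  //Ethernet-shield SD slots commonly use pin 4
byte threshold = 56;  //illustrative; in real use this comes from calibrate()
byte block[512];      //one buffer, reused for every block written

//Placeholder so this compiles standalone; in real use this is the whitened
//getRandomByte() from the generator sketch earlier in the thread.
byte getRandomByte(byte tHold) { (void)tHold; return (byte)(analogRead(A0) >> 2); }

void setup()
{
  SD.begin(SD_CS);
}

void loop()
{
  for (int i = 0; i < 512; i++)
    block[i] = getRandomByte(threshold);
  File f = SD.open("random.bin", FILE_WRITE); //FILE_WRITE appends
  if (f) {
    f.write(block, 512);
    f.close(); //close after each block so data survives a power loss
  }
}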

acegard

Hey there! Sorry it has taken me a while to reply; due to some unavoidable personal stuff I had to wait until tonight to work on this.

My median and standard deviation are 56 and 15, respectively, with an analog reading sample size of 500,000 samples.

Here are my oscilloscope results. I am using the same circuit as I posted before, but with no filtering for any sort of Nyquist frequency.
This capture was taken at the output, with the values given. The waveform seems to group itself at the lower end (around 100 mV), so I thought that this could explain the grouping of my samples at the lower end of the scale. After some fiddling and calculating, I replaced the high resistor on the divider with a 6.8k ohm resistor and the lower one with a 20k ohm resistor, and got this noise waveform. It seems to me that this is far more evenly distributed, as well as having a higher amplitude. The charts in the attached Excel sheet were created using this circuit. Interestingly, it has not changed my median. Also, my median is rather constant, and rather low - always now between 40 and 50. This is true of both the initial circuit and the new one.
Then I changed the AnalogRef pin as you suggested, with a reference of 3V, as the maximum noise looked to be about 2.9 volts.
This changes the median to ~75.

When I run the code through the whitening algorithms, the distribution looks pretty good. However (and this, I feel, is the real final challenge), the mapping is still horrendous. As you can see from the charts in the attached spreadsheet, the "mapped" distributions are all over the place. Why might this be?

Here is my new code:
Code: [Select]
/********************************/
/*  Rob Seward 2008-2009        */
/*  v1.0                        */
/*  4/20/2009                   */
/********************************/

#define BINS_SIZE 256
#define CALIBRATE_SIZE 10000

#define NO_BIAS_REMOVAL 0
#define EXCLUSIVE_OR 1
#define VON_NEUMANN 2

#define ASCII_BYTE 0
#define BINARY 1
#define ASCII_BOOL 2

#define LED_PIN 13
#define ADC_PIN A0

/***  Configure the RNG **************/
int bias_removal = VON_NEUMANN;
int output_format = ASCII_BOOL;
int baud_rate = 19200;
/*************************************/


unsigned int Bins[BINS_SIZE];
unsigned int Results[20];
boolean initializing = true;
unsigned int calibration_counter = 0;

boolean bufferFull = false;
byte buffer = 0;
byte threshold = 0;

unsigned long samples = 0;
unsigned long maxSamples = 170588;

void setup(){
  pinMode(LED_PIN, OUTPUT);
  Serial.begin(baud_rate);
  for (int i=0; i < BINS_SIZE; i++){
    Bins[i] = 0;
  } 
}

void loop(){

  Serial.println("Initializing...");
  Serial.println("Calibrating...");
  threshold = calibrate();
  Serial.print("Threshold: ");Serial.println(threshold);
  delay(2000);
  Serial.println("Collecting Data");
  for (int i=0; i < BINS_SIZE; i++){
    Bins[i] = 0;
  } 
  while(samples < maxSamples)
  {
    //byte rand = getRandomByte(threshold);
    //Bins[rand]++;
    uint32_t rand = map20();
    Results[rand]++;
    Serial.print(samples);Serial.print(": ");Serial.println(rand);
    samples++;
  }
 
  Serial.println("Distribution:");
  for (int i=0; i < BINS_SIZE; i++){
    Serial.print(i);Serial.print(": ");Serial.println(Bins[i]);
  } 
  for (int i=0; i < 20; i++){
    Serial.print(i);Serial.print(": ");Serial.println(Results[i]);
  }
  while(true){}
}

byte getRandomByte(byte tHold)
{
  while(!bufferFull)
  {
    int adc_value = analogRead(ADC_PIN);
    byte adc_byte = adc_value >> 2;
    processInput(adc_byte, tHold);
  }
  bufferFull = false;
  return buffer;
}

void processInput(byte adc_byte, byte threshold){
  boolean input_bool;
  input_bool = (adc_byte < threshold) ? 1 : 0;
  switch(bias_removal){
    case VON_NEUMANN:
      vonNeumann(input_bool);
      break;
    case EXCLUSIVE_OR:
      exclusiveOr(input_bool);
      break;
    case NO_BIAS_REMOVAL:
      buildByte(input_bool);
      break;
  }
}

void exclusiveOr(byte input){
  static boolean flip_flop = 0;
  flip_flop = !flip_flop;
  buildByte(flip_flop ^ input);
}

void vonNeumann(byte input){
  static int count = 1;
  static boolean previous = 0;
  static boolean flip_flop = 0;
 
  flip_flop = !flip_flop;

  if(flip_flop){
    if(input == 1 && previous == 0){
      buildByte(0);
    }
    else if (input == 0 && previous == 1){
      buildByte(1);
    }
  }
  previous = input;
}

void buildByte(boolean input){
  static int byte_counter = 0;
  static byte out = 0;

  if (input == 1){
    out = (out << 1) | 0x01;
  }
  else{
    out = (out << 1);
  }
  byte_counter++;
  byte_counter %= 8;
  if(byte_counter == 0){
    if (output_format == ASCII_BYTE) Serial.println(out, DEC);
    buffer = out;
    bufferFull = true;
    out = 0;
    return;
  }
  bufferFull = false;
}



unsigned int calibrate()
{
  digitalWrite(LED_PIN,HIGH);
  for(int i = 0; i < CALIBRATE_SIZE; i++)
  {
    int adc_value = analogRead(ADC_PIN);
    byte analog = adc_value >>2;//truncates to 0-255
    Bins[analog]++;
  }
 
  //find the median
  unsigned long half;
  unsigned long total = 0;
  int i;

  for(i=0; i < 256; i++){
    total += Bins[i];
  }

  half = total >> 1;
  total = 0;
  for(i=0; i < 256; i++){
    total += Bins[i];
    if(total > half){
      break;
    }
  }
  digitalWrite(LED_PIN,LOW);
  return i;
}

uint32_t map20()
{
  uint32_t slice;
  uint32_t retVal = 0xFFFFFFFF;

  slice = 0xFF / 20;
  while (retVal >= 20)
  {
    byte r = getRandomByte(threshold);
    retVal = r / slice;
    Bins[r]++;
  }
 
  return(retVal);
}

wanderson

I haven't had a chance to look at your mapping code in detail, but on the surface I couldn't find anything obviously wrong.  You may, however, want to not use uint32_t (from my code) but instead define the data types as bytes.  Since you're generating only byte-sized randoms, my slicing code, which was designed to work with 32-bit values, doesn't need the larger data type.

I did use your code with one of my avalanche noise generators and looked at the results produced.  I can confirm that your code is producing good byte values.

Chi-squared test on the byte distribution:
Code: [Select]

Chi-squared test for given probabilities

data:  tmp
X-squared = 254.5149, df = 255, p-value = 0.4968


but the mapped distribution is flawed.
Code: [Select]

Chi-squared test for given probabilities

data:  ob
X-squared = 42.3098, df = 19, p-value = 0.001609


So there is something wrong with your mapping code.  I can't look at it any more today, but if you don't get another solution, I will take a look at it in a day or two.

acegard

Great, thank you. I'll mess around with some things tonight and let you know what comes out.

wanderson


Great, thank you. I'll mess around with some things tonight and let you know what comes out.


It doesn't solve the problem, but there is one thing I did notice that you might want to change.  You are accumulating the distribution counts at the byte level inside the while loop of the map20 function, yet some of those values are being discarded.  You might want to consider, at least for this preliminary testing stage, counting only the bytes that are actually used to produce the mapped values (move the increment outside of the while loop, as in the sketch below).
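
Applied to the map20 function from the sketch above, that change might look like this (untested, just the shape of the suggestion):
Code: [Select]
uint32_t map20()
{
  uint32_t slice = 0xFF / 20;
  uint32_t retVal = 0xFFFFFFFF;
  byte r;

  while (retVal >= 20)
  {
    r = getRandomByte(threshold);
    retVal = r / slice;
  }
  Bins[r]++; //count only the byte that produced an accepted mapped value
  return (retVal);
}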

I looked at my records, and I don't have any tests of my mapping function that use a max value of 20.  One possibility is that the algorithm has an issue with some numbers.

Another approach, if you're really only interested in values of 0-19, is to not create random bytes but random 5-bit values (0-31), and then simply discard any value greater than 19.  Since the generator is producing uniform distributions, you can generate 5-bit values with just as much validity as the 8-bit values you are producing currently.  You're throwing away a little less than a bit that way.  And since the calculations would be more efficient, it should actually be faster (even with the wasted random numbers) than the more general approach I suggested earlier.
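
A sketch of that idea (getRandomBit() is a hypothetical helper returning one whitened bit from the generator; five bits cover 0-31):
Code: [Select]
boolean getRandomBit(); //hypothetical: one whitened bit from the generator

byte random0to19()
{
  while (true) {
    byte v = 0;
    for (byte i = 0; i < 5; i++)  //assemble a 5-bit value, 0-31
      v = (v << 1) | getRandomBit();
    if (v < 20)                   //keep 0-19, discard 20-31 and retry
      return v;
  }
}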

acegard


Another approach, if you're really only interested in values of 0-19, is to not create random bytes but random 5-bit values (0-31), and then simply discard any value greater than 19.  Since the generator is producing uniform distributions, you can generate 5-bit values with just as much validity as the 8-bit values you are producing currently.  You're throwing away a little less than a bit that way.  And since the calculations would be more efficient, it should actually be faster (even with the wasted random numbers) than the more general approach I suggested earlier.


This is a great idea. In the final application, the mapping range will be variable, but its maximum will be 20. I'll run some tests when I get off work, and if it works, perhaps I could use the same technique to adjust the maximum value of the random number produced: for a maximum mapping of 4, only produce a 2-bit number; for a maximum of 8, a 3-bit number (the values would need to be 1-8, so I'd produce the 3-bit number and then add 1).
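
Generalized to a variable maximum, that might look like this (same hypothetical getRandomBit(); maxVal is the number of possible outcomes):
Code: [Select]
byte randomBelow(byte maxVal) //returns 0..maxVal-1; maxVal: 2-32
{
  byte bits = 1;
  while ((1 << bits) < maxVal) //find the smallest bit width covering maxVal
    bits++;
  byte v;
  do {
    v = 0;
    for (byte i = 0; i < bits; i++)
      v = (v << 1) | getRandomBit();
  } while (v >= maxVal);       //discard out-of-range values and retry
  return v;
}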

wanderson

I am very fond of the debugging approach of throwing away the parts of code that aren't working for me and either recoding from scratch or, even better, trying a different approach/algorithm.

As long as your generator is producing uniform numbers, it shouldn't matter whether you use it to generate 2-, 3-, 4-, 8-, 32-, or n-bit integers.  Just make sure it is producing uniform numbers; my cursory look at your sketch indicated you were only using Von Neumann whitening, which can work fine when the circuit is first wired up and calibrated.  However, the components WILL drift, and the balance will alter quite considerably over time.  While the median approach that Seward developed helps that problem a lot, it only goes so far.  In my opinion, RNG circuits can never use too much whitening, which is why I layer multiple techniques.  My experience indicates that 4-8 bits in for every bit output is about right for safety.


On a related note, I have an Arduino using my original mapping code to produce numbers from 0-19.  I will take a look at the distribution and get back to you.  If the problem is my algorithm (god, I hope not), I will know tomorrow, since I KNOW that that library/software has been well tested by a number of people on a number of chips to produce uniform 32-bit random number distributions.  If my sample distribution is good, then the problem is something obscure in your implementation.  If not, I will need to produce another release of my library after I figure out the cause...  (Did I say that I hope the problem is somewhere in your implementation?)
