Problem reading random noise generator

jremington:
The author of the HRNG link you posted claims to have gone to some effort to reach his goal, namely

trying to design a circuit that could produce a symmetrical, zero-average noise signal from a reverse biased p-n junction in avalanche breakdown

but it really isn't clear how he achieved and verified the output of the hardware other than with software. Your circuit has no provision for adjustments and can't produce a zero mean signal. Also, he used a digital input to generate "1"s and "0"s -- your use of the ADC throws in an additional complication. So you may be fighting an uphill battle.

The original author of the circuit the OP is using believed that a relatively even balance between 0's and 1's produce a non-biased output (it doesn't), but they didn't appear to actually test that asuumption since they used 'whitening' techniques from the start:

This time there is no need to minimize bias inherent in the operation of the generator, if all went well we should get a probability of 1s and 0s very close to 50%. Of course I'm going to assume that the bit-stream still contain bias and correlation, so a de-skewing technique for reducing bias and correlation will be used anyway.
On the software side I have basically used the same code I had written for the Chua random generator: at the beginning GPIO4 is configured as input, then one bit is read (and its value is optionally printed), next we wait for a delay and repeat. This delay has been set to 500?s, producing a 2000bit/s stream.

Yes, I see. He ran a simple program to read two independently generated output streams from the HRNG and produce a third stream, as outlined by the following (from Random Sequence Generator based on Chua Circuit )

John von Neumann invented a simple algorithm to fix simple bias, and reduce correlation. It considers successive pairs of consecutive, non-overlapping bits from the input stream, taking one of three actions: when two successive bits are equal, they are discarded; a sequence of 1,0 becomes a 1; and a sequence of 0,1 becomes a 0. It thus represents a falling edge with a 1, and a rising edge with a 0. This eliminates simple bias, and is easy to implement. This technique works no matter how the bits have been generated. It cannot assure randomness in its output, however. What it can do is transform a biased random bit stream into an unbiased one. However, significant numbers of bits are discarded. Thanks to the hardware bias reduction, we can affirm that the probability of 1 is p ~ 1/2, and in this case the discard of input pairs is minimum, giving us an output stream that is 1/4 the length of the input on average.

Wow, it's obvious that I don't have nearly as good of a grasp of this as I thought I did! I'm going to spend the morning reading all the responses and sources and come back when I can post an informed, coherent reply!
Thank you guys for your great, informative responses.

wanderson:
Avalanche noise is generally an order of magnitude lower for 5V and below sources (typically zeners') when compared to 12V+ sources, even when current is adjusted for. Further the reverse bias transistor used in the OP example doesn't enter the breakdown zone until about 8-9V, though I have found a few examples of the 2N3904 (the transistor typically specified) that entered breakdown at lower voltages, the mean tends to be around 9V, with noise peaking near 12V.

The 'best' current for maximum noise varies with the device; however, the ones I have tested have generally been 5-50 uA.

Further, there have been several people who have tested a variety of such transistors, and the 2N3904 tend to produce the most noise.

I am limited to a voltage source of 9V for the final project, unfortunately, so I chose the 2N2222 as opposed to the 2N3904 because, even though it might not produced as much noise, it looks like it's easier to push into breakdown at lower voltages according to the example I cited in the OP. If I'm off base here, let me know.

BareMetal:
I disagree. my original circuit was stolen from a 12v design. I dropped it to 5v, changed the NPN bias to get 2.5 ish midpoint and enough gain to exceed the upper and lower hystersis points of the 393. it didnt work...until I realised that the key point is the current through the noise source is quite critical. i guess most mfrs try to reruce noise and though zeners are inherently noisy there is a point on their curve where they are most noisy. we are talking nanocurrents here, any help you can get to boost the starting point is a bonus...I rebiased the zener to get the exact datasheet current point for peak noise and it burst into life....

albeit with the nyquist sample rate problem, so im not cel3brating yet, just my 2c that 5v circuits can be made noisy with a bit of care,a datasheet and a calculator

So you're using a Zener diode to generate avalanche noise as opposed to transistor breakdown? Does your circuit look something like this, getting the noise from the zener and then using the two transistors to amplify that?

acegard:
So you're using a Zener diode to generate avalanche noise as opposed to transistor breakdown? Does your circuit look something like this, getting the noise from the zener and then using the two transistors to amplify that?

The reverse biased transistor used in the original circuit you cited, as well as my original circuit, not to mention the typical circuit floating around the web for 30+ years is behaving in the exact same manner as a zener diode. This app note might help you:

I have also made some suggested changes to your circuit. First, since you are using a capacitor to isolate the noise voltage/current from the op-amp (ac-coupling) you can use a single sided rail to rail op amp supplied by 5V to perform your later amplification and buffering. I use the MCP629x family but any similar opamp should work. BTW, you may want to try a larger capacitor, say 0.1 - 0.22 uF

I also didn't show it, but adding a buffer (similar to what you have on the first op amp following the transistor) to your voltage divider would be beneficial, particularly if you keep the resistors your using to connect the divider to the + input of the op amps.

I

acegard:
I am limited to a voltage source of 9V for the final project...

9V battery?

acegard:
Wow, it's obvious that I don't have nearly as good of a grasp of this as I thought I did! I'm going to spend the morning reading all the responses and sources and come back when I can post an informed, coherent reply!
Thank you guys for your great, informative responses.

wanderson:
Avalanche noise is generally an order of magnitude lower for 5V and below sources (typically zeners') when compared to 12V+ sources, even when current is adjusted for. Further the reverse bias transistor used in the OP example doesn't enter the breakdown zone until about 8-9V, though I have found a few examples of the 2N3904 (the transistor typically specified) that entered breakdown at lower voltages, the mean tends to be around 9V, with noise peaking near 12V.

The 'best' current for maximum noise varies with the device; however, the ones I have tested have generally been 5-50 uA.

Further, there have been several people who have tested a variety of such transistors, and the 2N3904 tend to produce the most noise.

I am limited to a voltage source of 9V for the final project, unfortunately, so I chose the 2N2222 as opposed to the 2N3904 because, even though it might not produced as much noise, it looks like it's easier to push into breakdown at lower voltages according to the example I cited in the OP. If I'm off base here, let me know.

BareMetal:
I disagree. my original circuit was stolen from a 12v design. I dropped it to 5v, changed the NPN bias to get 2.5 ish midpoint and enough gain to exceed the upper and lower hystersis points of the 393. it didnt work...until I realised that the key point is the current through the noise source is quite critical. i guess most mfrs try to reruce noise and though zeners are inherently noisy there is a point on their curve where they are most noisy. we are talking nanocurrents here, any help you can get to boost the starting point is a bonus...I rebiased the zener to get the exact datasheet current point for peak noise and it burst into life....

albeit with the nyquist sample rate problem, so im not cel3brating yet, just my 2c that 5v circuits can be made noisy with a bit of care,a datasheet and a calculator

So you're using a Zener diode to generate avalanche noise as opposed to transistor breakdown? Does your circuit look something like this, getting the noise from the zener and then using the two transistors to amplify that?

yep, but with lm393 replacing 2nd xistor stage, op pulled up to give what I hoped was a random stream of digits, which I could count 8 off at a time to give me a series of random 8bit numbers, or so I naively thought....but hey im a,relative newbie to amps n volts, so I forgave myself. it seemd like a good idea. it taught me how to bias a transistor, at least...so it was worthwhile for my hobby needs. my main aim in all this is to learn learn learn not make millions designing a better microwidget.

the "zener" was a ZXRE4041 OR 1004 cant remember which and the peak noise current was something like 7 or 8 uA I think, which is bang in line with what the other far more knowledgeable posters have said. I shall drop out now in the presence of such greatness, my 2c has been spent! these blokes know whst they are on about, im just a nosey amateur.

BareMetal:
yep, but with lm393 replacing 2nd xistor stage, op pulled up to give what I hoped was a random stream of digits, which I could count 8 off at a time to give me a series of random 8bit numbers, or so I naively thought....but hey im a,relative newbie to amps n volts, so I forgave myself. it seemd like a good idea. it taught me how to bias a transistor, at least...so it was worthwhile for my hobby needs. my main aim in all this is to learn learn learn not make millions designing a better microwidget.

the "zener" was a ZXRE4041 OR 1004 cant remember which and the peak noise current was something like 7 or 8 uA I think, which is bang in line with what the other far more knowledgeable posters have said. I shall drop out now in the presence of such greatness, my 2c has been spent! these blokes know whst they are on about, im just a nosey amateur.

Sounds like you use this circuit as a basis; https://mywebspace.wisc.edu/lnmaurer/web/minirng/minirng.html Which uses a ZXRE1004 voltage reference (kindof a zener) as the noise source. The key to that circuit is the use of a high slew rate single sided rail to rail op amp. The author used a OPA2340 (or a MCP6282), I don't think a LM393 would be up to the job. I implemented the circuit with the original OPA2340 as well as my go to MCP629x series, and both worked great. Even though the circuit produced a good balance of 1's and 0's (near 50/50) it still required a fair bit of whitening to produce a uniform distribution. Attached are two graphs. The first shows the distribution with no whitening, the second shows a two tiered whitening approach; Von Neumann followed by a Jenkins one at a time hash.

Here are the ent results for the first test; followed by the results for the second

Entropy = 7.655314 bits per byte.

Optimum compression would reduce the size
of this 22124796 byte file by 4 percent.

Chi square distribution for 22124796 samples is 12926784.31, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 126.1181 (127.5 = random).
Monte Carlo value for Pi is 3.375643870 (error 7.45 percent).
Serial correlation coefficient is -0.000318 (totally uncorrelated = 0.0).
Entropy = 7.999928 bits per byte.

Optimum compression would reduce the size
of this 2679092 byte file by 0 percent.

Chi square distribution for 2679092 samples is 266.62, and randomly
would exceed this value 29.59 percent of the times.

Arithmetic mean value of data bytes is 127.5069 (127.5 = random).
Monte Carlo value for Pi is 3.143365844 (error 0.06 percent).
Serial correlation coefficient is -0.000298 (totally uncorrelated = 0.0).

Which just goes to show how nescessary whitening is for any TRNG.

Here is Arduino functions to implement the two whitening algorithms I used:

uint8_t pool[POOL_SIZE];
uint8_t pool_position;

uint32_t jenkins_one_at_a_time_hash(uint8_t *key, size_t len)
{
    uint32_t hash, i;
    for(hash = i = 0; i < len; ++i)
    {
        hash += key[i];
        hash += (hash << 10);
        hash ^= (hash >> 6);
    }
    hash += (hash << 3);
    hash ^= (hash >> 11);
    hash += (hash << 15);
    return hash;
}

// This routine takes the bit stream from the first phase of bias removal and then
// adds a second stage of bias removal by xor'ng two consequetive bytes.  
// Depending upon the current operating mode, will dictate how the resulting byte
// is processed
void buildNum(byte input)
{
  static uint8_t tmp = 0;
  static int num_counter = 0;
  uint32_t out;

  if (input == 1)
  {
    tmp = (tmp << 1) | 0x01;
  } else {
    tmp = (tmp << 1); 
  }
  num_counter++;
  num_counter %= 8;
  if(num_counter == 0)
  {
    pool_position++;
    pool[pool_position - 1] = tmp;
    if (pool_position == POOL_SIZE)
    {
      // Ok, we have filled the pool up.  Time to hash the values for an output
      out = jenkins_one_at_a_time_hash(pool, POOL_SIZE);
      Serial.println(out);
      pool_position = 0;
      tmp = 0;  
    }
  }
}

// This routine implements the von Neumann algorithm for removing bias from a bit stream
void vonNeumann(byte input)
{
  static int count = 1;
  static boolean previous = 0;
  static boolean flip_flop = 0;
  
  flip_flop = !flip_flop;

  if(flip_flop)
  {
    if(input == 1 && previous == 0)
    {
      buildNum(0);
    } else if (input == 0 && previous == 1) {
      buildNum(1); 
    }
  }
  previous = input;
}

I feed the bits to the VonNeumann function which starts the whitening process. The BuildNum function takes the preliminary whitened output from the VonNeumann function and builds a queue of 16-bytes, which are then fed to the Jenkins hash function, which produces a 32-bit (uint32_t) value which is then output as the random number. It was these processed numbers which produced the second set of results above. The last is an image of the circuit I breadboarded for testing this ZXRE1004 noise source.

yes, I think, you are right, I only had 393 somimtried to use my limited knowledge to make at least something to work, even if far from perfect. i also couldnt find his valuemfor v+, hence my cheapmhack and recalculation to 5v. with your expert advice I may get the right op amp and try again. as i said before, I did get some sort of output, and at one sweet spot which I coild never replicate, my rudimentary processing sketch produced a pretty good looking distribution.

the one big thing I learned from this "failure" is that there is apparently random (at least to a novice) and then there is Random.

I also only have A level maths from 40years ago, so figuring out just how random my random was is beyond me. I am in awe of folk that can do maths or play the piano...

your excellent contribution has reignited my curiosity...I might well rebuild and report back, but dont hold your breath!

Okay, after a long day of testing and tweaking and incorporating the suggestions given, I have some results of my own working mainly from this design as a starting point, but changing the transistors to 2N2222s and the voltage divider resistors to 10K and 5.6K, to keep it at the same relative level from 9V as opposed to 12V. Using this circuit, I compared a sketch that I wrote, emulating the code on Rob Seward's page but mainly refactoring, to Rob Seward's code. I noticed that as the sample size increased, it became apparent that the results tended to bunch toward the low end of the spectrum. I also have some specific questions, and some answers to yours:

Yes.

wanderson:
I have also made some suggested changes to your circuit. First, since you are using a capacitor to isolate the noise voltage/current from the op-amp (ac-coupling) you can use a single sided rail to rail op amp supplied by 5V to perform your later amplification and buffering. I use the MCP629x family but any similar opamp should work. BTW, you may want to try a larger capacitor, say 0.1 - 0.22 uF

I have abandoned the amplification stage for now, as this appears to be working. Tomorrow morning, the lab opens again and I'm going to go in and put everything on the scope to see what it looks like and see if I can get better performance when adding an amplification stage. What characteristics should I look for in an op-amp that would be "good" for this application?

wanderson:
The 'best' current for maximum noise varies with the device; however, the ones I have tested have generally been 5-50 uA.

Further, there have been several people who have tested a variety of such transistors, and the 2N3904 tend to produce the most noise.

Is this current you speak of the base-emitter current of the "noisy" transistor?

Grumpy_Mike:
Don't forget that the noise will have a higher frequency in the noise that you can sample at. Therefor there will be a degree of avraging going on which will limit the readings at the top and bottom end. You need to add a low pass filter to your noise source to take it below the niquis rate.

I tried this. I measured the frequency at which the Arduino was sampling the input (roughly 7.3 kHz), then designed a first-order lowpass filter (diagram attached) with unity gain and a cutoff frequency of 3.65 kHz which, unless I am wrong, is the Nyquist sampling frequency. When I connected it, however, it either drove all my samples to 0 or 1023 (reading from the analog input). Tomorrow, I am going to put it on the scope and see what is happening, but until then, are there any ideas? I am aware that the tendency of the distribution to accumulate at the lower ranges is likely due to this, but I'm not sure how to remedy it.

Attached is my Excel spreadsheet with my results, as well as diagrams of my updated circuits. The op-amp is a TL082, powered by +9V and gnd.

Edit: forgot to post my code. Here it is! I'm having trouble getting the Von Neumann algorithm to work, so all it does right now is XOR whitening.

///A sketch to sample and display the reading at analog 0 to test the generation of random noise

#define inPin A0
#define led 13
#define CALIBRATE_SIZE 30000

unsigned long time = 0;
long times = 0;
int mini = 1023;
int maxi = 0;
int threshold = 0; 
unsigned int Bins[256];
unsigned int mapped[20];
unsigned int mapped2[20];
unsigned long samples = 0;
unsigned long maxSamples = 500000;

void setup()
{
  pinMode(inPin, INPUT);
  pinMode(led,OUTPUT);
  Serial.begin(19200);
  for(int i = 0; i < 256; i++)
  {
    Bins[i] = 0;
  }
  
  for(int i = 0; i < 20; i++)
  {
    mapped[i] = 0;
    mapped2[i] = 0;
  }
    

}

void loop()
{
  Serial.println("Calibrating...");
  threshold = calibrate();
  Serial.print("Finished calibration. Threshold = ");Serial.println(threshold);
  delay(1000);
  //reset the bins
  for(int i = 0; i < 256; i++)
  {
    Bins[i] = 0;
  }
  while(!Serial.available() && samples < maxSamples)
  {
    byte result = collectData();
    Bins[result]++;
    int m = result%21;;
    mapped[m]++;
    int m2 = map(result, 0, 255, 0,19);
    mapped2[m2]++;
    //Serial.println(result);
    samples++;
    Serial.println(samples);
  }
 
  Serial.println("Full distribution:");
  for(int i = 0; i < 256; i++)
  {
    Serial.print(i+1);Serial.print(": ");Serial.println(Bins[i]);
  }
  Serial.println("Divided:");
  for(int i = 0; i < 20; i++)
  {
    Serial.print(i+1);Serial.print(": ");Serial.println(mapped[i]);
  }
  
  Serial.println("Mapped:");
  for(int i = 0; i < 20; i++)
  {
    Serial.print(i+1);Serial.print(": ");Serial.println(mapped2[i]);
  }
  Serial.print("Samples: ");
  Serial.println(samples);
  while(true)
  {}
 
}

byte collectData()//returns an 8-bit number from the random bitstream
{
  byte out = 0;
  
  for(int i = 0; i < 8; i++)
  {
    static boolean flipFlop = 0;
    //unbias with XOR whitening
    int input = analogRead(0);
    //int first = analogRead(inPin);
    //boolean one = 0;
    //boolean two = 0;
    //if(first >= threshold)
      //one = 1;
    //else if(first < threshold)
      //one = 0;
    //int second = analogRead(0);
    //if(second >= threshold)
      //two = 1;
    //else if(second < threshold)
      //two = 0;
      
      
    //if(first && !second)
      //out = (out <<1) | 0x01;
    //else if (!first && second)
      //out = (out <<1);
    flipFlop!= flipFlop;
    boolean in = 0;
    if(input >= threshold)
      in = (1 ^ flipFlop);
      
    else if(input < threshold)
      in = (0 ^ flipFlop);
    
    if(in)
      out = (out <<1) | 0x01;
    else
      out = (out <<1);
  }
  
  return out;
}

unsigned int calibrate()
{
  digitalWrite(led,HIGH);
  for(int i = 0; i < CALIBRATE_SIZE; i++)
  {
    int adc_value = analogRead(inPin);
    byte analog = (byte)adc_value;//truncates to 0-255
    Bins[analog]++;
  }
  
  //find the median
  unsigned long half;
  unsigned long total = 0;
  int i;

  for(i=0; i < 256; i++){
    total += Bins[i];
  }	

  half = total >> 1;
  total = 0;
  for(i=0; i < 256; i++){
    total += Bins[i];
    if(total > half){
      break;
    }	
  }
  digitalWrite(led,LOW);
  return i;
}

Statistics.xlsx (56 KB)

1 Like
  1. Do not filter the data you are getting from the noise circuit. It is not needed, and it can severely affect bias.

  2. You basic mapping is flawed. You are using a modulus operator which by itself is introducing additional bias. To prove it to yourself, produce a fake data set with a perfect uniform distribution say 0-255, 100 times each. Then apply your mapping using the modulus operator to that data set. Since the input data set has a perfect uniform distribution, you would expect your mapped distribution to be perfectly equal. It will not be. If you look at the library code in my sig you can find code that maps the input data set to a smaller range while preserving the uniformity of the original.

  3. Get down to the basics in your code. First, just capture the pure analog readings after right shiting the analog read twice (to turn the 1024 bit read into an 8 bit value) by dropping the two least significant bit.s Take say 50,000 samples and manually calculate the median. For my circuits, it averages between 80-90. YMMV, however, as long as it is above say 50 or so and you have a standard deviation of at least 10-20, you don't need any additional amplification to work with your circuit as is.

  4. Only after 1-3 are done and working, should you start adding code one step at a time to first turn analog readings into bits, then bytes. Then start adding whitening as needed to get a uniform distribution.

When testing how much whitening is needed, here is the order I add whitening.

First Von Neumann.

Then XOR on byte, then 32-bit level

If the XOR doesn't cut it, I replace the XOR step with a linear shift register. I like the jenkins one at a time hash.

P.S. I posted working code for both the Von Neuman and the Jenkins hashing function in Problem reading random noise generator - #29 by system - General Electronics - Arduino Forum of this thread.

Thank you. My median is always roughly 54, so I should be fine.
As for the mapping, is this your code that does it?

uint32_t EntropyClass::random(uint32_t max)
{
  uint32_t slice;

  if (max < 2)
    retVal=0;
  else
    {
      retVal = WDT_MAX_32INT;
      if (max <= WDT_MAX_8INT) // If only byte values are needed, make best use of entropy
		{                      // by diving the long into four bytes and using individually
		  slice = WDT_MAX_8INT / max;
		  while (retVal >= max)
		    retVal = random8() / slice;
		} 
      else if (max <= WDT_MAX_16INT) // If only word values are need, make best use of entropy
		{                            // by diving the long into two words and using individually
		  slice = WDT_MAX_16INT / max;
		  while (retVal >= max)
		    retVal = random16() / slice;
		} 
      else 
		{
		  slice = WDT_MAX_32INT / max;
		  while (retVal >= max)           
		    retVal = random() / slice;
		}                                 
    }
  return(retVal);
}

I'm not sure how to adapt it as I don't have a random "pool" of entropy as you do. (Basically to emulate the random8() etc. functions.)
Thanks for the Von Neumann and Jenkins code. I saw it, but I haven't had time to incorporate it yet.

Yes that is the code. In order to correctly 'slice' a random number to a smaller range you need the ability to get an additional random number so the first can be discarded if it doesn't 'slice' up properly, which occurs somewhat rarely depending upon the random size and the divisor.

So you don't need to have a pool, you just need to have your maping calls call whatever function you want to generate random numbers. That way, if the particular 8-bit value doesn't slice properly (this is a relatively rare condition) you can try with the next 8-bit random value. You can get away with a straight modulus operation, but only if the divisor is a power of two.

BTW, what is the variance or standard deviation on your raw analog reads (8-bit adjusted)? A median of 56 isn't bad at all, but if the standard deviation is small you may not have enough range in your noise levels to work well. I am concerned that may be the case since your only running 9V. Granted my tests were with the 2N3904, but I needed at least 10V with that transistor to get decent range of noise. In my circuits I was getting a vmin of about 40mV and a vmax of about 2.3V (I adjusted input voltage to get this range), I then used a Aref of 2.5V to maximize the initial resolution of the A2D.

If you max voltage is below 1.1 you could use that internal reference, and maximize the resolution of the A2D. This by itself will help eliminate some of the bias you generator is experiencing.

acegard:
Thank you. My median is always roughly 54, so I should be fine.

BTW, depending upon your definition of roughly, this could be another indicator of a problem. Not counting the extremes my circuit has exhibited over time, the median (which I recalculate every 10,000,000 bytes) varies considerably from about 80 up to about 120. If your not seeing a decent amount of variation in your median you probably have either a circuit problem or a software one.

I'm just coming off a morning with the oscilloscope, and I'll post more detailed analysis later when I get off work.

wanderson:

acegard:
Thank you. My median is always roughly 54, so I should be fine.

BTW, depending upon your definition of roughly, this could be another indicator of a problem. Not counting the extremes my circuit has exhibited over time, the median (which I recalculate every 10,000,000 bytes) varies considerably from about 80 up to about 120. If your not seeing a decent amount of variation in your median you probably have either a circuit problem or a software one.

Without having kept detailed records of what my median is, having run it ~100 times over the past day or so, the calculated median is always between 50 and 60, inclusive. Is this a problem?

I haven't calculated the standard deviation yet, I will when I get off work. I did notice when I was working on the scope that the noise waveform actually seemed compressed at the lower end of the spectrum with my circuit - I feel like this might be what's causing my distribution to group as well. I changed some resistor values around and found a sweet spot where the waveform has not only a larger amplitude but is more randomly distributed as well. I haven't sampled it yet to see, but I have a good feeling about it!

Thank you for your patience and your exemplary assistance! You're really helping me to slowly understand every aspect of the problem. Thanks!

acegard:
Without having kept detailed records of what my median is, having run it ~100 times over the past day or so, the calculated median is always between 50 and 60, inclusive. Is this a problem?

Your seeing about a quarter of the typical variation I see in my circuits. I suspect that this may be because your not getting enough range in either your noise or perhaps in the A2D conversion. The next step I would do is run a program to capture about 50,000 raw a2d values. With any luck you can either use the internal 1.1V AREF or pick up an external AREF to maximize the range your A2D is capturing the noise. Personally, I have found adjusting the AREF to be easier to do than small amplifications (say 1.2-2.6...) to get a full range (or nearly so).

BTW, good notes can help considerably with identifying an fixing problems.

acegard:
Thank you for your patience and your exemplary assistance! You're really helping me to slowly understand every aspect of the problem. Thanks!

Your welcome!

I have posted a scope capture for a sample of my 2N3904 noise circuit which you can use as a basis of comparison. You can see the output at https://sites.google.com/site/astudyofentropy/project-definition/avalanche-noise

Using all of the whitening I apply, that circuit can generate a little over 40MB of random bytes per day. So far one particular implementation has generated about 18Gb of random data.

A quick question as I map tonight's plan of attack: I would like to let it run overnight to get a larger sample to test against. I have an Ethernet shield with a MicroSD slot that I want to put the data on (since the Arduino can't handle a sample size of more than ~500k). I know how to read and write data to the SD card, but am curious as to the method you used to capture larger sample sizes?

I use two methods, depending upon mood. Most of the time I use my computer and capture the serial port output to a text file in a background process. For the long term test device I created code that uses a micro-SD card to save each 'block' of data to the card. The code reuses the same memory block I use for the calibration storage, thereby allowing it to compile and run on a 328 level chip.

You can find the complete code at; Google Code Archive - Long-term storage for Google Code Project Hosting.

Hey there! Sorry it has taken me a while to reply; due to some unavoidable personal stuff I had to wait until tonight to work on this.

My median and standard deviation are 56 and 15, respectively, with an analog reading sample size of 500,000 samples.

Here are my oscilloscope results. I am using the same circuit as I posted before, but with no filtering for any sort of Nyquist frequency.
This capture was taken at the output, with the values given. The waveform is grouping itself, it seems, on the lower end (around 100 mV), so I thought that this could explain the grouping of my samples on the lower end of the scale. After some fiddling and calculating, I replaced the high resistor on the divider with a 6.8k ohm resistor, and the lower one with a 20k ohm resistor, and got this noise waveform. It seems to me that this is far more evenly distributed, as well as with a higher amplitude. The charts in the attached Excel sheet have been created using this circuit. Interestingly, it has not changed my median. Also, my median is rather constant, and rather low - always now between 40 and 50. This is true of both the initial circuit and the new one.
Then I changed the AnalogRef pin as you suggested, with a reference of 3V, as the maximum noise looked to be about 2.9 volts.
This changes the median to ~75.

When I run the code through whitening algorithms, the distribution looks pretty good, however, and this I feel is the real final challenge, the mapping is still horrendous. As you can see by the charts in the attached spreadsheet, the "mapped" distributions are all over the place. Why might this be?

Here is my new code:

/********************************/
/*  Rob Seward 2008-2009        */
/*  v1.0                        */
/*  4/20/2009                   */
/********************************/

#define BINS_SIZE 256
#define CALIBRATE_SIZE 10000

#define NO_BIAS_REMOVAL 0
#define EXCLUSIVE_OR 1
#define VON_NEUMANN 2

#define ASCII_BYTE 0
#define BINARY 1
#define ASCII_BOOL 2

#define LED_PIN 13
#define ADC_PIN A0

/***  Configure the RNG **************/
int bias_removal = VON_NEUMANN;
int output_format = ASCII_BOOL;
int baud_rate = 19200;
/*************************************/


unsigned int Bins[BINS_SIZE];
unsigned int Results[20];
boolean initializing = true;
unsigned int calibration_counter = 0;

boolean bufferFull = false;
byte buffer = 0;
byte threshold = 0;

unsigned long samples = 0;
unsigned long maxSamples = 170588;

void setup(){
  pinMode(LED_PIN, OUTPUT);
  Serial.begin(baud_rate);
  for (int i=0; i < BINS_SIZE; i++){
    Bins[i] = 0; 
  }  
}

void loop(){

  Serial.println("Initializing...");
  Serial.println("Calibrating...");
  threshold = calibrate();
  Serial.print("Threshold: ");Serial.println(threshold);
  delay(2000);
  Serial.println("Collecting Data");
  for (int i=0; i < BINS_SIZE; i++){
    Bins[i] = 0; 
  }  
  while(samples < maxSamples)
  {
    //byte rand = getRandomByte(threshold);
    //Bins[rand]++;
    uint32_t rand = map20();
    Results[rand]++;
    Serial.print(samples);Serial.print(": ");Serial.println(rand);
    samples++;
  }
  
  Serial.println("Distribution:");
  for (int i=0; i < BINS_SIZE; i++){
    Serial.print(i);Serial.print(": ");Serial.println(Bins[i]);
  }  
  for (int i=0; i < 20; i++){
    Serial.print(i);Serial.print(": ");Serial.println(Results[i]);
  } 
  while(true){}
}

byte getRandomByte(byte tHold)
{
  while(!bufferFull)
  {
    int adc_value = analogRead(ADC_PIN);
    byte adc_byte = adc_value >> 2;
    processInput(adc_byte, tHold);
  }
  bufferFull = false;
  return buffer;
}

void processInput(byte adc_byte, byte threshold){
  boolean input_bool;
  input_bool = (adc_byte < threshold) ? 1 : 0;
  switch(bias_removal){
    case VON_NEUMANN:
      vonNeumann(input_bool); 
      break;
    case EXCLUSIVE_OR:
      exclusiveOr(input_bool);
      break;
    case NO_BIAS_REMOVAL:
      buildByte(input_bool);
      break;
  }
}

void exclusiveOr(byte input){
  static boolean flip_flop = 0;
  flip_flop = !flip_flop;
  buildByte(flip_flop ^ input);
}

void vonNeumann(byte input){
  static int count = 1;
  static boolean previous = 0;
  static boolean flip_flop = 0;
  
  flip_flop = !flip_flop;

  if(flip_flop){
    if(input == 1 && previous == 0){
      buildByte(0);
    }
    else if (input == 0 && previous == 1){
      buildByte(1); 
    }
  }
  previous = input;
}

void buildByte(boolean input){
  static int byte_counter = 0;
  static byte out = 0;

  if (input == 1){
    out = (out << 1) | 0x01;
  }
  else{
    out = (out << 1); 
  }
  byte_counter++;
  byte_counter %= 8;
  if(byte_counter == 0){
    if (output_format == ASCII_BYTE) Serial.println(out, DEC);
    buffer = out;
    bufferFull = true;
    out = 0; 
    return;
  }
  bufferFull = false;
}



unsigned int calibrate()
{
  digitalWrite(LED_PIN,HIGH);
  for(int i = 0; i < CALIBRATE_SIZE; i++)
  {
    int adc_value = analogRead(ADC_PIN);
    byte analog = adc_value >>2;//truncates to 0-255
    Bins[analog]++;
  }
  
  //find the median
  unsigned long half;
  unsigned long total = 0;
  int i;

  for(i=0; i < 256; i++){
    total += Bins[i];
  }	

  half = total >> 1;
  total = 0;
  for(i=0; i < 256; i++){
    total += Bins[i];
    if(total > half){
      break;
    }	
  }
  digitalWrite(LED_PIN,LOW);
  return i;
}

uint32_t map20()
{
  uint32_t slice;
  uint32_t retVal = 0xFFFFFFFF;

  slice = 0xFF / 20;
  while (retVal >= 20)
  {
    byte r = getRandomByte(threshold);
    retVal = r / slice;
    Bins[r]++;
  }
  
  return(retVal);
}

Stats2.xlsx (39.9 KB)

I haven't had a chance to look at your mapping code in detail, but on the surface I couldn't find anything obviously wrong. But you may want to not use uint32_t (from my code) but instead define the data types as bytes. Since your generating only byte size randoms, my slicing code that was designed to work with 32-bit values doesn't need the larger data type.

I did use your code with one of my avalanche noise generators, and looked at the results produced. I can confirm that your code is producing good byte value.

Chi Square on the byte distribution:

	Chi-squared test for given probabilities

data:  tmp
X-squared = 254.5149, df = 255, p-value = 0.4968

but the mapped distribution is flawed.

	Chi-squared test for given probabilities

data:  ob
X-squared = 42.3098, df = 19, p-value = 0.001609

So there is something wrong with your mapping code. I can't look at it any more today, but if you don't get another solution, I will take a look at it in a day or two.

Great, thank you. I'll mess around with some things tonight and let you know what comes out.