Problem reading random noise generator

Hello,
I am trying to make an HRNG (hardware random number generator) and I'm using avalanche breakdown noise as the source of my random noise: Link
Here is my circuit:
Image Link
In order to test the distribution, I'm sampling it with the Arduino (an Uno). However, my distribution is skewed. I don't think it's the fault of my noise generator, because according to my University's scope, it looks pretty good:
Image Link
The yellow signal is the noise at the collector-emitter junction, and the green is the amplified signal (at the output of the second op-amp), biased around 4.5V as virtual ground. Since I took that scope capture, I have changed the gain of the amplifier to give me roughly 3Vpp as opposed to the indicated 105 mVpp, but the signal should look the same. I then pass it through a voltage divider so that the Arduino can read it (to push it down from a 0-9V range to a 0-5V range) and capture it at an analog input (A0). I then sample it and map it to 1-20 (since that's what I need eventually anyway). However, my distribution is skewed:
Image Link
It seems to be clumped on the high and low ends of the spectrum, with 20 not even being mapped to. Any ideas why this could be? What I really want to do is show the entire distribution (from 0 to 1023) but the Arduino can't hold an array that large. Also, after some testing, it seems that my noise only ranges from 140 to 890 on the ADC's scale, so that's what I've mapped them by.
Here is my code:

///A sketch to sample and display the reading at analog 0 to test the generation of random noise

#define inPin A0

int distribution1 [20]; 
int distribution2 [20];
long times = 0;
int mini = 500;
int maxi = 500;
void setup()
{
  pinMode(inPin, INPUT);
  Serial.begin(9600);
  randomSeed(analogRead(1));
}

void loop()
{
  while(!Serial.available())
  {
    int inputLevel = analogRead(inPin);
    int mapped = map(inputLevel, 148,881,1,21);
    distribution1[mapped - 1]++;
    Serial.println(inputLevel);
    if(inputLevel > maxi)
      maxi = inputLevel;
      
    if(inputLevel<mini)
      mini = inputLevel;
      
      times ++;
  }
  Serial.print(times);Serial.println(" Samples taken");
  Serial.println("With noise:");
  for(int i = 0; i < 20; i++)
  {
    Serial.print(i+1);Serial.print(": ");Serial.println(distribution1[i]);
    
  }
  Serial.print("Minimum: ");Serial.println(mini);
  Serial.print("Maximum: ");Serial.println(maxi);
  
  for(long i = 0; i < times; i++)
  {
    int rand = random(1,21);
    distribution2[rand-1]++;
  }
  
  Serial.println("Built in:");
  for(int i = 0; i < 20; i++)
  {
    Serial.print(i+1);Serial.print(": ");Serial.println(distribution2[i]);
    
  }
  
  while(true)
  {}
 
}

Do you know for sure what the map() function actually does, in particular how does it handle numbers that are out of the specified input range? Try some other means of generating the numbers 0-20 from the input. For example, you could try using the remainder function, newnumber = analogRead(inPin)%21;

Edit: here is the actual code for the map function. Clearly, there is a problem using the output as an array index, as the result can go out of the specified output range.

For the mathematically inclined, here's the whole function

long map(long x, long in_min, long in_max, long out_min, long out_max)
{
  return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}
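
To make the out-of-range problem concrete, here is a minimal sketch in plain C++ (not an Arduino sketch). `map_range` repeats the arithmetic shown above; `to_bin` is a hypothetical helper, not part of the Arduino core, that clamps the reading before binning. The limits 148 and 881 are taken from the posted sketch.

```cpp
#include <cassert>

// Same arithmetic as the Arduino map() shown above (no clamping).
long map_range(long x, long in_min, long in_max, long out_min, long out_max) {
    return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min;
}

// Hypothetical helper: clamp the reading to [lo, hi], then return a bin
// index in 0..bins-1 so that it is always a valid array index.
int to_bin(long reading, long lo, long hi, int bins) {
    assert(lo < hi && bins > 0);
    if (reading < lo) reading = lo;
    if (reading > hi) reading = hi;
    // hi - lo + 1 distinct readings are spread over `bins` equal slices.
    return static_cast<int>((reading - lo) * bins / (hi - lo + 1));
}
```

With the posted limits, `map_range(881, 148, 881, 1, 21)` returns 21, so `distribution1[mapped - 1]` writes one element past the end of a 20-element array, and a reading of 100 returns 0, which indexes element -1. Because integer division truncates toward zero, readings slightly below 148 also land in bin 1, which makes that bin wider than the others.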

140 to 890 on the ADC's scale

That's exactly what I'd expect from a 3 Vpp input. The full range is 5 Vpp.

Magician:

140 to 890 on the ADC's scale

That's exactly what I'd expect from a 3 Vpp input. The full range is 5 Vpp.

I'm not sure what you're trying to say here. Yes, this is what I'm getting and this is what I've planned for. My problem stems, I think, from the mapping, but I'm not super sure.

jremington:
Do you know for sure what the map() function actually does, in particular how does it handle numbers that are out of the specified input range? Try some other means of generating the numbers 0-20 from the input. For example, you could try using the remainder function, newnumber = analogRead(inPin)%21;

Thanks for pointing that out to me. The map() function doesn't look like it will work. I tried your first suggestion with the modulo operator, and by clamping my results so that they don't go out of range:

if(newNumber <=19)
      distribution1[newNumber]++;

My distribution is still uneven:

1: 824
2: 697
3: 651
4: 633
5: 577
6: 577
7: 590
8: 591
9: 599
10: 650
11: 783
12: 782
13: 551
14: 428
15: 405
16: 345
17: 396
18: 416
19: 465
20: 1030

I'm honestly not sure if this is a hardware or software problem.

The bumps at 1 and 20 suggest that it is a software problem. The rest of the distribution looks pretty uniform. You just haven't yet found the appropriate algorithm for using the hardware to select truly random numbers. You might take a more careful look at what the author of the page "Random Sequence Generator based on Avalanche Noise" did to select streams of random 0s and 1s, which would be easy to map into any range you like.

As a first try, I suggested using the % operator, but many people recommend against using this method to reduce the range, as it can introduce bias. See this discussion: "Generate a random number within range?" on Stack Overflow.
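
A rough sketch of where the modulo bias comes from, assuming a 10-bit ADC (codes 0 to 1023) reduced to 20 values. `residue_counts` is a name made up for this illustration.

```cpp
#include <array>
#include <cassert>

// Count how many of the ADC codes 0..codes-1 land on each residue mod 20.
std::array<int, 20> residue_counts(int codes) {
    assert(codes > 0);
    std::array<int, 20> counts{};  // value-initialized: all zeros
    for (int code = 0; code < codes; ++code) {
        counts[code % 20]++;
    }
    return counts;
}
```

Since 1024 is not a multiple of 20, `residue_counts(1024)` gives 52 codes for each of the residues 0 to 3 and 51 for the rest. That is a bias of about 2% even if every ADC code were equally likely, and it comes on top of any non-uniformity in the codes themselves.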

Edit: I note that in the original posted code, you do not initialize the two distribution arrays to zero. This is not a good idea, as some compilers (I haven't checked what the Arduino version of gcc does) just allocate memory space for an uninitialized variable, and that space almost certainly has some numbers already in it.

I certainly am going to try that, but I want to see first what the output looks like on the scope, and I can't get back into the university lab until it reopens on Monday. In the interim, do you know what might be causing the software issue?

The stackoverflow discussion I linked above suggests that you could try using the following function to limit the range of the hardware random number generator. You will need to modify it slightly (so as not to use the built in RAND_MAX and rand() function), but the general approach should work.

#include <stdlib.h>  /* rand(), RAND_MAX */

int rand_lim(int limit) {
/* 
  return a random number between 0 and limit inclusive.
 */

    int divisor = RAND_MAX/(limit+1);
    int retval;

    do { 
        retval = rand() / divisor;
    } while (retval > limit);

    return retval;
}

I used it to generate 100,000 integers between 0 and 19 and got the following uniform distribution

0 4976
1 4921
2 4980
3 5104
4 4862
5 4881
6 4970
7 5003
8 5064
9 5018
10 4930
11 5037
12 5205
13 5046
14 4906
15 4902
16 5010
17 5192
18 5005
19 4988
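
One possible way to make that modification, as a sketch only. `rand_lim_from` and its `next_raw` parameter are hypothetical names; `next_raw` stands in for whatever returns a raw value from the hardware (for example a wrapper around `analogRead`). It assumes the raw values are uniformly distributed over 0..raw_max, which raw ADC samples of analog noise generally are not, so the source would still need conditioning first.

```cpp
#include <cassert>

// Rejection-sampling range reduction, adapted from rand_lim() above.
// next_raw: callable returning an int in 0..raw_max (assumed uniform).
// Returns a value in 0..limit inclusive.
template <typename Source>
int rand_lim_from(Source next_raw, int raw_max, int limit) {
    assert(limit >= 0 && limit <= raw_max);
    // Width of each output slice; assumes raw_max + 1 does not overflow int.
    int divisor = (raw_max + 1) / (limit + 1);
    int retval;
    do {
        retval = next_raw() / divisor;
    } while (retval > limit);  // discard raw values in the leftover tail
    return retval;
}
```

With `raw_max = 1023` and `limit = 19` the divisor is 51, so raw values 0 to 1019 map evenly onto 0 to 19, and the four values 1020 to 1023 are rejected and redrawn.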

Interesting, I don't know why I didn't see the stackoverflow link earlier, my bad.
What does RAND_MAX do? If I'm going to replace it, I'll need to know haha. Also, would you suggest replacing rand() with my noise generator?

Also, initializing the arrays to zero didn't help. Of course, that's probably because I reset the sketch between each run, and (correct me if I'm wrong) the memory is initialized to zero.

RAND_MAX is the maximum number that the built in C/C++ function rand() outputs.

It may be that the Arduino gcc compiler initializes variable memory to zero, but many others don't. You should get into the habit of initializing ALL variables before using them.
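
For reference, the C and C++ standards zero-initialize variables with static storage duration (globals and `static` locals), so the two global arrays in the posted sketch start at zero on any conforming compiler. Ordinary local variables are the ones that start with indeterminate contents. A small illustration in plain C++, with made-up names:

```cpp
#include <cassert>
#include <cstddef>

int global_counts[20];  // static storage duration: zero-initialized

// Returns true if every element of the array is zero.
bool all_zero(const int* data, std::size_t n) {
    assert(data != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] != 0) return false;
    }
    return true;
}

// A local array must be initialized explicitly; "= {}" zeroes every element.
bool local_starts_at_zero() {
    int local_counts[20] = {};
    return all_zero(local_counts, 20);
}
```

Writing the initializer out anyway, as suggested above, costs nothing and documents the intent.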

You're right, I should, thank you. Should I replace rand() with the noise generator?

Don't forget that the noise will contain higher frequencies than you can sample at. Therefore there will be a degree of averaging going on, which will limit the readings at the top and bottom end. You need to add a low-pass filter to your noise source to keep its bandwidth below the Nyquist frequency (half the sampling rate).

If that's happening, why do they seem to be grouping at the top and bottom ends?

An observation: most (maybe all) of the circuits I have seen like that use 12V or higher (including the one in your link which calls for 13V). I vaguely recall one author claiming that 12V is the minimum necessary for the circuit to work correctly. But, that could easily be because of the transistors he was using.

If you want continued help with the software I suggest you post the current version of your sketch.

This may give you some troubleshooting ideas...
http://forum.arduino.cc/index.php?topic=161682.0

Thank you, Coding Badly, that thread will be very helpful. I'll post my sketch tomorrow when I get back to it. This is giving me a ton of ammunition with which to improve my design!

acegard:
If that's happening, why do they seem to be grouping at the top and bottom ends?

Because the signal is moving faster in the middle of the transition. Just think of a sine wave: the fastest movement is at the zero-crossing point. Those are just the levels you will miss.
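
To illustrate the sine-wave argument (a sketch, and a pure sine is only an analogy for the noise signal): sampling one full cycle at evenly spaced phases and sorting the values into 20 equal-width bins puts far more samples in the outermost bins than in the middle ones. `sine_histogram` is a name invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Sample one cycle of a unit sine at `samples` evenly spaced phases and
// histogram the values into 20 equal-width bins covering -1..+1.
std::array<int, 20> sine_histogram(int samples) {
    assert(samples > 0);
    std::array<int, 20> bins{};
    const double two_pi = 6.283185307179586;
    for (int i = 0; i < samples; ++i) {
        double v = std::sin(two_pi * (i + 0.5) / samples);  // -1..+1
        int idx = static_cast<int>((v + 1.0) * 0.5 * 20.0);
        if (idx > 19) idx = 19;  // guard the v == +1 edge case
        bins[idx]++;
    }
    return bins;
}
```

For an ideal sine, each outermost bin collects roughly 14% of the samples and each bin next to zero about 3%, because the waveform spends the most time near its peaks.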

Grumpy_Mike:
Don't forget that the noise will contain higher frequencies than you can sample at. Therefore there will be a degree of averaging going on, which will limit the readings at the top and bottom end. You need to add a low-pass filter to your noise source to keep its bandwidth below the Nyquist frequency (half the sampling rate).

I tried something similar a while ago and got very similar results...a disproportionate binfull of 0s and 255s in my case....

I'm sure it's a sampling frequency/Nyquist issue. In my case I was cleaning up the signal with an LM393. I had biased my zener + NPN amp to a midpoint of about 2.5V and fed that, plus a pot-tweaked Vcc/2, to the 393.

Even though my scope is ancient and slow, I could see that if I shifted the Vcc midpoint up or down slightly within the wide blur of random fuzz, my distribution would vary widely, even though the midpoint was still roughly in the middle. I did actually get a fairly flat distribution at one point, but I could never repeat it. I concluded I was being thwarted by the slew rate of the 393, or by some kind of "beat frequency" between a below-Nyquist sampling rate and the maximum slew rate of the 393...

my knowledge and scope were well below required level at the time, so I took the easy solution..I gave up!

ps they probably still are, but at least I can send you some empathy, if not fix your entropy....

Just to reinforce what Coding Badly pointed out, I made a circuit based on that type of design using a 2N2222 and I found that supplies of 5V and 9V didn't work well. It needed at least 12V.

Avalanche noise circuits have several issues as random number generators. The first is that they exhibit a fair degree of bias, regardless of how you capture the noise (ADC or comparator).

My experiments have indicated that two stages of 'whitening' are required to remove the bias from the readings. I use a Von Neumann filter followed by some form of linear feedback shift register (LFSR). In general I have found that it requires at least a 4:1 whitening ratio to get a reasonably uniform RNG. For the LFSR stage, I have used anything from a simple XOR'ing of consecutive bytes (which worked for one example circuit with A2D) to a Jenkins one-at-a-time hash on my second implementation, which required more whitening due to a different noise source (5V zener source, no 10-15V supply needed).
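
For readers unfamiliar with the first stage, here is a minimal sketch of a Von Neumann filter as it is usually described: bits are taken in non-overlapping pairs, a 01 or 10 pair emits its first bit, and 00 or 11 pairs are discarded. This is a generic illustration in plain C++ with a made-up function name, not the code from the experiments described here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Von Neumann debiasing: consume bits two at a time, keep the first bit
// of each unequal pair, drop equal pairs. A trailing odd bit is ignored.
std::vector<int> von_neumann(const std::vector<int>& bits) {
    std::vector<int> out;
    for (std::size_t i = 0; i + 1 < bits.size(); i += 2) {
        assert(bits[i] == 0 || bits[i] == 1);
        assert(bits[i + 1] == 0 || bits[i + 1] == 1);
        if (bits[i] != bits[i + 1]) {
            out.push_back(bits[i]);
        }
    }
    return out;
}
```

The output is only unbiased if the input bits are independent of each other with a constant bias, and at best the filter yields one output bit per four input bits.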

The second issue with avalanche noise random number generator circuits is that the balance of 1's to 0's will change over time. If your code doesn't accommodate such changes, your RNG will stop producing uniform random numbers when the bias exceeds your whitening technique's ability to cope. As you can see from the pages below, I noticed LARGE changes in the circuit's balance point as it aged.

If you're interested, my experiments, which include the code I derived from Rob Seward's median calculation approach, are available from

https://code.google.com/p/avr-hardware-random-number-generation/wiki/AvalancheNoise

The author of the HRNG link you posted claims to have gone to some effort to reach his goal, namely

trying to design a circuit that could produce a symmetrical, zero-average noise signal from a reverse biased p-n junction in avalanche breakdown

but it really isn't clear how he achieved and verified the output of the hardware other than with software. Your circuit has no provision for adjustments and can't produce a zero mean signal. Also, he used a digital input to generate "1"s and "0"s -- your use of the ADC throws in an additional complication. So you may be fighting an uphill battle.

I don't mean to be discouraging! These things are fun and challenging.

As an aside, it is interesting to listen to various different transistors in avalanche breakdown. You can do this by coupling the output transistor via a capacitor to the high gain phono input of an audio amplifier. There are a few different processes that lead to breakdown, and these manifest themselves as clicks, pops and hisses of different apparent pitches.