First of all, this is great. This kind of analysis was always missing from my design...
There are several things I want to address: the normal distribution of the voltages, a shortcoming in my calibration code, and a few ideas for improvement.
Distribution of the Voltages:
Random bits are created like this: the app tries to find a threshold at which equal numbers of voltages fall above and below it. Then, when the Arduino polls the noise source, it turns voltages above the threshold into a 1 and voltages below it into a 0. Theoretically, as long as the order in which those voltages land above and below the threshold is random, the shape of the distribution shouldn't matter.
However, I would be surprised if the voltage distribution and the randomness of the resultant bits were completely unrelated; an even distribution would probably result in better randomness.
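To illustrate the thresholding idea, here is a rough Python simulation (not the actual Arduino code). It assumes the noise source produces normally distributed 10-bit ADC readings; the function names, the mean of 512, and the spread of 40 are all made up for the example. It only shows that a median threshold gives a roughly even split of 1s and 0s even when the voltages are normally distributed; it says nothing about whether the order of the bits is random.

```python
import random
import statistics

def simulate_adc_readings(n, mean=512.0, sigma=40.0, seed=1):
    """Hypothetical stand-in for analogRead(): normally distributed 10-bit values."""
    rng = random.Random(seed)
    return [min(1023, max(0, round(rng.gauss(mean, sigma)))) for _ in range(n)]

def calibrate_threshold(readings):
    """Use the median so that about half the readings land on each side."""
    return statistics.median(readings)

def to_bits(readings, threshold):
    """Readings above the threshold become 1, everything else becomes 0."""
    return [1 if r > threshold else 0 for r in readings]

# Calibrate once on one batch, then convert a fresh batch into bits.
calibration_batch = simulate_adc_readings(2000, seed=1)
threshold = calibrate_threshold(calibration_batch)
bits = to_bits(simulate_adc_readings(10000, seed=2), threshold)
ones_fraction = sum(bits) / len(bits)
print(f"threshold={threshold}, fraction of ones={ones_fraction:.3f}")
```

Because the simulated readings are integers, some of them tie with the threshold and are counted as 0, so the fraction of ones can sit slightly below one half.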
Despite this, I do think avalanche noise is worth exploring. I have heard that even measuring radioactive decay does not produce an unbiased result (Arduino Forum). (Random.org uses radio frequencies for its randomness. That's the only other inexpensive noise source I can think of, though I'm sure their technique is far more complicated than this.)
This normal distribution may be a surmountable problem. Whitening, taken along with a few improvements I will talk about below, may result in high-quality bits.
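As one example of what whitening could look like, here is a small Python sketch of von Neumann debiasing. This is a technique picked for illustration, not something the current code does. It assumes the input bits are independent of each other: under that assumption it removes a constant bias, but it does not fix correlation between bits, and it throws away a large share of the input. The 60% bias is an invented number.

```python
import random

def von_neumann(bits):
    """Von Neumann debiasing over non-overlapping pairs:
    (0,1) -> 0, (1,0) -> 1, and (0,0) or (1,1) are discarded."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        if a != b:
            out.append(a)
    return out

# Hypothetical biased source: 60% ones, each bit independent of the others.
rng = random.Random(7)
raw = [1 if rng.random() < 0.6 else 0 for _ in range(20000)]
whitened = von_neumann(raw)

raw_fraction = sum(raw) / len(raw)
whitened_fraction = sum(whitened) / len(whitened)
print(f"raw ones: {raw_fraction:.3f}, whitened ones: {whitened_fraction:.3f}")
print(f"bits kept: {len(whitened)} of {len(raw)}")
```

The cost is throughput: at most one output bit per two input bits, and fewer as the bias grows.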
Calibration inadequacy:
There is a problem in my RNG's current code. The median voltage that the Arduino uses as a threshold for turning the noise into 0s and 1s drifts. This drift is especially apparent over long periods of time, and it may also be having an effect over short periods. The calibration functions I included in the code are an attempt to compensate for this. However, my calibration occurs only at startup, so any drift that occurs after startup is not accounted for. When I was writing the code I could not think of a way to perform the calibration without interrupting the stream of bits. (It's probably possible; given time constraints, I just could not engineer a solution.)
If you'd like to see the drift, Walter Anderson studied it over a period of 10 months:
http://code.google.com/p/avr-hardware-random-number-generation/wiki/AvalancheNoise
(Walter cites my technique, but wrote his own code. I think his code calibrates periodically instead of just at startup.)
Improvement:
One idea for improvement is to calibrate the threshold continually while the chip is running. It would be best if this were done in the background so the bitstream would not be interrupted. If higher-quality randomness is available just by having a properly tuned threshold, then this could improve results.
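One possible way to do this, sketched in Python as a simulation rather than real Arduino code, is to nudge the threshold by a tiny step after every reading: up if the reading was above it, down if it was below. That makes the threshold follow the running median without ever pausing the bitstream. The drift rate, the step size, and the function names are all assumptions for the example, and this is not what Walter's code or my code actually does.

```python
import random

def drifting_readings(n, start_mean=500.0, end_mean=560.0, sigma=40.0, seed=3):
    """Hypothetical noise source whose median drifts linearly over time."""
    rng = random.Random(seed)
    for i in range(n):
        mean = start_mean + (end_mean - start_mean) * i / (n - 1)
        yield rng.gauss(mean, sigma)

def fixed_threshold_bits(readings, threshold):
    """Calibrate-once behaviour: the threshold never changes after startup."""
    return [1 if r > threshold else 0 for r in readings]

def tracking_threshold_bits(readings, threshold, step=0.05):
    """Emit a bit for every reading and nudge the threshold toward the median."""
    bits = []
    for r in readings:
        if r > threshold:
            bits.append(1)
            threshold += step  # reading was above: move the threshold up a little
        else:
            bits.append(0)
            threshold -= step  # reading was below: move the threshold down a little
    return bits

n = 50000
fixed = fixed_threshold_bits(drifting_readings(n), 500.0)
tracked = tracking_threshold_bits(drifting_readings(n), 500.0)
fixed_fraction = sum(fixed) / n
tracked_fraction = sum(tracked) / n
print(f"fixed threshold ones: {fixed_fraction:.3f}")
print(f"tracking threshold ones: {tracked_fraction:.3f}")
```

One caveat with this design: each bit also moves the threshold, so a 1 makes the next bit very slightly more likely to be a 0. A small step keeps that effect small, but it would need to be checked with real tests before trusting the output.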
Another approach which I think is promising is to have several of these noise sources going at the same time. (Random.org gets its noise from several sources that are positioned world-wide.)
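How several sources would be combined is an open question. One simple option, assumed here purely for illustration, is to XOR one bit from each source together. If the sources are independent, the bias of the XORed bit is smaller than the bias of any single source. The Python sketch below simulates three hypothetical sources that each produce 60% ones; the numbers and names are invented.

```python
import random
from functools import reduce
from operator import xor

def biased_source(n, p_one, seed):
    """Hypothetical independent noise source that emits a 1 with probability p_one."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_one else 0 for _ in range(n)]

def xor_combine(streams):
    """XOR the bits that arrive at the same position in every stream."""
    return [reduce(xor, column) for column in zip(*streams)]

n = 50000
sources = [biased_source(n, 0.6, seed) for seed in (11, 12, 13)]
combined = xor_combine(sources)

single_fraction = sum(sources[0]) / n
combined_fraction = sum(combined) / n
print(f"one source ones: {single_fraction:.3f}")
print(f"three sources XORed ones: {combined_fraction:.3f}")
```

This only helps if the sources really are independent. Noise circuits that share a power supply or sit on the same board may be correlated, and XOR will not remove that.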
Watchdog timer technique:
All of this may be moot, however. There are some implementations that utilize the jitter of the Arduino's watchdog timer. This approach requires no extra hardware and seems to produce better results. (Though I think it has a lower throughput.)
http://code.google.com/p/avr-hardware-random-number-generation/wiki/WikiAVRentropy
More links
-Rob
Moderator edit: PHP session ID removed.