I have updated the Google Code page with additional information on my research into the long-term performance of using avalanche noise to generate entropy (random numbers). In addition to adding more detailed information on the first 811 10,000,000-byte samples that were collected as of last year, I have added another 400 samples from the same circuit, collected after leaving it powered off for about eight months.
The results are intriguing, as you can see from this graph of the median calculated prior to the generation of each 10,000,000-byte sample file.
The medians started out much higher than they were in the first set of samples, but after about three weeks they settled into much the same pattern that had existed in the first set.
One modification I made to Mr. Seward's idea was to perform a 'calibration' by calculating the median of the first few bytes; it is these medians that are displayed in the graph above. Preliminary analysis of the entropy's performance, particularly in relation to the median calculated for each sample set, indicates that no relationship exists. This is partially explained by the additional whitening operations (von Neumann, etc.) performed by the software. I suspect that two factors are in play. First, the whitening algorithms need only an approximate middle value for separating 1s from 0s in order to produce reliably uniform output. Second, I believe that regular median calibration helps maintain the generation rate of the circuit, since the von Neumann portion of the whitening algorithm discards consecutive bits with the same value. A sketch of both steps follows below.
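Here is a minimal Python sketch of the calibration and whitening steps (an illustration only; the function names and the ADC sample source are assumptions, not the project's actual firmware):

def calibrate_median(samples):
    """Median of the first few raw samples, used as the 1/0 threshold."""
    ordered = sorted(samples)
    return ordered[len(ordered) // 2]

def threshold_bits(samples, median):
    """Turn raw avalanche-noise samples into a (possibly biased) bit stream."""
    for s in samples:
        yield 1 if s > median else 0

def von_neumann(bits):
    """Von Neumann whitening: consume the bit stream in non-overlapping
    pairs, emit the first bit of each unequal pair (01 -> 0, 10 -> 1),
    and discard the equal pairs (00, 11), which carry the bias."""
    pairs = iter(bits)
    for a, b in zip(pairs, pairs):
        if a != b:
            yield a

# Hypothetical usage: calibrate on the first 1,000 samples, then
# whiten everything that follows (read_adc_samples is assumed).
# raw = read_adc_samples()
# median = calibrate_median(raw[:1000])
# whitened = list(von_neumann(threshold_bits(raw[1000:], median)))

The throughput link is visible in the last function: the further the threshold drifts from the true middle of the noise distribution, the more equal pairs get discarded, which is why periodic recalibration helps sustain the generation rate.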
If you want to look at the more detailed results from these initial studies, they are available from the project's Google Code Archive page. The following statistics are available for each of the 1,200+ 10,000,000-byte entropy files gathered to date.
Value Char  Occurrences   Fraction
  0           40002017    0.5000252125
  1           39997983    0.4999747875
Total:        80000000    1.0000000000
Value Char  Occurrences   Fraction       Expectation   Deviation
  0           39,303      0.393030000%   39,062.50     240.5000
  1           38,890      0.388900000%   39,062.50     172.5000
...
254           39,026      0.390260000%   39,062.50      36.5000
255           39,074      0.390740000%   39,062.50      11.5000
Total:    10,000,000      100.0000000%   Mean =        159.1172
Entropy = 7.999982 bits per byte.
Optimum compression would reduce the size
of this 10,000,000 byte file by 0.00%
Chi square distribution for 10,000,000 samples is 249.24, and randomly
would exceed this value 58.99 percent of the time.
Arithmetic mean value of data bytes is 127.5027 (127.5 = random).
Serial correlation coefficient is 0.000010 (totally uncorrelated = 0.0).
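For anyone who wants to run the same style of checks on their own files, here is a minimal Python sketch of how these statistics can be computed (my own illustration of the standard formulas, not the actual ent program; pure Python is slow on a 10,000,000-byte file, but it shows the calculations):

import math
import sys

def ent_stats(path):
    data = open(path, "rb").read()
    n = len(data)

    # Per-byte occurrence counts (the table above)
    counts = [0] * 256
    for b in data:
        counts[b] += 1

    # Shannon entropy in bits per byte (8.0 = perfectly random)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)

    # Chi-square against the uniform expectation of n/256 per value
    expected = n / 256
    chi_sq = sum((c - expected) ** 2 / expected for c in counts)

    # Arithmetic mean of the byte values (127.5 = random)
    mean = sum(v * c for v, c in enumerate(counts)) / n

    # Serial correlation between each byte and its successor (wrapping)
    sx = sum(data)
    sxx = sum(b * b for b in data)
    sxy = sum(data[i] * data[(i + 1) % n] for i in range(n))
    den = n * sxx - sx * sx
    scc = (n * sxy - sx * sx) / den if den else float("inf")

    return entropy, chi_sq, mean, scc

if __name__ == "__main__":
    entropy, chi_sq, mean, scc = ent_stats(sys.argv[1])
    print(f"Entropy = {entropy:.6f} bits per byte.")
    print(f"Chi square is {chi_sq:.2f} for 255 degrees of freedom.")
    print(f"Arithmetic mean value of data bytes is {mean:.4f} (127.5 = random).")
    print(f"Serial correlation coefficient is {scc:.6f} (0.0 = uncorrelated).")

As a side note, the 0/1 split in the first table is well within binomial expectation: for 80,000,000 fair bits the standard deviation of the zero count is sqrt(80,000,000)/2, about 4,472, so the observed excess of 2,017 zeros is only around 0.45 standard deviations.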
Of note is that none of the produced files have failed the ent-style entropy tests since the very first few, which were generated while I was still revising the whitening algorithm. Preliminary tests using more rigorous testing procedures have yet to find any issues with the data produced. A caution for those considering using this type of circuit in any critical area: the electronics are very sensitive to construction issues and to outside noise sources. The whitening helps filter out the latter, but a strong enough outside signal source could still compromise the circuit. Assembly in an EMI-shielded case, along with other precautions, should be implemented for any critical application.