True Random Number Generator using a PN junction

Hi folks,
I tried to build a True Random Number Generator (TRNG) based on Rob Seward's TRNG, using an on-chip von Neumann entropy extractor. Overnight, the Arduino gathered 1.88 million 8-bit numbers (octets), which you can download from here (in plain text format, size: 19MB) (also available here in .mat format for MATLAB).

The first results were encouraging, but not good enough to pass statistical tests like the Chi-Squared, the Kolmogorov-Smirnov, or the Runs Test with 95% confidence. I split the set of measurements into blocks of 20 octets and applied an SHA-1 hash to every block to get another set of 20 octets (note: SHA-1 returns 20 octets). The new set of data (of equal length to the initial one) is more random; one can tell by visual inspection of the respective histograms.

Another thing I noticed was that the information entropy barely changed after post-processing the data with SHA-1 (2 passes): it was 5.5459 nat initially and 5.5451 nat after the SHA-1. (For reference, the maximum for octets is ln 256, circa 5.545 nat.)

First of all, do you think that my source of randomness is good enough for cryptographic purposes? How can I extract more entropy out of these data? Are there any software or hardware improvements I should consider? I would appreciate any suggestions.

(Attachment: SHA1_TRNG.png)

If you are going to use SHA1:
Since SHA1 returns exactly 20 octets, it will help if you feed it somewhat more than 20 octets' worth of data.
I notice that the von Neumann extractor throws away some pairs of bits ("00" and "11"). I would not suggest you use them directly (after all, there is a reason you are throwing them away), but if you keep a count of discarded pairs, you could append this count to the 20 octets from the generator (using the count as a 21st and maybe 22nd octet), so you will have more octets to feed your hash function. (It also might be a good idea to keep an eye on this count itself, in case it seems a bit too regular.)
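In sketch form, the counting idea might look like this (a minimal sketch; readRawBit() is a hypothetical stand-in for however you sample the noise source):

int readRawBit();  // hypothetical: returns one raw bit from the noise source

uint16_t discardedPairs = 0;  // count of "00"/"11" pairs (wraps on overflow)

// Von Neumann extractor that also counts the pairs it throws away.
int vonNeumannBit() {
  for (;;) {
    int a = readRawBit();
    int b = readRawBit();
    if (a != b) return a;   // "01" -> 0, "10" -> 1
    discardedPairs++;       // "00" / "11": discarded, but counted
  }
}

// Build a 22-octet hash input: 20 extracted octets plus the discard
// count as the 21st and 22nd octets.
void fillHashInput(uint8_t out[22]) {
  for (int i = 0; i < 20; i++) {
    uint8_t v = 0;
    for (int bit = 0; bit < 8; bit++) {
      v = (v << 1) | vonNeumannBit();
    }
    out[i] = v;
  }
  out[20] = discardedPairs >> 8;    // high octet of the count
  out[21] = discardedPairs & 0xFF;  // low octet of the count
  discardedPairs = 0;               // restart the count per block
}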

An alternative to a von Neumann extractor: 2-out-of-5 code
(Note that a 3-out-of-5 code can be negated and interpreted as a 2-out-of-5 code.)
These eat bits and give you decimal digits, and yes, sometimes we want decimal digits. (Think bank account PINs, credit card numbers, etc.)
I wonder how well your gadget would do at generating random decimal digits.
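In sketch form (the same hypothetical readRawBit() standing in for the raw source). The property being exploited is the one von Neumann uses: for independent bits with bias p, every 5-bit group with exactly two 1s has probability p^2(1-p)^3, so conditioned on the weight, all C(5,2) = 10 patterns are equally likely and can be mapped one-to-one onto the digits 0-9 (the mapping below is my own arbitrary choice):

int readRawBit();  // hypothetical raw-bit source

// 2-out-of-5 decoder: groups of 5 raw bits with exactly two 1s become
// decimal digits; 3-out-of-5 groups are negated first; anything else
// is discarded.
int twoOutOfFiveDigit() {
  // The 10 weight-2 patterns in ascending order -> digits 0..9.
  static const uint8_t codes[10] = {
    0b00011, 0b00101, 0b00110, 0b01001, 0b01010,
    0b01100, 0b10001, 0b10010, 0b10100, 0b11000
  };
  for (;;) {
    uint8_t g = 0, ones = 0;
    for (int i = 0; i < 5; i++) {
      int b = readRawBit();
      g = (g << 1) | b;
      ones += b;
    }
    if (ones == 3) {          // negate a 3-of-5 group into a 2-of-5 one
      g = (~g) & 0x1F;
      ones = 2;
    }
    if (ones != 2) continue;  // weights 0, 1, 4, 5: discard
    for (int d = 0; d < 10; d++) {
      if (codes[d] == g) return d;
    }
  }
}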

You could also generate random ternary digits by taking bits in groups of three and seeing which bit (1st, 2nd, or 3rd) is different from the others.
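A minimal sketch of that trick (same hypothetical readRawBit()). It has the von Neumann property too: for biased but independent bits, the three patterns with a single 1 are equally likely among themselves, and likewise the three with a single 0, so the digit comes out uniform:

int readRawBit();  // hypothetical raw-bit source

// Groups of three bits -> one ternary digit (0, 1, or 2): the digit is
// the position of the bit that differs from the other two. 000 and 111
// have no odd bit out and are discarded.
int ternaryDigit() {
  for (;;) {
    int a = readRawBit();
    int b = readRawBit();
    int c = readRawBit();
    if (a == b && b == c) continue;   // 000 / 111: discard
    if (a != b && a != c) return 0;   // 1st bit is the odd one out
    if (b != a && b != c) return 1;   // 2nd bit
    return 2;                         // 3rd bit
  }
}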

If you generate two random numbers in different bases (say, one in binary and one in ternary), you could convert them both to the same base and add them.

odometer:
If you are going to use SHA1:
I notice that the von Neumann extractor throws away some pairs of bits ("00" and "11"). I would not suggest you use them directly (after all, there is a reason you are throwing them away), but if you keep a count of discarded pairs, you could append this count to the 20 octets from the generator (using the count as a 21st and maybe 22nd octet), so you will have more octets to feed your hash function. (It also might be a good idea to keep an eye on this count itself, in case it seems a bit too regular.)

After recording a series of 50,000 raw samples, without any whitening preprocessing, I noticed that there is an outstanding peak in the distribution at x=170 (measurements are represented as integers in the range 0~255) and some other rather dominant peaks at 75 and 120. It seems that the raw signal doesn't follow a uniform distribution at all, so whitening is necessary. The maximum autocorrelation coefficient was 0.06 and the average absolute autocorrelation coefficient (excluding the first one) was 0.008, so the signal is not auto-correlated. So, what you suggest is to use the count of the discarded pairs (which I would otherwise throw away) as extra input to enhance the hashing?

One thing that may also be affecting my implementation is the use of analogRead(int); I read some complaints on this forum that analogRead is not very accurate, so in my case it may introduce some structured additive noise.
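If that's the case, a common workaround (I am assuming it applies to my setup) is to read the pin twice and keep only the second value, so the ADC multiplexer has settled:

// Read twice, keep the second value: the first analogRead() after a
// mux switch can be off, so it is used only to let the input settle.
int settledRead(int pin) {
  analogRead(pin);         // throw-away reading
  delayMicroseconds(10);   // brief settling time
  return analogRead(pin);
}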

odometer:
You could also generate random ternary digits by taking bits in groups of three and seeing which bit (1st, 2nd, or 3rd) is different from the others.

This sounds pretty much like XOR-ing the signal with some other part of the same signal, which is also used in practice. I tried XOR-ing the signal with itself (using various sampling patterns), as well as with /dev/random, but the increase in entropy was negligible, so I considered this step superfluous.

I am reading some papers suggesting the use of two or more independent sources of uncertainty to achieve better results... (It seems I got into trouble with this project; I thought it would be easier :slight_smile: )

Hello,

chung:
I tried to build a True Random Number Generator (TRNG) based on Rob Seward's TRNG...

I believe that circuit requires 12V to function correctly. Are you supplying the correct voltage?

...using an on-chip von Neumann entropy extractor.

"On-chip"? You have hardware that performs this function?

Are you using Rob Seward's whitening function?

Overnight, the Arduino gathered 1.88 million 8-bit numbers (octets)

I get 1,880,156 bytes. Is that correct?

The first results were encouraging...

ENT...

Entropy = 7.999634 bits per byte.

Optimum compression would reduce the size
of this 1880156 byte file by 0 percent.

Chi square distribution for 1880156 samples is 954.44, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 127.2052 (127.5 = random).
Monte Carlo value for Pi is 3.155486200 (error 0.44 percent).
Serial correlation coefficient is -0.001798 (totally uncorrelated = 0.0).

I split the set of measurements into blocks of 20 octets and applied an SHA-1 hash to every block to get another set of 20 octets (note: SHA-1 returns 20 octets).

Try @wanderson's mixing function...
http://arduino.cc/forum/index.php/topic,108380.0.html

First of all, do you think that my source of randomness is good enough for cryptographic purposes?

With the right mixing, probably.

Hi, thanks a lot for the answer.

Oops, I just noticed that, no, I was supplying just 5V...

I merely meant on the chip of my Arduino (ATmega2560), not on some other chip. I'm using Rob Seward's implementation of von Neumann's entropy extractor.

Yes, it's correct. I'll rerun now with the correct voltage supply and I'll let you know whether there is some substantial change in the distribution.

How did you generate these statistics? Is there some tool I can use on MacOSX?

I'll check it out and I'll let you know about the results.

chung:
Hi, thanks a lot for the answer.

You are welcome.

Coding Badly:
I believe that circuit requires 12V to function correctly. Are you supplying the correct voltage?

chung:
Oops, I just noticed that, no, I was supplying just 5V...

It goes without saying... Make absolutely certain the output of that thing does not exceed 5V or you may damage your Arduino.

I'm using Rob Seward's implementation of von Neumann's entropy extractor.

I believe it's flawed. My understanding is that the circuit is sensitive to everything... small fluctuations in voltage and temperature, cosmic radiation, and whatever else the Universe sends your way. Which means that the center point will drift about. I suggest using something similar to what the atomic decay folks use: take two readings. If the first reading is greater than the second, output a one. If the first reading is less than the second, output a zero. If the readings are identical, output nothing.
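In sketch form (assuming the noise source is wired to analog pin A0):

const int NOISE_PIN = A0;  // assumption: noise source on A0

// Compare two readings: first > second emits 1, first < second emits 0,
// ties emit nothing. No fixed threshold, so slow drift of the center
// point cancels out.
int compareBit() {
  for (;;) {
    int a = analogRead(NOISE_PIN);
    int b = analogRead(NOISE_PIN);
    if (a > b) return 1;
    if (a < b) return 0;
    // a == b: discard the pair and try again
  }
}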

How did you generate these statistics?

ENT. (After converting the text file to a binary stream.)

Is there some tool I can use on a MacOSX?

I believe ENT is open-source in which case you can rebuild it for the Mac.

If the stream does poorly with ENT, it has problems. If the stream does well with ENT, you need to perform further testing and/or analysis.

At first glance, it seems to me that the circuit's 0/1 output randomness depends on the setting of the operating point of T3. Also, the spectrum of the T1 noise is 1/f-like, not white...

Done!

After thorough shuffling and hashing with SHA-1, along the lines of LavaRnd, ent shows that /dev/random on MacOSX totally outperforms my generator. This is the best I managed to achieve so far:

Optimum compression would reduce the size
of this 18824641 byte file by 0 percent.

Chi square distribution for 18824641 samples is 2767.81, and randomly
would exceed this value less than 0.01 percent of the times.

Arithmetic mean value of data bytes is 127.4753 (127.5 = random).
Monte Carlo value for Pi is 3.142194911 (error 0.02 percent).
Serial correlation coefficient is 0.001052 (totally uncorrelated = 0.0).

For comparison, here is what I got by analyzing /dev/random:

Entropy = 7.999990 bits per byte.

Optimum compression would reduce the size
of this 17735680 byte file by 0 percent.

Chi square distribution for 17735680 samples is 256.17, and randomly
would exceed this value 46.76 percent of the times.

Arithmetic mean value of data bytes is 127.4997 (127.5 = random).
Monte Carlo value for Pi is 3.140700135 (error 0.03 percent).
Serial correlation coefficient is 0.000075 (totally uncorrelated = 0.0).

(By the way, DieHard doesn't run on MacOSX.)

Ergo: either my source of randomness is glitchy, or my randomness extraction and decorrelation algorithms (von Neumann, shuffling, SHA-1, shuffling again) are not good enough. Tomorrow I'll check with an oscilloscope what sort of output my little board produces and I'll post the results here (including a Fourier spectrum of the signal). Do you think that combining multiple independent sources of information would improve the quality of my RNG?

chung:
(By the way, DieHard doesn't run on MacOSX.)

Skip diehard. Use diehard2, the NIST tests, or ... what's that other thing ... Pierre L'Ecuyer's thing ... there it is: TestU01. (Though I gave up on NIST and TestU01 myself: I found a plethora of serious coding problems in TestU01, and I could not get the NIST tests to compile.) Bear in mind that they each test different aspects of randomness, with some overlap.

Ergo: either my source of randomness is glitchy, or my randomness extraction and decorrelation algorithms (von Neumann, shuffling, SHA-1, shuffling again) are not good enough.

Did you try the whitening strategy I described in Reply #5?

Do you think that combining multiple independent sources of information would improve the quality of my RNG?

With exclusive-or? Of course.

I would like to present some results I got after monitoring the signal the aforementioned randomness source produces, using a Rigol DS1052E 50MHz digital oscilloscope. I captured the signal with a sampling time of 4e-7 sec (400ns) over a total period of 6.5ms, that is, circa 16,250 samples in time. The signal looks quite fuzzy at first glance:

Based on its ACF (AutoCorrelation Function), one cannot say with decent certainty (>95%) that it is uncorrelated:

and its PACF (Partial ACF):

Finally, this is the histogram of the voltage values, which looks not too different from a normal distribution; we cannot claim with statistical certainty >95% that it is a normal distribution, but it certainly has nothing to do with a uniform distribution:

I am a bit skeptical as to whether this is a good randomness source... Should one expect a close-to-uniform distribution of values from the initial source of randomness, or hope that this will be compensated for by whitening algorithms? Are there criteria by which to discard a source of randomness? Any measurement (of any electrical signal) is accompanied by noise, which in most cases can be approximated pretty well by a normal distribution. In this context, one could use practically any measurement as a source of randomness...

Oh, one last thing - this is the QQ plot of the values demonstrating that the values follow a distribution not too different from the normal:

chung:
Finally, this is the histogram of the voltage values, which looks not too different from a normal distribution...

The spike and two troughs are a concern, unless they're a side effect of what you used to measure the voltage. Try making the buckets one order of magnitude less than the resolution of your measurement device. Try using something else to measure the voltage.

Should one expect a close-to-uniform distribution of values from the initial source of randomness...

Dice? Yes. Playing cards? Yes. A reverse-biased transistor? No.

Arguably, the ideal source of randomness is atomic decay, and even that is not uniform (decay counts follow a Poisson distribution).

...or hope that this will be compensated for by whitening algorithms?

You will be whitening.

First of all, this is great. This kind of analysis was always missing from my design...

There are several things I want to address. One is the normal distribution of the voltages, another is a shortcoming in my calibration code, and another is a few ideas for improvement.

Distribution of the Voltages:

The way random bits are created is that the app tries to find a threshold at which there is an even ratio of voltages above and below. Then, when the Arduino polls the noise source, it turns voltages above the threshold into a 1 and voltages below it into a 0. Theoretically, as long as the order in which those voltages land above and below the threshold is random, the distribution shouldn't matter.
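In sketch form, the mechanism is roughly this (simplified; the pin and the threshold value are placeholders, not my actual code):

int threshold = 512;  // found during calibration (placeholder value)

// Digitize the noise: above the threshold -> 1, below -> 0; readings
// equal to the threshold are skipped.
int thresholdBit() {
  for (;;) {
    int v = analogRead(A0);  // noise source assumed on A0
    if (v > threshold) return 1;
    if (v < threshold) return 0;
  }
}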

However, I would be surprised if the voltage distribution and the randomness of the resultant bits were completely unrelated – an even distribution would probably result in better randomness.

Despite this, I do think avalanche noise is worth exploring. I have heard that even measuring radioactive decay does not produce an unbiased result (Arduino Forum). (Random.org uses atmospheric radio noise for its randomness; that's the only other inexpensive noise source I can think of, though I'm sure their technique is far more complicated than this.)

This normal distribution may be a surmountable problem. Whitening, taken together with a few improvements I will talk about below, may result in high-quality bits.

Calibration inadequacy

There is a problem in my RNG's current code. There is drift in the median voltage that the Arduino uses as a threshold for turning the noise into 0s and 1s. This drift is especially apparent over long periods of time, and it may also be having an effect over short periods. The calibration functions I included in the code are an attempt to compensate for this. However, my calibration occurs only at startup, so any drift that occurs after startup is not compensated for. When I was writing the code, I could not think of a way to perform the calibration without interrupting the stream of bits. (It's probably possible; given time constraints, I just could not engineer a solution.)

If you'd like to see the drift, Walter Anderson studied it over a period of 10 months:

http://code.google.com/p/avr-hardware-random-number-generation/wiki/AvalancheNoise

(Walter cites my technique, but wrote his own code. I think his code calibrates periodically instead of just at startup)

Improvement:

One idea for improvement is to calibrate the threshold continually while the chip is running. It would be best if this were done in the background, so the bitstream would not be interrupted. If higher-quality randomness is available just by having a properly tuned threshold, then this could improve results.
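One way to sketch that (just an idea, not tested code): nudge the threshold toward the running median on every sample, so slow drift is tracked without ever pausing the stream:

int threshold = 512;  // starting guess; converges toward the median

// Emit a bit and nudge the threshold one step toward the median at the
// same time. The step size trades tracking speed against jitter.
int calibratedBit() {
  for (;;) {
    int v = analogRead(A0);  // noise source assumed on A0
    if (v > threshold) { threshold++; return 1; }
    if (v < threshold) { threshold--; return 0; }
    // v == threshold: no bit, no adjustment
  }
}

The feedback couples successive bits a little, so the output should still go through the von Neumann step afterwards.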

Another approach which I think is promising is to have several of these noise sources going at the same time. (Random.org gets its noise from several sources that are positioned world-wide.)

Watchdog timer technique:

All of this may be moot, however. There are some implementations that utilize the jitter of the Arduino's watchdog timer. This requires no extra hardware and seems to produce better results (though I think it has a lower throughput).

http://code.google.com/p/avr-hardware-random-number-generation/wiki/WikiAVRentropy
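The core of the technique looks roughly like this on an ATmega-based Arduino (a sketch along the lines of the library linked above, not its actual code): the watchdog runs from its own independent 128 kHz oscillator, so the value of free-running Timer 1 at each watchdog interrupt jitters relative to the system clock, and the low bits of that capture carry the entropy.

#include <avr/interrupt.h>
#include <avr/wdt.h>

volatile uint8_t sample;
volatile bool sampleReady = false;

ISR(WDT_vect) {
  sample = TCNT1L;     // capture the low byte of free-running Timer 1
  sampleReady = true;
}

void setup() {
  TCCR1A = 0;          // Timer 1: normal mode,
  TCCR1B = _BV(CS10);  // no prescaler, free-running
  cli();
  MCUSR &= ~_BV(WDRF);            // clear the watchdog reset flag
  WDTCSR = _BV(WDCE) | _BV(WDE);  // timed sequence to unlock changes
  WDTCSR = _BV(WDIE);             // interrupt-only mode, ~16 ms period
  sei();
}

void loop() {
  if (sampleReady) {
    sampleReady = false;
    // feed 'sample' (or just its least significant bit) into the
    // whitening / pooling stage here
  }
}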


-Rob


Dear community,

I am also trying to reproduce the circuit made by Rob but I am also having some difficulties.

First of all, I am not using 2N3904 transistors but BC550CG. (Could that be a problem?)

The signal of the generator without the amplifying transistor looks fine to me, like the one shown in SCR03.PNG.

As soon as I connect the signal to the base of T3 via C1, it becomes the one in SCR04.PNG (blue is the input signal on the base of T3, yellow the output signal on the collector of T3).

To my understanding, the output signal should look like analog noise (correct?), so I assume there is something wrong with my circuit.

Could the BC550CG transistors be the source of my problems?

Any help is much appreciated!

I changed all transistors to 2N3904 and now the signal looks like I would expect. (See attachment; purple: generator output, green: circuit output.)

It's apparently very important to use 2N3904 transistors.

@Thuffir, thank you for the follow-up.

Great Discussion

rusew:
Improvement:

One idea for improvement is to calibrate the threshold continually while the chip is running. It would be best if this were done in the background, so the bitstream would not be interrupted. If higher-quality randomness is available just by having a properly tuned threshold, then this could improve results.

Another approach which I think is promising is to have several of these noise sources going at the same time. (Random.org gets its noise from several sources that are positioned world-wide.)

Watchdog timer technique:

All of this may be moot, however. There are some implementations that utilize the jitter of the Arduino's watchdog timer. This requires no extra hardware and seems to produce better results (though I think it has a lower throughput).

http://code.google.com/p/avr-hardware-random-number-generation/wiki/WikiAVRentropy

-Rob

If XOR-ing two independent random sources increases randomness, what about XOR-ing the output of one of these circuits with, say, the randomness derived from a watchdog timer?
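In sketch form (avalancheByte() and watchdogByte() are hypothetical stand-ins for the two sources):

uint8_t avalancheByte();  // hypothetical: one byte from the avalanche circuit
uint8_t watchdogByte();   // hypothetical: one byte from watchdog jitter

// XOR combiner: if either input byte is uniform and the two sources
// are independent, the combined byte is uniform.
uint8_t combinedByte() {
  return avalancheByte() ^ watchdogByte();
}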

Or even better, "daisy chain" several Arduinos? I think I'm going to try it :slight_smile: