That part bothers me. Truly random data occasionally has long runs of low values and long runs of high values. If the median is determined during such a run, then all the rest of the data is skewed.
Yes, long runs of similar values are indeed part of a truly random stream; however, all physical chaotic (random) sources have bias. In other words, they have a tendency to produce more 1's than 0's, or vice versa. This bias doesn't change the underlying randomness of the phenomenon, but it does prevent a generator that uses that phenomenon from producing uniform sequences of random numbers, which is what we tend to want. The running median that he used is simply a method of reducing the bias, and it appears to work, though as I said earlier, both the circuit and the software required tweaking to get the degree of bias removal that I wanted to see. So in essence the median doesn't skew the data; it helps remove the skew that is built into the phenomenon.
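To make that concrete, here is a minimal sketch (my own, not the author's code) of median-based debiasing, assuming raw ADC samples are thresholded against the median of an initial calibration window:

    import statistics

    def median_debias(samples, calib_len=50_000):
        # Estimate the source's bias from an initial calibration window.
        # calib_len = 50,000 matches the sample count mentioned below,
        # but the window size is otherwise an assumption.
        threshold = statistics.median(samples[:calib_len])
        # Samples above the median become 1, samples below become 0;
        # samples equal to the median are dropped so the comparison
        # itself doesn't reintroduce bias.
        return [1 if s > threshold else 0
                for s in samples[calib_len:] if s != threshold]

Note this uses a fixed threshold from one window; a true running median would re-estimate the threshold as samples arrive, at some extra cost.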
I also don't like the fact that he doesn't include any statistical analysis to back his use of the first 10 seconds. Is that long enough? What is the probability that a statistical error will occur using only 10 seconds?
Well, he published what he wanted to, and we are all free to use it or not, which includes taking it and running the analysis ourselves. Indeed, that is something that needs to be done with any RNG used for critical applications, since they all have quirks that could affect their applicability to a particular use, and hardware-based ones are particularly prone to that. And when one considers the relative slowness of these types of generators, I am not sure how relevant some (if not most) of the existing tests are, since they were designed and intended for deterministic pseudo-random number generators (PRNGs). For most purposes, a hardware-based random number generator has one simple criterion: is the data stream predictable? If it is, then it fails, which is why bias removal is so important. Predictability is what differentiates a true random number generator from a pseudo-random number generator, which is inherently predictable.
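As a first-pass illustration of that criterion, a simple frequency (monobit) check, in the spirit of the NIST SP 800-22 frequency test, flags gross bias before any deeper analysis. This is my own sketch, not part of the published design:

    import math

    def monobit_check(bits, alpha=0.01):
        # Frequency (monobit) test: for an unbiased stream, the
        # normalized sum of +/-1 bits is approximately N(0, 1).
        n = len(bits)
        s = sum(1 if b else -1 for b in bits)
        stat = abs(s) / math.sqrt(n)
        # Two-sided p-value via the complementary error function.
        p_value = math.erfc(stat / math.sqrt(2))
        return p_value, p_value >= alpha

A stream that fails even this is predictable in the crudest sense: one symbol is simply more likely than the other.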
From what I have generated so far, the 50,000 samples he uses seem to be more than sufficient. Indeed, when considered from the view that it is simply trying to remove bias from the circuit caused by junction changes or possibly beta changes, then it is almost certainly sufficient, since we accomplish much the same thing when constructing the circuit and making one measurement with an oscilloscope or meter.
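A back-of-envelope check on that sample count (my arithmetic, not from the article): treat each sample's position above or below the true median as a fair coin flip; the standard error of the observed fraction is sqrt(p(1-p)/n), so with n = 50,000 the calibration pins the median down to within roughly 0.2% at one sigma:

    import math

    # Standard error of a proportion p ~= 0.5 estimated from n samples:
    # SE = sqrt(p * (1 - p) / n). With n = 50,000 this is ~0.0022,
    # i.e. the estimated median fraction is within ~0.2% of 0.5.
    n = 50_000
    se = math.sqrt(0.5 * 0.5 / n)
    print(f"standard error of median fraction: {se:.4f}")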
Nice. ENT is an excellent first filter. If a stream does not do well with ENT, there is no way it will pass the other tests.
ENT, like the other common tests, is simply a tool. Too many folks apply those tests without understanding them. Just because a generator passes (or fails) a particular test doesn't mean the generator is good (or bad) for all purposes, just for some (and those are usually not common). The classic random number generators, coin flips and dice rolls, will fail most of the modern tests as well, even when using "fair" coins and dice. Part of the reason is that even "fair" coins and dice will show bias when examined over the hundreds of thousands (or millions) of samples that these modern tests demand. I recall a paper in which a number of dice (all "fair") were tested and found to become heavily biased over time.
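For anyone curious what ENT-style statistics actually measure, here is a rough sketch of two of them (entropy per byte and arithmetic mean) computed the obvious way; ENT's real implementation reports more (chi-square, serial correlation, Monte Carlo pi) and may differ in details:

    import math
    from collections import Counter

    def quick_stats(data: bytes):
        # Shannon entropy in bits per byte (8.0 is ideal for random data).
        n = len(data)
        counts = Counter(data)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        # Arithmetic mean of byte values (127.5 is ideal for random data).
        mean = sum(data) / n
        return entropy, mean

Passing these says very little on its own; as noted above, they are first filters, not proofs of unpredictability.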