Can Von Neumann simultaneously debias more than one input channel?

Please see the attachment for all of this to make sense.

The Von Neumann randomness extraction technique reads pairs of independent bits from a channel; when the two bits differ, it outputs the first one, with any bias removed. So it might look like this:

do {
  a = (PIND >> PD7) & 0b1;   // first sample of the pair
  b = (PIND >> PD7) & 0b1;   // second sample of the pair
} while (a == b);            // equal pair: discard and resample

return a;                    // unequal pair: keep the first bit

Here we read a single stream of randomish bits on port D, pin 7. We then output single bits with the bias removed. So the input ranges 0-1, and the output is 0-1. Standard stuff.

Is it possible to debias 6 channels/pins at the same time, as shown in my diagram? Important: we would read the whole of port D in one go via the PIND command. So here the input would range 0-63 with some bias, and I expect the output to range 0-63 too, but with the bias removed. Is this possible?

[Attachment: vn.png]

May I request that someone describe, in plain language (I'm a non-native speaker), the meaning of this post in terms of input data, modulating data, and output data?

I'm sorry if it's not clear, it's a complicated question.

Is there something specific I can explain further? Have you looked at the attachment? It shows the 6 streams of bits that I want to remove any bias from. So instead of the usual technique of sampling a single pair from 1 pin, I want to simultaneously sample all 6 using the PIND command. I then expect the unbiasing routine to output a number ranging 0-63 with the bias removed.

In essence I want to find a way to increase the efficiency of the Von Neumann bit extraction technique to get better than 1/2 bit output per sample pair.

I think this is related to random-signal theory, which is beyond my understanding.
Thank you for taking the time to respond to my query.

The principle only applies to a random generator where the bits are completely uncorrelated (but biased), which "randomish bits" are almost certainly not.

What are you actually trying to do?

CAVEATS: pseudo-random ("randomish") is absolutely not random. your high-level problem requires uncorrelated data, and pseudo-random numbers will fail in a way that is not apparent until someone does a statistical analysis. But we'll just assume you're doing a proof-of-concept. I also have no idea if this is even valid, to take 6 independent(?) bit streams, parallel them, and try a von Neumann extraction, but IANAS (i am not a statistician) so please consult with a professional :slight_smile:

anyhow, skipping the whys and wherefores .... here's what I THINK you're trying to accomplish: you want to take two clocked sequences of 6-bit parallel inputs as numbers, compare the bit positions of each, and keep all bits from the first sequence that differ from the corresponding bit positions of the second.

the method i would use is to XOR the two sequences, then use the XOR'ed result as a mask to select bits from the first sequence; the trick is distinguishing between discarded bits and valid zero bits. Going on that, here's some code i am going to type out on the fly that i will not compile or test, might not work, but you hopefully get the idea

int function_debias (int seq0, int seq1, int *result)
{
  int seqX = seq0 ^ seq1;      // XOR marks the bit positions that differ
  int numBits = 0;             // count of valid (non-discarded) bits
  int i;                       // bit index

  *result = 0;                 // result will accumulate the kept bits

  for (i = 0; i < 6; i++)
  {
    if ((seqX >> i) & 1)       // test for valid (differing) bits
    {
      // get the valid bit from seq0, then accumulate it into result
      // at the next bit position.  Note, it may be zero and not
      // change the result, so the total "numBits" is important to save.
      // note that numBits is post-incremented in the expression.

      *result |= ((seq0 >> i) & 1) << numBits++;
    }
  }

  return numBits;
}

so, the "result" iss passed back as an argument, the calling function must pass a pointer to the variable by reference. (i.e., use an ampersand (&) in front of the variable in the caller.)

the function itself returns, as an integer, the number of valid bits in the result. this will be key to interpreting the result that is passed back. for instance, if the function returns the value 3 and the result is zero (0), then you know that there are three bits: 000. likewise, if it returns the value 6 and the result is 17, then you know there are six bits: 010001
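
to make the calling side concrete, here's an untested sketch of how i'd imagine calling it (i'm assuming your 6 streams sit on PD0..PD5, hence the 0x3F mask):

// untested caller sketch -- assumes the 6 streams are on PD0..PD5
int result;
int seq0 = PIND & 0x3F;                              // first 6-bit sample
int seq1 = PIND & 0x3F;                              // second 6-bit sample
int numBits = function_debias(seq0, seq1, &result);  // note the ampersand

// numBits (0..6) tells you how many of result's low bits are valid,
// e.g. numBits == 3 with result == 0 means the three bits 000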

of course, there may be typos, or maybe i'm just an idiot and the whole thing is wrong, i'm merely sitting here with my coffee and typing in a forum. good luck :slight_smile:

MarkT:
The principle only applies to a random generator where the bits are completely uncorrelated (but biased), which "randomish bits" are almost certainly not.

What are you actually trying to do?

Forgive the facetiousness, it's a character flaw and I'm sometimes overly flippant with my language :confused:

The characterisation of the bit streams as randomish is actually fair. They each contain a large proportion of pure entropy. Assume that the six streams are uncorrelated but biased, say 40% each. I'm trying to extract entropy from the six streams using the PIND instruction. That way, one instruction samples all six streams in one read, which is the quickest method I can think of. The Von Neumann technique is traditionally applied to a single bit stream. I want to know if there is a more exotic derivation for multiple simultaneous streams.

I suppose that this is a randomness extraction optimisation question. What is the most efficient way of debiasing six parallel bit streams? By efficient, I mean the technique that throws away the fewest bits. Clearly I could sequentially apply Von Neumann to each stream in turn, but that would waste the bits read via PIND from the other five channels.

doggydaddy206, sorry you've misunderstood. This is a randomness extraction question. I'm looking for a technique that accepts six randomish bits with a bias and outputs six truly random bits. I have six hardware entropy sources that I want to simultaneously debias.

I'm starting to get the impression that this is not possible :slightly_frowning_face:

Thought! Does it help to explain what I mean by bias? Consider each stream of bits. It is independent of any other stream. There is a series of 1s and 0s. The stream should be perfectly random, so that:

p1 / p0 = 1

It isn't.

I get p1 / p0 = 0.6 = randomish

This isn't too bad, but can it be cleaned up?
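
For what it's worth, this is roughly how I estimate that ratio (an untested sketch, assuming the stream is on PD7):

// untested sketch: estimate p1 / p0 for one stream (assumes PD7)
unsigned long ones = 0;
const unsigned long N = 1000000UL;               // number of samples

for (unsigned long n = 0; n < N; n++) {
  ones += (PIND >> PD7) & 1;                     // count the 1 bits
}

float ratio = (float)ones / (float)(N - ones);   // p1 / p0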

you got two problems here:

(1) can we create a truly unbiased (random) stream of bits

(2) how can we perform a von Neumann extraction on 6 parallel bit streams

...

let's leave (1) for a moment. it's probably outside the scope of this forum, certainly outside of my domain.

as to (2), i thought i provided an answer on how to do a von Neumann extraction using the six parallel bitstreams, where you wished to treat each incoming set of bits as a 6-bit unsigned integer with a value of 0-63, removing the repeated bits as in the classic manner but in parallel, 6 bitstreams at a time.

did i not address that? what did i miss?

back to (1), the study of how to make a truly unbiased Random Number Generator has been the focus of many careers. i mean, it's not a trivial problem. what you're doing seems like an interesting take on it, and it caught my curiosity. i'm not sure if it's valid, but (again) IANAS so i'm really not the one to ask.

have you seen this? it's one of many such discussions to be sure, but many of the comments seem to be hitting on what you're after. the question and the accepted answer are trivial, but the later comments, code snippets and links are worth looking at.

Ah! I hadn't understood your code until your latest post. You mean that you've (effectively) converted the six parallel streams to a serial stream if you scan along the whole byte as read by PIND? That explains what that right shift is doing there. Now that's good!

For PCB real-estate reasons, four streams would be preferable, but six streams gives 50% extra bit rate. I wonder what the optimum number of streams is, therefore, since you're looking at/rejecting pairs?
Hmmm... ponder this I will.

note, in the function "seq0" is the first sample of the 6 bitstreams, passed as an integer expected to be 0-63. "seq1" is the next sample, immediately after the first. this would be analogous to the first and second bits in a typical von Neumann scheme.

except, rather than comparing just two bits and either keeping only one or discarding both (i.e., compressing the two bits of information into either zero or one bit), this function compares the two 6-bit sequences and returns anywhere from 0 to 6 bits.

the information is returned in two parts. the actual bits are referenced back via the "result" pointer and expressed as a value between 0-63. the return value of the function is an integer between 0 and 6 indicating how many bits to expect in the "result". this is needed to properly catch any "leading zeroes" in the "result".
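
here's a quick worked example i did by hand, so double-check me:

// worked example, by hand:
//   seq0 = 0b101100, seq1 = 0b100101
//   seqX = 0b001001   -> bits 0 and 3 differ; the rest are discarded
//   bit 0 of seq0 = 0 -> result bit 0 = 0
//   bit 3 of seq0 = 1 -> result bit 1 = 1
//   so the function returns 2, and *result = 0b10 = 2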

now this on average will return less than 50% of the total biased samples, but since you're pushing in 6 parallel streams, the throughput will be 6 times as much as an average single-stream extraction. Maybe some CompSci guy or gal could optimize this for even more efficiency; there's certainly a few papers out there about this that i haven't read :slight_smile:

Have you seen this RNG extractor? They use an XOR mechanism, but it might be along different lines. They base their extraction rate on min-entropy, which is what I should really be using, as that's the safest approach for cryptography. See section 3 of the paper.
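
If I've understood the simplest form of that idea, it just folds pairs of samples together, something like this (my naive reading, not the paper's actual construction):

// naive XOR corrector -- NOT the paper's construction
int s0 = PIND & 0x3F;     // first 6-bit sample
int s1 = PIND & 0x3F;     // second 6-bit sample
int folded = s0 ^ s1;     // halves the rate; reduces, but doesn't remove, bias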

I'm starting to wonder if the amount of instruction processing might mean that it's just simpler to accept the slight bias and read more samples as compensation. I can sample the entropy at 8 MSa/s, which is exactly the Arduino instruction processing rate. XOR and for/next operations run at a similar speed, so neither technique has a distinct speed advantage. Six streams would give a raw (biased) rate of 48 Mbit/s.

Hmm. Now that I see the entropy collection rate in print, I wonder if I'm just being greedy? The bottleneck will be the Ethernet shield. Hmm again.

If you're interested, this is the entropy source of a single bit stream (schematic attached; R1 & R2 have been reduced to 100K).

i haven't had a chance to read through that link yet, but i will.

i'm trying to run your numbers on the fly here. if the arduino is able to sample 8 MS/s, the typical von Neumann scheme samples one stream and loses (on average) 75% of your samples in the discards, because for every two (2) samples you are 50% likely* over time to get one bit of information.

so that's 2 Mb/s*

you'll be sampling 6 streams as a 6-bit integer at (i think) the same rate, having to make two samples to process, and losing anywhere from none (0) to all (6) of those bits at any given time, depending on each bit's corresponding subsequent value. over time, i think it should approach providing 3 bits* of information per every two samples, on average

or 12 Mb/s*

which is still better than 1/2 a bit of information per every two samples with one bitstream. six streams gives 6x the bit throughput that one stream does. but yeah, what you said: PCB/pinout real estate.

*NOTE: I have totally oversimplified the throughput; my "average" valid sample return of 50% assumes unbiased sources, which they won't be. so it won't be 2 Mb/s vs 12 Mb/s, it may be noticeably less than that. but regardless, whatever the bias is for one source will be similar for the 6 sources, so running 6 parallel streams should (theoretically) give more or less 6x the throughput of a single one.
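
if you want to put actual numbers on the bias, the expected keep-rate for a stream with P(1) = p is 2p(1-p) bits per sample pair. quick back-of-envelope code (untested; p = 0.375 corresponds to your p1/p0 = 0.6):

#include <stdio.h>

int main(void)
{
  double p = 0.375;                            // P(bit == 1); p1/p0 = 0.6
  double bitsPerPair = 2.0 * p * (1.0 - p);    // chance a sample pair differs
  double sampleRate = 8.0e6;                   // 8 MS/s
  double perStream = (sampleRate / 2.0) * bitsPerPair;   // bits/s, one stream

  printf("single stream: %.2f Mb/s\n", perStream / 1e6);
  printf("six streams:   %.2f Mb/s\n", 6.0 * perStream / 1e6);
  return 0;
}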

Let me repeat: the Von Neumann trick doesn't apply to real-time analog signals. It only applies to distinct, independent random variables - like ideal coin tosses (of a biased coin). You've just said that your input contains some entropy - that's not enough.

And it cannot produce 6 bits out for 6 bits in anyway, nor can it even guarantee to terminate, come to that.

And the 6 streams will produce results at different rates, with no bound on the mismatch, so you'd need to add an unbiased way to discard values from all but the slowest channel.

If you have a source of entropy, why not mix it into a PRNG and use the PRNG to produce output with known statistics?
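
For instance, a sketch only (xorshift32 stirred with the raw samples; not a vetted construction, use a proper CSPRNG for anything serious):

#include <stdint.h>

// sketch only: stir raw (biased) samples into a xorshift32 state.
// a zero state would stick, so a real implementation must guard against it.
static uint32_t state = 0xDEADBEEFu;    // any nonzero seed

uint32_t mixAndGenerate(uint8_t rawSample)
{
  state ^= rawSample;                   // stir in the entropy
  state ^= state << 13;                 // standard xorshift32 scramble
  state ^= state >> 17;
  state ^= state << 5;
  return state;
}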

The way the Von Neumann trick is typically used is to time successive decays of a radioactive source and compare the two times, which are independent. You have to use the next two times for the next bit (you can't reuse them), so you get one bit per 2 decays detected. You also alternate the order of the times in the comparison for successive output bits, since the radioactive decay rate changes with time. Equal times mean no bit is output.

If your original values aren't independent, then the von Neumann trick can easily reduce randomness and fail to eliminate bias. Independent samples is a very strong constraint.