The reason is that the 16-bit space of 65,536 combinations is traversed along a defined but secret pathway: the algorithm may visit any particular number several times, but where it goes next depends on its position in the sequence, so successive codes look non-sequential (pseudo-random). So while there may only be 16 bits, and hence 65,536 combinations, if there is a distinct but secret pathway sequencing through those 65,536 points, it becomes much harder to guess the next point in the sequence when only one point is sniffed.
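To illustrate the idea, here is a minimal sketch using a generic 16-bit linear-feedback shift register. This is not the actual algorithm of any particular remote; the taps and seed are just a textbook maximal-length example:

```python
def lfsr16_next(state):
    """One step of a 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, 11
    (a maximal-length polynomial: it cycles through all 65,535 non-zero states)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

# From any non-zero seed, successive codes jump around the 16-bit space:
# sniffing one value tells you nothing obvious about the next.
state = 0xACE1  # arbitrary seed
codes = []
for _ in range(5):
    state = lfsr16_next(state)
    codes.append(state)
```

An eavesdropper who captures one code can only predict the next one if they also know the feedback taps (the secret pathway) and the current position in the sequence, which is exactly the point above.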
I don't think now that these remotes use rolling codes (otherwise their code would change on each key press rather than being static, as you have shown). But to draw on my experience with the Oregon Scientific Weather Station: the sensors send a signal that is nibble (4-bit) based, with the bit significance reversed within each nibble. The Tx sends each 4-bit value so that the received bits ABCD appear weighted 1, 2, 4, 8; once this is reversed for each nibble, reading DCBA with the usual weights 8, 4, 2, 1, suddenly the nibbles can be seen to represent the data on the LCD console.
So once the nibble boundaries are determined, and the reversal of significance is applied to each nibble, the weather data is observable. Numbers become recognisable either as BCD (e.g. humidity) or as pure binary plus some conversion maths (e.g. wind speed). Until this processing is applied, the bit pattern is just a meaningless jumble. (I would like to give kudos to whoever worked this out for my weather station, but I just don't know who figured it out!) Plus the checksum works on nibbles, not bytes.
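To make the nibble reversal concrete, here is a small Python sketch. The 57% humidity frame is an invented example for illustration, not a captured Oregon Scientific packet:

```python
def reverse_nibble(n):
    """Reverse the bit order of a 4-bit value: ABCD -> DCBA."""
    return ((n & 0x1) << 3) | ((n & 0x2) << 1) | ((n & 0x4) >> 1) | ((n & 0x8) >> 3)

def decode_nibbles(raw_bits):
    """Split a raw on-air bit string into nibbles and reverse each one."""
    nibbles = [int(raw_bits[i:i + 4], 2) for i in range(0, len(raw_bits), 4)]
    return [reverse_nibble(n) for n in nibbles]

# A hypothetical BCD humidity of 57% arrives with each nibble's bits reversed:
# 5 = 0101 goes on air as 1010, and 7 = 0111 as 1110.
decoded = decode_nibbles("10101110")  # -> [5, 7]
```

Until `reverse_nibble` is applied, the raw nibbles read as 10 and 14, which is the "meaningless jumble" stage; afterwards the BCD digits 5 and 7 appear.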
The other possibility is that a checksum or cyclic redundancy check is also involved, where the receiver tests the integrity of the bit pattern. I think one of the contributors above said that a single burst of the bit pattern was sufficient to trigger a response. If that is the case, then my earlier comment about repeating four times for validation was wrong, and the four-times burst is just a hope that at least one copy gets through (am I right here?). But it also points to some other means of validating a successful transmission, so that erroneous transmissions can be rejected. Fairly important when switching things connected to the mains power!
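As an illustration of the kind of validity check a receiver might run, here is a toy nibble-based checksum in Python. The frame layout (checksum carried in the last two nibbles, low nibble first) is an assumption made for the sketch, not the actual protocol of these remotes:

```python
def nibble_checksum(nibbles):
    """Sum the data nibbles, keep the low 8 bits, and split that byte
    into two checksum nibbles (low nibble first - an assumed layout)."""
    s = sum(nibbles) & 0xFF
    return [s & 0xF, s >> 4]

def frame_valid(frame):
    """Receiver-side test: the last two nibbles must match the checksum
    of everything before them."""
    return frame[-2:] == nibble_checksum(frame[:-2])

data = [5, 7, 3, 1]                   # hypothetical payload nibbles
frame = data + nibble_checksum(data)  # -> [5, 7, 3, 1, 0, 1]
corrupted = [5, 7, 3, 2, 0, 1]        # one nibble flipped in transit
```

A receiver running `frame_valid` would accept `frame` and silently drop `corrupted`, which is one way a single clean burst could be enough to trigger a response while noise is rejected.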
It may be the case that the 'curious' bit satisfies some validity-checking algorithm and has little to do with the switching data itself. So while you are cloning transmitters successfully, manually determining the code for each separate one purchased may be as far as you can go. The learning mode may just exist to associate a transmitter, randomised at the factory, with a receiver, i.e. to create the unique pairing of Tx's to multiple Rx's, with no other generality involved. That is, each Tx is sent out with a "unique" random 16-bit number (or whatever number of bits) that does not change in transmission, and the manufacturer takes the chance that very close neighbours don't share the same number. For 16 bits that is a 1 in 65,536 chance that any two given transmitters share a number. Reasonable odds in a capitalistic world?
However, if say 4 of those bits are a checksum or CRC, then only 12 bits actually distinguish transmitters: 4,096 combinations, so a 1 in 4,096 chance of two given neighbours sharing a code. There is no free lunch, but still pretty good.
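Those collision odds can be checked with the standard birthday calculation; this snippet is generic probability maths, not anything specific to these remotes:

```python
def collision_prob(n_ids, n_units):
    """Chance that at least two of n_units transmitters, each assigned an
    independent uniform random ID out of n_ids possibilities, share an ID."""
    p_all_distinct = 1.0
    for k in range(n_units):
        p_all_distinct *= (n_ids - k) / n_ids
    return 1.0 - p_all_distinct

# Two neighbours with full 16-bit IDs: 1 in 65,536.
p16 = collision_prob(2**16, 2)
# Two neighbours when 4 of the 16 bits are checksum: 1 in 4,096.
p12 = collision_prob(2**12, 2)
```

The same function also shows how the odds worsen as more units are sold into one neighbourhood, since the risk grows roughly with the square of the number of transmitters.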
I hope this helps and sparks some brilliant inspiration on someone else's part!!
Rob