Well, only because he is running the LEDs at less than half the rated current, so there is little point in having two anyway. The fact remains that putting two LEDs in parallel is just plain stupid design.
Spoken like a true engineer.

It may not be efficient to put two LEDs in parallel attached to a constant current source only capable of sourcing 20mA, but the consideration here is for aesthetics, not efficiency.
And there is some kind of reflection issue in the data lines when I have more than six modules in series, where the LEDs on the last few modules will be messed up.
Yes, signal integrity problems are common with this chip, as people tend to think you can string them together without considering the signals. You might be lucky, but just Rs & Cs do not normally fix things. You need proper layout and proper buffering of the signal lines.
I don't understand. What do you mean by proper buffering and layout? TI's datasheets make no mention of any special layout other than keeping the capacitors near the pins, and properly heat sinking, nor do they mention any need for buffering.
Also, I'm not familiar with buffering, but if by that you mean putting another chip on the line, that seems like it would be way too expensive, and I don't understand why it would be necessary. If the signal successfully reached one of the modules, and the LED driver picked it up and passed it along to the next one, why would I need to worry about signal degradation? Shouldn't each chip emit a new clean signal?
Actually though, now that I think about it, while each chip does output data to the next, the clock signal used for that data isn't passed along in that manner. That's just on its own separate clock line. So maybe the data is reaching the end of my chain just fine, but the clock signal is no longer in sync with it, or reflected signals are messing it up?
Assuming adding buffering is too expensive, what would I do if I just wanted to try to reduce reflected signals in the clock line to try to extend its reach from 6 modules to 10?
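
To get a rough feel for whether reflections on the shared clock line are even plausible at this length, here is a minimal Python sketch. It uses a commonly quoted rule of thumb (a line starts behaving like a transmission line when its one-way delay exceeds roughly 1/6 of the edge rise time) and the usual series-termination estimate (resistor at the driver roughly equal to line impedance minus driver output impedance). Every number in it is an assumption for illustration, not something from TI's datasheet or measured on these modules: the 5 ns/m propagation delay, the 0.1 m module spacing, the 20 ns rise time, and the 100 ohm / 30 ohm impedances are all hypothetical.

```python
# Rough transmission-line sanity check for a shared clock line.
# All default values are illustrative assumptions, not datasheet figures.

def one_way_delay_ns(length_m, delay_ns_per_m=5.0):
    """One-way propagation delay; ~5 ns/m is an assumed typical wire/cable value."""
    return length_m * delay_ns_per_m


def is_electrically_long(length_m, rise_time_ns, delay_ns_per_m=5.0):
    """Rule of thumb: reflections start to matter once the one-way delay
    exceeds about 1/6 of the signal's rise time."""
    return one_way_delay_ns(length_m, delay_ns_per_m) > rise_time_ns / 6.0


def series_termination_ohms(line_impedance_ohms, driver_output_ohms):
    """Estimate for a series resistor at the driver: line impedance minus
    driver output impedance, clamped so it never goes negative."""
    return max(line_impedance_ohms - driver_output_ohms, 0.0)


if __name__ == "__main__":
    spacing_m = 0.1      # hypothetical distance between modules
    rise_time_ns = 20.0  # hypothetical clock edge rise time
    for modules in (6, 10):
        length_m = modules * spacing_m
        print(modules, "modules:",
              "electrically long" if is_electrically_long(length_m, rise_time_ns)
              else "electrically short")
    # Hypothetical 100 ohm line driven from a 30 ohm output
    print("series resistor estimate:", series_termination_ohms(100.0, 30.0), "ohms")
```

With real numbers for the clock rise time and wiring length plugged in, this would at least indicate whether a series resistor at the clock source is worth trying before adding buffer chips.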