Ah, youthful optimism. Your code is NOT "done the best way possible" if "few people can understand it." Period.
OK, an example:
tmp1 = 0xFF - (1 << led);
tmp2 = 0xFF - (1 << (led + 3 > 8 ? led - 6 : led + 3));
tmp3 = 0xFF - (1 << (led + 6 > 8 ? led - 3 : led + 6));
if (led == 9)
    led = 0;
writeoutput shifts tmp1, tmp2 and tmp3 out, in that order, to three 595s (74HC595 shift registers). Ignore the d parameter.
Guess what it does.
Most people wouldn't have a clue, yet it works, and I'm pretty sure it's the fastest way to do it.
I haven't checked the assembly it generates yet, so I haven't optimized it fully.
I think it's pretty good though.
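For comparison, here is a minimal readable sketch of what the snippet appears to compute, assuming led only holds the values 0 to 8 when the masks are built. Each ternary just wraps an offset of 3 or 6 modulo 9, so register r (0, 1, 2 for tmp1, tmp2, tmp3) clears bit (led + 3r) % 9. The names led_off_mask and register_pattern are hypothetical. Position 8 is kept exactly as the original computes it (0xFF - 256 = -1, which becomes 0xFF once truncated to a byte) rather than guessing what the hardware expects there.

```c
#include <assert.h>

/* For pos 0..7: all eight low bits set except bit pos, as in the post.
   For pos == 8 this is 0xFF - 256 == -1, kept identical to the original. */
static int led_off_mask(int pos)
{
    return 0xFF - (1 << pos);
}

/* Hypothetical readable form: register reg (0, 1, 2 for tmp1, tmp2, tmp3)
   uses position led, led + 3 or led + 6, wrapped modulo 9.
   Assumes led is in 0..8 when the masks are computed. */
static int register_pattern(int led, int reg)
{
    assert(led >= 0 && led <= 8);
    assert(reg >= 0 && reg <= 2);
    return led_off_mask((led + 3 * reg) % 9);
}
```

Usage would be tmp1 = register_pattern(led, 0), tmp2 = register_pattern(led, 1), tmp3 = register_pattern(led, 2). On an 8-bit AVR the % 9 may well compile to a slower division call than the original comparisons, so this trades some speed for readability; only the generated assembly would settle which is faster.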
My pet peeve is millis(), which pulls in many bytes of 32-bit divide routine just so that delay() is closer to real milliseconds than the roughly 2.4% error you'd get from using the native clock tick of 1.024 ms.
IMHO, for things like that there should be two functions: one for imprecise measurements and one for precise measurements, with the imprecise one as the default.
That way the excess code is excluded unless specifically required.
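A minimal host-side sketch of that two-function idea, assuming the usual 16 MHz Arduino Timer0 setup (prescaler 64, 256 counts per overflow, so 64 * 256 / 16 MHz = 1.024 ms per tick). The names on_timer_overflow, ticks_imprecise and millis_precise are hypothetical and are not the Arduino core's API. The cheap function returns raw ticks; only the precise one does a division, so with a linker that discards unreferenced code, the divide routine should only end up in the binary when a program actually calls it.

```c
#include <stdint.h>

/* Raw Timer0 overflow count. On real AVR hardware this would be bumped
   by an interrupt and read with interrupts disabled; this host-side
   simulation ignores atomicity. */
static volatile uint32_t overflow_ticks;

/* Hypothetical stand-in for the overflow interrupt handler. */
static void on_timer_overflow(void)
{
    overflow_ticks++;
}

/* Imprecise default: one unit is 1.024 ms, about 2.4% longer than 1 ms.
   No arithmetic, so no divide routine is needed. */
static uint32_t ticks_imprecise(void)
{
    return overflow_ticks;
}

/* Precise variant: converts ticks to real milliseconds (ticks * 1024 / 1000).
   The 64-bit intermediate avoids overflow in the multiply. If the linker
   drops unused functions, the division is only pulled in when something
   calls this. */
static uint32_t millis_precise(void)
{
    uint32_t ticks = overflow_ticks;
    return (uint32_t)(((uint64_t)ticks * 1024u) / 1000u);
}
```

After 1000 overflows, ticks_imprecise() reports 1000 while millis_precise() reports 1024, which is exactly the 2.4% gap the imprecise default accepts in exchange for smaller code.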