Custom WS2812 programming

Hey, I'm prototyping with a few WS2812B LEDs. For those unfamiliar, these are SMD RGB LEDs with an integrated WS2812 controller IC; you address them serially and send RGB data to each LED. They are very common in individually addressable LED strips.

Over the years, many libraries have been made to control these, including FastLED and Adafruit's NeoPixel, but I'm trying to make my own code for higher customizability. Transmitting to these devices requires lower-level methods such as direct AVR programming, and though I understand much of this, I'm not very well versed in it.

The WS2812 requires an 800 kHz bit rate on a single data line, with each bit lasting ~1.25 μs.

To send a "0", the data line must be:
0.45μs HIGH
0.8μs LOW

To send a "1", the data line must be:
0.8μs HIGH
0.45μs LOW

Each LED takes 24 bits (8 bits per color; the WS2812 expects them in G, R, B order), and a transmission ends with a minimum 50 μs LOW reset.
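
To make the framing concrete, here is a rough sketch of how one color could be expanded into the 24 bits for a single LED. The function name ws2812_color_to_bits is made up for illustration, and it assumes green-red-blue byte order with the most significant bit first, which is what I understand the WS2812 expects. It only builds the bit sequence; it does not drive a pin.

```c
#include <assert.h>
#include <stdint.h>

/* Expand one color into the 24 bits sent to a single LED.
   Assumption: bytes go out as green, red, blue, each MSB first.
   bits[i] is 1 for a "1" pulse and 0 for a "0" pulse. */
void ws2812_color_to_bits(uint8_t r, uint8_t g, uint8_t b, uint8_t bits[24])
{
    assert(bits != 0);
    const uint8_t bytes[3] = { g, r, b };   /* wire order, not RGB */
    for (int i = 0; i < 3; i++) {
        for (int bit = 0; bit < 8; bit++) {
            /* take the bits from most significant to least significant */
            bits[i * 8 + bit] = (uint8_t)((bytes[i] >> (7 - bit)) & 1u);
        }
    }
}
```

Each entry of the array would then be sent as one of the two pulse shapes listed above, and the 50 μs LOW reset follows the last LED's data.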

My question comes down to how to generate each bit. Since each HIGH and LOW period is less than a microsecond, I assume the best way to wait out each period is with the assembly "nop", which takes one clock cycle (62.5 ns at 16 MHz):

__asm__("nop\n\t");

With this logic I can chain these together to get close to the required time:

13 × 62.5ns is 0.8125μs
7 × 62.5ns is 0.4375μs
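
The same arithmetic as a small host-side sketch, in case it helps to check other clock speeds. The helper names ns_to_cycles and cycles_to_ps are made up for illustration, and the 16 MHz clock is an assumption taken from the 62.5 ns figure.

```c
#include <assert.h>
#include <stdint.h>

/* Round a duration in nanoseconds to the nearest whole number of CPU cycles. */
uint32_t ns_to_cycles(uint32_t ns, uint32_t f_cpu_hz)
{
    assert(f_cpu_hz > 0);
    uint64_t scaled = (uint64_t)ns * f_cpu_hz;           /* ns * cycles per second */
    return (uint32_t)((scaled + 500000000ULL) / 1000000000ULL);
}

/* Duration of a whole number of cycles, in picoseconds so the 0.5 ns is kept. */
uint64_t cycles_to_ps(uint32_t cycles, uint32_t f_cpu_hz)
{
    assert(f_cpu_hz > 0);
    return (uint64_t)cycles * 1000000000000ULL / f_cpu_hz;
}
```

At 16 MHz, ns_to_cycles(800, 16000000) gives 13 and ns_to_cycles(450, 16000000) gives 7, and cycles_to_ps turns those back into 812500 ps and 437500 ps. The two add up to 20 cycles, which is exactly the 1.25 μs bit period.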

So to wait 13 clock cycles I could use the code

__asm__("nop\n\t""nop\n\t""nop\n\t""nop\n\t""nop\n\t""nop\n\t""nop\n\t""nop\n\t""nop\n\t""nop\n\t""nop\n\t""nop\n\t""nop\n\t");

But I also know I have to set the data pin HIGH or LOW between these wait periods, and that may itself take a few cycles. What is the fastest method? I have seen others suggest

sbi(PORTD, 2);

takes 2 cycles, but how does this compare to

PORTD |= B00000100;

I can't seem to find reliable info on how long each port manipulation method takes.

Once I do know, I can account for the time the operation takes and fill in the gap with "nop" (right?).
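
A minimal sketch of that bookkeeping, assuming the 2-cycle figure quoted above for the pin write is right and that nothing else (loop overhead, fetching the next bit) eats into the period. The name nop_padding is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of NOPs to add so that the pin write plus the padding
   spans target_cycles in total.
   overhead_cycles is an assumed cost for the pin write and any
   other instructions that run inside the same period. */
uint32_t nop_padding(uint32_t target_cycles, uint32_t overhead_cycles)
{
    /* the overhead must fit inside the period, otherwise the timing is missed */
    assert(overhead_cycles <= target_cycles);
    return target_cycles - overhead_cycles;
}
```

With a 2-cycle write, a 13-cycle period would need nop_padding(13, 2) = 11 NOPs and a 7-cycle period would need nop_padding(7, 2) = 5. If the write turns out to cost more or fewer cycles, only the second argument changes.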

Thanks in advance for any help.

CantSayIHave:
Over the years, many libraries have been made to control these, including FastLED and Adafruit's NeoPixel, but I'm trying to make my own code for higher customizability.

Since all the source code for both is on GitHub, why not start there?

Or look at Kevin Darrah's video on YouTube: a terrific video showing code that controls LEDs without a library.

But why not just add your 'custom' methods to the existing FastLED library? That library has a lot of activity on its repo and many contributors (including LadyAda herself).