Sub-microsecond pin control with inline assembly

I'm working with an Elegoo Arduino Uno R3 and a WS2812B addressable LED strip.

I have been using the FastLED library (GitHub: FastLED/FastLED), but I don't want to keep using it for two reasons: 1) I want to do it myself, and 2) I want to cut out as much memory usage as I can.

To assign a color to the first LED, I need to send 24 bits (one byte per color component). As per the data sheet (WS2812B Datasheet - Parallax Inc. | DigiKey), to send a 1 I first drive the output pin high for 800 nanoseconds, then low for 450 nanoseconds. Sending a 0 is the reverse: 400 nanoseconds high and 850 nanoseconds low. After the last bit, I leave the pin low for at least 50 microseconds to set the internal latch.
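To make the 24-bit frame concrete, here is a minimal host-side C++ sketch that expands one color into the sequence of bits to send. The function name `ws2812b_bits` is made up for illustration, and the green-red-blue, most-significant-bit-first ordering is an assumption based on the usual WS2812B convention rather than on anything stated above, so check it against the data sheet.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of FastLED or any other library.
// Expands one color into the 24 bits that go out on the data pin.
// Assumption: wire order is green, red, blue, each byte MSB first.
std::array<bool, 24> ws2812b_bits(std::uint8_t red, std::uint8_t green, std::uint8_t blue) {
    const std::uint8_t wire_order[3] = {green, red, blue};
    std::array<bool, 24> bits{};
    std::size_t index = 0;
    for (std::uint8_t component : wire_order) {
        // Walk each byte from bit 7 down to bit 0.
        for (int bit = 7; bit >= 0; --bit) {
            bits[index++] = ((component >> bit) & 1) != 0;
        }
    }
    return bits;
}
```

For white (255, 255, 255) every entry is a 1, which matches the idea of sending the "1" waveform 24 times in a row.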

I believe I will need inline assembly to make this work. I'm familiar with IBM/370 assembly and C#, but not as familiar with C++ or AVR assembly, so this is where I need some help.

I am looking at the instruction set manual (http://ww1.microchip.com/downloads/en/DeviceDoc/AVR-Instruction-Set-Manual-DS40002198A.pdf). At 16 megahertz, every clock cycle is 62.5 nanoseconds, so I want to use instructions that take a well-defined number of cycles. SBI and CBI take either 1 or 2 cycles to complete. That's not a big deal: I can pad with another instruction that takes only 1 cycle, which gives me enough precision to compensate for the uncertainty of whether SBI and CBI take 1 or 2 cycles and still fall within the +/- 150 nanosecond tolerance window. I chose MOV since I can move data from R1 to R1 without changing register values, and it takes 1 cycle, whereas I understood NOP to take 2.
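The cycle arithmetic can be sketched on a PC before touching the hardware. This is a rough check only, assuming the 16 MHz clock and the +/- 150 nanosecond window mentioned above; the names `cycles_to_ns` and `within_tolerance` are invented for this example.

```cpp
// Assumptions: 16 MHz CPU clock and a +/- 150 ns tolerance, as described above.
constexpr double kCpuHz = 16000000.0;
constexpr double kNsPerCycle = 1.0e9 / kCpuHz;  // 62.5 ns per cycle
constexpr double kToleranceNs = 150.0;

// Converts a whole number of CPU cycles to nanoseconds.
constexpr double cycles_to_ns(int cycles) {
    return cycles * kNsPerCycle;
}

// True when `cycles` lands inside target_ns +/- kToleranceNs.
constexpr bool within_tolerance(int cycles, double target_ns) {
    return cycles_to_ns(cycles) >= target_ns - kToleranceNs &&
           cycles_to_ns(cycles) <= target_ns + kToleranceNs;
}

// Cycle counts near the four targets (total of 20 cycles per bit).
static_assert(within_tolerance(13, 800.0), "1 bit, high: 812.5 ns");
static_assert(within_tolerance(7, 450.0), "1 bit, low: 437.5 ns");
static_assert(within_tolerance(6, 400.0), "0 bit, high: 375 ns");
static_assert(within_tolerance(14, 850.0), "0 bit, low: 875 ns");
```

Each target tolerates a couple of cycles of error in either direction, so a 1-cycle uncertainty in SBI or CBI fits inside the window.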

So, with all that in mind, I concocted these two lines of inline assembly:

#define SEND_0 __asm__ ( "sbi 6,0;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "cbi 6,0;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" );
#define SEND_1 __asm__ ( "sbi 6,0;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "cbi 6,0;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" "mov r1,r1;" );
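As a sanity check on the padding, the sketch below tallies the cycles in the two macros. It rests on several assumptions: SBI and CBI are taken as 2 cycles each, the pin is taken to change when the instruction completes, the macros are assumed to run back to back, and the MOV counts (4 and 11 for SEND_0, 10 and 5 for SEND_1) are my own count from the macro text. The names are made up for this example.

```cpp
// Number of 1-cycle MOV padding instructions in each phase of a macro.
struct BitMacro {
    int movs_while_high;  // between SBI and CBI
    int movs_while_low;   // between CBI and the next SBI
};

// Counted by hand from the SEND_0 and SEND_1 macros above.
constexpr BitMacro kSend0 = {4, 11};
constexpr BitMacro kSend1 = {10, 5};

constexpr int kSbiCbiCycles = 2;    // assumption: 2 cycles each on this core
constexpr double kCycleNs = 62.5;   // 16 MHz clock

// High phase lasts through the padding plus the CBI that ends it.
constexpr double high_ns(BitMacro m) {
    return (m.movs_while_high + kSbiCbiCycles) * kCycleNs;
}

// Low phase lasts through the padding plus the SBI of the next bit.
constexpr double low_ns(BitMacro m) {
    return (m.movs_while_low + kSbiCbiCycles) * kCycleNs;
}
```

Under those assumptions SEND_0 comes out at 375 ns high and 812.5 ns low, and SEND_1 at 750 ns high and 437.5 ns low, all inside the +/- 150 ns window. That suggests the padding counts may not be the main problem. The tally says nothing about whether I/O address 6 is the port register for the data pin, or whether that pin is configured as an output, so both are worth checking against the ATmega328P register summary.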

But when I write "SEND_1" 24 times in a row in setup(), the first LED does not turn white. What am I doing wrong with my timing? Is there an issue with my code?

Thanks for your time.

I know it's a difficult fact to swallow, but a modern C++ compiler can generate code that is more, or at least as, efficient as a very good assembly language programmer, 99.9% of the time.

BTW I've played with custom NeoPixel code, on several processors now, including PIC. It's extremely difficult to achieve the tight timing margins. I don't wish to discourage your efforts, but really you should consider the gain vs. the huge effort you will have to put in.

The "fun" really begins when you try to drive more than one pixel. Trust me. :) That is because most NeoPixels won't tolerate even the smallest delay between control frames.

Also, inline assembly sucks. Consult the AVR GCC documentation to learn how to link assembly language files.

Don't forget that interrupts can screw up your timing.

aarg:
I know it's a difficult fact to swallow, but a modern C++ compiler can generate code that is more, or at least as, efficient as a very good assembly language programmer, 99.9% of the time.

Please don't misunderstand. There is no pride on the line here. I believe assembly is only required because the native calls provided by Arduino are listed as taking 50 clock cycles, which is far too slow.

I have implemented a menu system presented through an I2C-driven OLED. This menu system has eaten up all available memory, so I am in the process of rewriting it and cutting the fat wherever I can. Then I will assess whether I need more memory or need to cut back my expectations for my current project.

Yes, you are right, I overlooked that fact. There is no way it could be written only in C. The Adafruit bit-banging code, for example, is written in assembly. I haven't seen the FastLED code, but I imagine it must be as well. My question would be, what is wrong with just using the Adafruit or FastLED libraries? You know, the compiler does not emit code for any source that is not actually used? So if you limit your library usage to primitive functions, you shouldn't have a problem. To save memory, you won't have any luck rewriting the bit-banging part. Many smart people have been hitting their heads against that one for several years now. You would have to code the high-level stuff.

If you genuinely only have a memory issue, please post your code here and I'm sure the good folks will find dozens of ways you could free some up.

aarg:
My question would be, what is wrong with just using the Adafruit or FastLED libraries? You know, the compiler does not emit code for any source that is not actually used?

Well, do I feel like a right idiot. I took another look at the FastLED library and indeed, everything is defined using "#define" instead of using constants. And the only reason objects are used is to implement a form of the Strategy Design Pattern. So if the constants are pre-processor and if only one object is used, then it avoids extra data. In C#, what I'm used to, object definitions that are not necessarily used are still loaded into the runtime environment and compiled on demand which still uses the memory.

I guess there's nothing wrong with using it. I'm just stupid. Thanks for your help!!!!

Why not just use the Adafruit NeoPixel library? Then you can do whatever you want, and they do the heavy lifting of writing your color desires out the wire. As it were.

-jim lee

You could also consider switching to APA102-type LED strips. They don't have the timing requirements of the WS2812B (which tend to make everything else you want to do more complex because of the pressure they put on your Arduino), but they are more expensive.

Or you can use PWM to drive the NeoPixel.

The timing required is basically an 800 kHz PWM with duty cycles of 33% (for a 0 bit) and 66% (for a 1 bit). You can use the timer overflow to update the duty cycle for the next bit. I've had success with an XMEGA MCU (the ATmega's big brother, I suppose) running at 32 MHz.
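To put numbers on the PWM approach, here is a small host-side sketch that turns the 800 kHz rate and the 33% / 66% duty cycles into timer tick counts. It assumes the timer runs directly off the CPU clock with no prescaler, and the function names are invented for illustration; it does not touch any real timer registers.

```cpp
#include <cstdint>

// Timer ticks in one PWM period (assumes the timer runs at timer_hz, unprescaled).
constexpr std::uint32_t period_ticks(std::uint32_t timer_hz, std::uint32_t pwm_hz) {
    return timer_hz / pwm_hz;
}

// Compare value for a given duty cycle, rounded to the nearest tick.
constexpr std::uint32_t compare_ticks(std::uint32_t period, std::uint32_t duty_percent) {
    return (period * duty_percent + 50) / 100;
}

// Duration of a number of timer ticks in nanoseconds.
constexpr double ticks_to_ns(std::uint32_t ticks, std::uint32_t timer_hz) {
    return ticks * 1.0e9 / timer_hz;
}

// At 32 MHz: 40 ticks per bit, 13 ticks high for a 0 and 26 ticks high for a 1.
static_assert(period_ticks(32000000u, 800000u) == 40, "40 ticks per bit");
static_assert(compare_ticks(40, 33) == 13, "0 bit: 13 ticks high");
static_assert(compare_ticks(40, 66) == 26, "1 bit: 26 ticks high");
```

At 32 MHz those compare values give high times of 406.25 ns and 812.5 ns. On a 16 MHz part the same arithmetic gives only 20 ticks per bit, which leaves far fewer cycles for the overflow handler to load the next duty cycle.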