I did something similar to add a DMA-based SPI handler for WS2812 LEDs to FastLED on the ESP32 platform. It's a slog. I started by studying how Paul Stoffregen hooked his DMA-based WS2812 Library into FastLED. Then I stared at the FastLED source code for a long time.
Basically, you have to create a class that inherits from 'CPixelLEDController'. These would be the steps to add my ESP32 DMA SPI WS2812 Controller to FastLED:
1. Modify FastLED.h. See the two blocks I added. They're sandwiched between comments that look like this:
// ------------ GFV ------------
NEW CODE SECTION IN HERE
// ------------ GFV ------------
Here's the standard FastLED Cylon example using the DMA SPI Controller:
#include "Arduino.h"
#define USE_ESP32DMASPI
#include <FastLED.h>

const size_t numLeds = 24;
const uint8_t dataPin = 18;
CRGB leds[numLeds];

void setup() {
  Serial.begin(115200);
  delay(3000);
  Serial.println("Starting");
  FastLED.addLeds<DMASPI, dataPin, GRB>(leds, numLeds);
  FastLED.setBrightness(84);
}

// Dim every LED slightly so the moving dot leaves a fading trail
void fadeall() {
  for (int i = 0; i < numLeds; i++) {
    leds[i].nscale8(250);
  }
}

void loop() {
  static uint8_t hue = 0;
  // First slide the LED in one direction
  for (int i = 0; i < numLeds; i++) {
    // Set the i'th LED to the current hue, then advance the hue
    leds[i] = CHSV(hue++, 255, 255);
    // Show the LEDs
    FastLED.show();
    // Now that we've shown the LEDs, fade them all
    // (instead of resetting the i'th LED to black)
    // leds[i] = CRGB::Black;
    fadeall();
    // Wait a little bit before we loop around and do it again
    delay(10);
  }
  // Now go in the other direction
  for (int i = (numLeds) - 1; i >= 0; i--) {
    // Set the i'th LED to the current hue, then advance the hue
    leds[i] = CHSV(hue++, 255, 255);
    // Show the LEDs
    FastLED.show();
    // Now that we've shown the LEDs, fade them all
    // (instead of resetting the i'th LED to black)
    // leds[i] = CRGB::Black;
    fadeall();
    // Wait a little bit before we loop around and do it again
    delay(10);
  }
}
I did this some time ago to add FastLED support for W80x controllers, based on DMA-driven I2S transfers. It is a doable task if you have a basic understanding of how the library works.
Just start from the existing FastLED Renesas folder, /src/platforms/arm/renesas/. There is already a ClocklessController class there. You have to add SPI and DMA initialization to the init() method, populate the pixel data buffer in writeBits(), and start a DMA transfer in showRGBInternal(). It is also quite useful to look at the code for other controllers in the /src/platforms/ folders.
Unfortunately, I can’t help in more detail, since I haven’t held the Uno R4 board in my hands yet.
Because there are many types of addressable LEDs, and different LED types have different timings. Some LEDs even need extra data bits for a white channel, so the protocol there is completely different.
For most microcontrollers, the library supports output not only using SPI, but also by direct writing to the GPIO. In such cases, timings have to be set almost manually and this code is not portable between different types of leds.
What somebody wrote is what it is. Microcontrollers are different: some support output via SPI, some do not. In addition, the code for different MCU families was written by different people at different times, so there is no uniformity here.
Some platforms also have a clockless_block controller, which I think is the most interesting one. The block controller allows you to output data to several pins in parallel.
To be honest, I don't remember all the details...
WS2812, APA102, WS2801, etc. all have different protocols.
The driver I wrote uses 4 bits to implement the required timing for each bit of the WS2812 RGB data. The binary pattern '1000' provides the timing for a '0' bit while '1100' has the data line high long enough to be recognized as a '1' bit. So, because each data bit to the WS2812 requires 4 SPI bits to be sent, it takes 4 bytes of SPI data for each color, thus 12 bytes of SPI data for each RGB pixel. Besides the WS2812 datasheet, this is also a good reference: https://wp.josh.com/2014/05/13/ws2812-neopixels-are-not-so-finicky-once-you-get-to-know-them/
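To make the 4-bits-per-bit encoding above concrete, here is a minimal standalone sketch of the expansion step. It is not the code from my driver; the function names expandByte and expandPixel are made up for illustration, and it assumes the SPI peripheral shifts each byte out MSB first and that the pixel bytes are passed in WS2812 wire order (G, R, B).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: expand one color byte into 4 SPI bytes, MSB first.
// Each WS2812 data bit becomes one nibble: 0 -> 1000 (0x8), 1 -> 1100 (0xC).
std::array<uint8_t, 4> expandByte(uint8_t value) {
    std::array<uint8_t, 4> out{};
    for (int i = 0; i < 4; ++i) {
        // Take two data bits at a time, highest pair first.
        const uint8_t hi = (value >> (7 - 2 * i)) & 1u;
        const uint8_t lo = (value >> (6 - 2 * i)) & 1u;
        out[i] = static_cast<uint8_t>(((hi ? 0xC : 0x8) << 4) | (lo ? 0xC : 0x8));
    }
    return out;
}

// Hypothetical helper: expand one pixel into the 12 SPI bytes it occupies.
// Arguments are in wire order (assumed G, R, B for WS2812).
std::array<uint8_t, 12> expandPixel(uint8_t g, uint8_t r, uint8_t b) {
    std::array<uint8_t, 12> out{};
    const uint8_t channels[3] = {g, r, b};
    for (std::size_t c = 0; c < 3; ++c) {
        const std::array<uint8_t, 4> bytes = expandByte(channels[c]);
        for (std::size_t i = 0; i < 4; ++i) {
            out[c * 4 + i] = bytes[i];
        }
    }
    return out;
}
```

A real driver would write straight into the DMA buffer instead of returning arrays, and would likely use a lookup table rather than shifting bit by bit.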
Something like that. But probably the most important is the first template argument, LED_TYPE. For example, in the Cylon example code I posted, this:
AFAIK, the SPI/DMA implementations for WS28xx work by expanding the bits 3x or 4x into a temporary array, and then doing a pretty standard DMA-based SPI transfer of the resulting data. The expansion should be chip-independent, and the SPI transfer chip-specific but "obvious."
This is all computationally and RAM-size "expensive", and you still have to busy-wait for the DMA to complete unless you want to open a whole set of concurrency and atomicity issues (perhaps FastLED already provides those worms? I'm not familiar with the specific implementation.)
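To put a rough number on the RAM cost, here is a small sketch of the buffer-size arithmetic. The function name spiBufferBytes is made up for illustration, it assumes plain 3-byte RGB pixels, and it ignores any extra padding a particular driver adds for the reset/latch gap.

```cpp
#include <cstddef>

// Bytes of pixel data per LED (assumes 3-channel RGB, no white channel).
constexpr std::size_t kBytesPerPixel = 3;

// Hypothetical helper: size of the expanded SPI buffer in bytes.
// Each data bit becomes spiBitsPerDataBit SPI bits, so each data byte
// becomes spiBitsPerDataBit SPI bytes.
constexpr std::size_t spiBufferBytes(std::size_t numLeds,
                                     std::size_t spiBitsPerDataBit) {
    return numLeds * kBytesPerPixel * spiBitsPerDataBit;
}

// 24 LEDs use 72 bytes as a plain pixel array...
static_assert(24 * kBytesPerPixel == 72, "plain pixel array size");
// ...but 288 bytes with 4x expansion and 216 bytes with 3x expansion.
static_assert(spiBufferBytes(24, 4) == 288, "4x expansion");
static_assert(spiBufferBytes(24, 3) == 216, "3x expansion");
```

So the expanded buffer costs 3 to 4 times the pixel array, on top of the pixel array itself.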
It seems to me that the only advantage is that interrupts can remain active throughout...
If your DMA outputs a separate temporary array via SPI, what "concurrency and atomicity issues" could there be?
You don't need to wait for the DMA to complete; just start it and move on to another task.
The only reason to know whether the DMA has finished is to avoid starting a new transfer while the previous one is still in progress. But since the SPI frequency is known, the transfer time can be easily predicted, and the issue can be resolved with a simple timeout.
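The timeout idea can be sketched roughly like this. The function names are made up for illustration, and the 3.2 MHz SPI clock is an assumption (4 SPI bits per 1.25 µs WS2812 bit), not a figure from any particular driver. On Arduino the timestamps would come from micros().

```cpp
#include <cstdint>

// Hypothetical helper: predicted duration of an SPI transfer in
// microseconds, rounded up. Does not include the reset/latch gap.
constexpr uint32_t transferMicros(uint32_t spiBytes, uint32_t spiHz) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(spiBytes) * 8u * 1000000u + spiHz - 1u) / spiHz);
}

// Hypothetical helper: true once at least durationMicros have elapsed
// since startMicros. Unsigned subtraction keeps this correct when the
// microsecond counter wraps around.
constexpr bool transferLikelyDone(uint32_t startMicros, uint32_t nowMicros,
                                  uint32_t durationMicros) {
    return static_cast<uint32_t>(nowMicros - startMicros) >= durationMicros;
}

// 24 LEDs * 12 SPI bytes = 288 bytes; at an assumed 3.2 MHz that is 720 us.
static_assert(transferMicros(288, 3200000) == 720, "24-LED transfer time");
```

In practice you would add the LED's reset/latch time to the predicted duration before allowing the next transfer to start.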
Well, to tell the truth, I was thinking of the RPi Pico implementation using the PIO, which doesn't make a copy of the data (AFAIK), and therefore you couldn't start changing the pattern while the previous pattern was being output. For SPI/DMA, you'd just have to block a new translation to the pattern to be DMAed, which isn't quite as bad.
I don't know DMA on Renesas well, but on STM32 and W80x, handling the "transfer complete" interrupt is not mandatory. And I think that adding a need to handle interrupts to FastLED is not good.
But it is your project....
The FastLED developers felt that the processor-specific differences were great enough to warrant breaking out the implementations into platform-specific directories. For ESP32, the '../FastLED\3.6.0\src\platforms\esp\32' directory contains multiple implementations using SPI, I2S, and the RMT. I simply followed that pattern with the Esp32DmaSpi option that I added.
Note that the existing SPI (fastspi_esp32) implementation is for clocked LED types (e.g. WS2801), not WS2812. The former do not require any preprocessing of the pixels to implement the required timing, as WS2812 does. So, simply modifying that implementation to use DMA is a non-starter for WS2812.
Nope. All that my implementation of 'showPixels()' does is preprocess the CRGB array into the expanded buffer and kick off the DMA process. So, after showPixels() returns, the user is free to modify the CRGB array to get it ready for the next update while the DMA process is taking place autonomously.
So, the only "concurrency and atomicity issues" you have are the normal ones you already have with FastLED not being thread-safe. If two different FreeRTOS tasks need to access the CRGB array, then the FastLED user is responsible for setting up the appropriate interlocking mechanism (e.g. mutex, etc). That extends to the user applying the interlock to the code that calls showPixels().
The showPixels() function in the code I posted includes a semaphore lock so it doesn't try to run multiple DMA operations on the same LED string at the same time (if the user calls showPixels() again too quickly). The interrupt that's called on completion of the DMA releases the semaphore.
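The interlock idea can be illustrated with a portable sketch. This is not the code from my driver: it uses a std::atomic flag as a stand-in for the FreeRTOS semaphore, it refuses a second transfer instead of blocking until the first finishes, and the class name DmaBusyGuard is made up for illustration.

```cpp
#include <atomic>

// Hypothetical guard that allows only one DMA transfer in flight per strip.
class DmaBusyGuard {
public:
    // Call before preprocessing pixels and starting the DMA.
    // Returns true if the caller now owns the transfer slot,
    // false if a previous transfer is still in progress.
    bool tryAcquire() {
        return !busy_.exchange(true, std::memory_order_acquire);
    }

    // Call from the DMA "transfer complete" callback or interrupt.
    void release() {
        busy_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> busy_{false};
};
```

A showPixels()-style function would call tryAcquire() first, then expand the pixels and kick off the DMA only on success, while the completion handler calls release(). A blocking semaphore gives the same protection but makes a too-early caller wait instead of skipping the update.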
Another advantage of a DMA implementation is that it makes using WS2812 LEDs compatible with operations that are implemented using interrupts such as Servos and IRRemote. Also, the PJRC Audio Library requires timer interrupts to be running (not applicable to ESP32, but Paul already created a Teensy DMA UART library for blasting out WS2812 data without disabling interrupts).
Ah. I was going to point out that FastLED supports a bunch of LEDs other than the WS2812-style, with different comm protocols. That makes a task like "add R4 support to FastLED" fraught, and a testing nightmare.