Driving large number of WS2812B LEDs - Optimize for speed by grouping

lobis · September 26, 2023, 10:16pm

I am currently working on a project that involves driving a large amount of WS2812B LEDs.

I am using strips with 60 leds each. My first prototype will use 216 strips (12960 individual pixels). Ideally I would like to drive up to 512 strips this way.

I have written a simple program using the FastLED library. Besides needing a board with enough RAM to run the program, I could make it work straight away (I only tested the 216 strips prototype). The only (and important!) problem I am facing is the delay between each update cycle (FastLED.show()) which is ~400ms. Unfortunately this is unacceptable for my application as it needs to feel responsive (50-100 ms should be enough).
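The ~400 ms figure is roughly what the WS2812B wire protocol predicts for one long chain. A minimal sketch of the arithmetic, assuming the usual datasheet timing of 800 kHz (1.25 µs per bit, 24 bits per pixel) and ignoring the short reset gap at the end of each frame; `frameTimeMs` is a hypothetical helper, not a FastLED function:

```cpp
#include <cassert>
#include <cstdint>

// WS2812B data rate is 800 kHz: 1.25 us per bit, 24 bits (GRB) per pixel.
constexpr double kBitTimeUs = 1.25;
constexpr int kBitsPerPixel = 24;

// Time to clock out `pixels` LEDs on a single data line, in milliseconds.
// The end-of-frame reset gap is ignored; it is tiny next to a long frame.
double frameTimeMs(std::uint32_t pixels) {
    assert(pixels > 0);
    return pixels * kBitsPerPixel * kBitTimeUs / 1000.0;
}
```

With these assumptions `frameTimeMs(12960)` comes out at 388.8 ms, so almost all of the observed delay is time on the wire rather than CPU work. For comparison, 216 pixels on one line would need about 6.5 ms.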

There is an important feature of my design that I think could be exploited to improve the speed, but I don't know how. I don't really need to drive each individual pixel, I only need to control each individual strip independently. If there was a way to group (via hardware or software) the 60 pixels of a strip into a single pixel for control, this would reduce the number of pixels to drive by a factor of 60 and there shouldn't be any issues. (It could also reduce the memory usage of the program, which would be nice).

An alternative solution would be to find a way to improve the driving speed and just drive all pixels individually. I don't really know how to do this or if it is possible. The only thing I could think of is to split the 216 strips into groups (let's say 6 groups) and drive them independently, either from a single microcontroller or maybe using one micro for each group and setting those up as workers for a single master? (No idea how to do this, but should be doable I guess).

I am currently thinking of the following idea: flashing multiple ATtiny chips with a custom firmware that allows each of them to control 60 LEDs. Then, in the main program, instead of using an individually addressable LED library, I would talk to the ATtiny chips via some protocol that can communicate with this many devices fast enough, maybe I2C? Does this make sense?

I would love to hear some new ideas!

Thanks,
Luis

noiasca · September 27, 2023, 5:31am

How far is the distance from the Microcontroller (which one?) to the first strip of 60 LEDs?
How far is the distance between the strips of 60 LEDs?

You could use WS2811 ICs. Amplify the output for 60 LEDs and use common RGB LED strips instead. With this hardware you just have to address 216/512 "pixels".

Deva_Rishi · September 27, 2023, 7:24am

This is a valid solution. It will require you to make PCBs to hold the WS2811s, their circuitry, and a MOSFET for each channel.

Too complicated, I'd say, though not completely impossible.

That is really a lot. If you want to drive them as individual pixels, you'd need almost 40 KB for the buffer alone.

I was going to suggest using an ESP32 with Makuna's NeoPixelBus and its parallel I2S method, but you will not manage with just a single board because that method is so memory-hungry. You could still make it work by turning the ESP32s into ArtNet nodes and using one ESP32 as an ArtNet server. Depending on how many boards you need in the end, you may be able to connect them straight to the server, or go through a router if the number of nodes exceeds the maximum number of sockets, which is 8 for a board.

Or you could use DotStars (APA102 strip), which could all be run from a single ESP32 or Teensy with a refresh rate that will easily be fast enough for you.

The cheapest solution is what noiasca is suggesting: simple RGB strip controlled by WS2811 units, which contain 3 MOSFETs to drive the RGB strip from the WS2811's individual outputs. You can make those yourself or, which may be a lot less complex, use these, which can be driven with any library like FastLED, NeoPixel or NeoPixelBus. They are based on a different chip than the WS2811 (the P9813), so they are driven with a different protocol, but they require no extra design or manufacturing.

I built WS2811 units myself as described because it makes sense not to mix different protocols and we had loads of objects with WS2811 strip in them, but that need not be the same for you.

lobis · September 27, 2023, 7:54am

Thanks a lot @noiasca and @Deva_Rishi for your replies.

I will look into the solution you propose using the WS2811 driver board. However, I already have the LED strips in hand, and they all use WS2812B chips, so I am not sure whether this would be a problem. Would I be able to control my WS2812B strips with this board? (I could not find a similar board with a WS2812B chip).

I think any solution would be affordable as the cost of the remaining components (leds strips, etc.) far exceeds these boards so cost should not be a problem.

In case the WS2811 driver boards cannot be used with my strips, I still like the option where multiple strips are grouped into nodes (9 strips into a single node) and these nodes are connected to a common bus. I would need 24 nodes for my prototype, which seems reasonable. There are some other advantages to designing a PCB for the nodes (I could connect some other stuff more cleanly), but that's another battle.

My main problem with this last approach is that I have never done something like this. I have interfaced with some I2C devices in the past, so that would be my first choice of protocol, but I don't think it's designed for long distances. I would really appreciate any further discussion on this idea.

The physical layout of the devices is a grid where each strip takes around 25x25cm so each of these nodes would be roughly 80x80cm. There would be a ~1m distance between each node and the bus would span a very long total distance but it would be constrained into a grid (~3x5m).

noiasca · September 27, 2023, 8:15am

I2C is designed as a bus within one PCB. It will not work reliably with longer wires.

24 Nodes for WS2812B Neopixel:
You could use a RS485 bus.
As a protocol, the first thing that comes to mind is DMX. If you really need all 16 million colors you might need 3 DMX universes. If you limit the color to 8 bits, one universe is enough for 512 strips.
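The universe count above follows from DMX512 carrying 512 one-byte channels per universe. A rough sketch of that arithmetic, assuming channels are packed contiguously so a strip may straddle a universe boundary, and assuming RGB332 as one possible way to squeeze a color into a single byte (the post does not say which 8-bit encoding is meant); both function names are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

constexpr int kChannelsPerUniverse = 512;  // DMX512: 512 one-byte channels

// Universes needed for `strips` strips at `channelsPerStrip` bytes each
// (ceiling division, channels packed back to back).
int universesNeeded(int strips, int channelsPerStrip) {
    assert(strips > 0 && channelsPerStrip > 0);
    const int channels = strips * channelsPerStrip;
    return (channels + kChannelsPerUniverse - 1) / kChannelsPerUniverse;
}

// Assumed encoding: pack 24-bit RGB into one byte as RGB332
// (top 3 bits of red, top 3 bits of green, top 2 bits of blue).
std::uint8_t packRgb332(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}
```

Under these assumptions, 512 strips at 3 channels each need 3 universes and 512 strips at 1 channel each need 1. The 216-strip prototype at 3 channels per strip needs 648 channels, which is 2 universes.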

Deva_Rishi · September 27, 2023, 8:35am

That is a problem, since the solution proposed was using simple 12v RGB ledstrip, and not addressable ones.

So say the total length would be about 100 m or more.

Yep, actually that is also a simple wired option. You wouldn't really need 3 universes if you use a receiver/transmitter where 1 strip only occupies 3 channels. I would use 5 V Pro Mini Arduinos in that case; they are no more expensive than ATtiny85s. You will need a bunch of MAX485 ICs (1 for each node) and some decent twisted-pair wire like UTP.

I am getting confused as to what layout you actually want to create. How many physical pixels do you want to control from a single point, and how many virtual pixels from a single point? Can you make us a drawing?

I think it's either DMX or ArtNet (which is really a network expansion of DMX), but you will need several MCUs to do this. A universe can contain up to 170 virtual pixels, but once an MCU receives them, these virtual pixels can easily be copied to address many physical pixels. Each MCU will have limitations relating to available memory.
An ArtNet network will have the least limitations, and you should be able to control all physical pixels individually.
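The "copy virtual pixels to physical pixels" step on a receiving node could look roughly like the following. This is a plain C++ sketch under stated assumptions, not code from any of the libraries mentioned: `Rgb` and `expandToPhysical` are hypothetical names, and on a real node the output would be the LED library's own pixel buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

// Copy each virtual pixel (one color per strip) onto `ledsPerStrip`
// consecutive physical pixels, in strip order.
std::vector<Rgb> expandToPhysical(const std::vector<Rgb>& virtualPixels,
                                  std::size_t ledsPerStrip) {
    assert(ledsPerStrip > 0);
    std::vector<Rgb> physical;
    physical.reserve(virtualPixels.size() * ledsPerStrip);
    for (const Rgb& color : virtualPixels) {
        physical.insert(physical.end(), ledsPerStrip, color);
    }
    return physical;
}
```

This only shrinks what travels over the bus to the node. The node still has to clock out every physical WS2812B pixel on its own data line, so the per-node chain length sets the local refresh time.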

We haven't talked about it yet, but how are you planning to power all this?

noiasca · September 27, 2023, 8:49am

I read

512 strips. each 60 LEDs

and the target is to address all 512 strips with one color for all 60 LEDs (on one strip).
The OP hasn't mentioned the required color depth so far.

sterretje · September 27, 2023, 9:02am

It's my understanding that some Teensies have sufficient processing power to handle 8 strips in parallel. In theory that should cut the refresh by a factor of 8.

See
OctoWS2811 LED Library
Teensy 3.2-4.1 OctoWS2811 Adaptor

Note:
No experience with it.

b707 · September 27, 2023, 9:03am

It is quite possible. On some controllers, FastLED supports connecting several strips to different pins of the controller and updating them in parallel (not to be confused with connecting several instances of the FastLED class to different pins; in that case, updating occurs sequentially).
Parallel update is available on the Due, STM32, Raspberry Pi Pico and some others. For example, on the Due you can run up to 11 parallel lines, which will reduce the update time to approximately 30-40 ms per cycle.

Deva_Rishi · September 27, 2023, 9:07am

Ah yes, that is 3 universes. Still, I reckon ArtNet is the way to go.

As mentioned before, Makuna's NeoPixelBus has a parallel I2S method that runs on an ESP32 and can update 8 strips (I have even heard talk of 16 and 24), but its memory requirement is 5 times the pixel buffer. That is going to be almost 200 KB, which is a tight fit for a single board.
Still, I think a few ESP32 boards and ArtNet is the way to go.
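The memory figures quoted in this thread (almost 40 KB for the pixel buffer, almost 200 KB for the parallel I2S method) can be checked with a short calculation. This sketch assumes 3 bytes per pixel and takes the 5x multiplier straight from the post above; it is the poster's figure, not something verified against the library, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBytesPerPixel = 3;  // one byte each for R, G, B

// Size of a plain RGB pixel buffer.
std::size_t pixelBufferBytes(std::size_t pixels) {
    return pixels * kBytesPerPixel;
}

// Working memory if a driver needs `factor` times the pixel buffer.
// The default of 5 is the multiplier quoted in the thread (unverified).
std::size_t driverWorkingBytes(std::size_t pixels, std::size_t factor = 5) {
    assert(factor > 0);
    return pixelBufferBytes(pixels) * factor;
}
```

For 12960 pixels this gives 38,880 bytes for the buffer and 194,400 bytes at the quoted 5x factor, which matches the "almost 40 KB" and "almost 200 KB" estimates.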

lobis · September 27, 2023, 9:24am

Thanks again for the very useful replies.

Unfortunately I am at work right now and I cannot make a drawing but I will add one later once I have a more clear picture.

In summary I am working on making something like this:

This began as a hobby project but it has become a pretty big thing now. It's a grid of LED tiles. Each tile senses a person stepping on it, and this can be used to create interactive experiences.

I have already done the mechanical designs, the sensor for each tile (which was quite challenging but it works now), the software, etc. The current system is based on 3x3 tile modules which can be coupled together in a grid. Each tile has a sensor which is basically a mechanical switch, and the matrix of sensors is read using multiple MCP23017 chips controlled via I2C (~100 ms delay). This is powered by multiple beefy 5 V power supplies from AliExpress that hopefully do not catch fire.

The current system mostly works but needs some polishing. When I was working on smaller prototypes (4 3x3 tiles) it felt responsive but I did not consider how adding more tiles would add latency.

Given my current design of modules of 3x3 tiles, I think it makes sense to design a custom PCB for each module (or node) and connect it to a shared bus. If I understood your replies correctly, I could use an RS485 bus (adding a MAX485 to each module). Each module would have a microcontroller that I would program to understand the DMX protocol, and I would somehow configure each module with an individual address (either hardcoded into the MCU, which would mean programming each with slightly different code, or via hardware pins). Then I would send some message such as "module number 4: turn tile 0 red, ..., tile 8 blue". Am I understanding it correctly?

noiasca · September 27, 2023, 9:34am

basically yes.
I would try to give each "node" the same software/PROGMEM and store the "ID" in the EEPROM. That way you edit the EEPROM value just once and can easily update the software. On startup the software should read the ID from EEPROM.

DMX (one DMX universe) will let you address up to 512 "channels". Each Channel is one byte.
Hence my question about the requested color depth. So that we know if one "tile" will need just one channel (one byte from DMX) or up to 3 (for red green blue).
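With one ID per node, each node can work out which DMX channels belong to its tiles. The following is only an illustration of one possible mapping, assuming 9 tiles per node, 3 channels (R, G, B) per tile, node IDs starting at 0, and channels packed contiguously across universes; `DmxSlot` and `slotFor` are hypothetical names:

```cpp
#include <cassert>

constexpr int kTilesPerNode = 9;           // 3x3 module
constexpr int kChannelsPerTile = 3;        // R, G, B
constexpr int kChannelsPerUniverse = 512;  // DMX512

struct DmxSlot {
    int universe;  // 0-based universe index
    int channel;   // 1-based DMX channel within that universe (1..512)
};

// Map (node ID, tile index 0..8, component 0=R 1=G 2=B) to a DMX slot.
DmxSlot slotFor(int node, int tile, int component) {
    assert(node >= 0);
    assert(tile >= 0 && tile < kTilesPerNode);
    assert(component >= 0 && component < kChannelsPerTile);
    const int linear = (node * kTilesPerNode + tile) * kChannelsPerTile + component;
    return {linear / kChannelsPerUniverse, linear % kChannelsPerUniverse + 1};
}
```

With this packing, node 4 starts at universe 0, channel 109, and the last byte of the 24-node prototype (node 23, tile 8, blue) lands in universe 1. One tile straddles the universe boundary, so a real design might prefer to pad each universe to whole tiles instead.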

lobis · September 27, 2023, 9:36am

Makes sense, thanks a lot!

lobis · September 27, 2023, 9:49am

Interesting. If this was to work it would be by far the easiest solution. I can easily physically connect different strip groups into different pins.

I already tried creating a different instance of the FastLED class for each group but, as you mention, this does not make a difference (unless you do not need to update a group in a given cycle), since the update occurs sequentially and the show method is blocking.

I will do more reading, looks like Parallel Output · FastLED/FastLED Wiki · GitHub is a good starting point.

b707 · September 27, 2023, 10:01am

A few years ago I did a project with someone to control a matrix of about 8K pixels, using FastLED's parallel mode on an Arduino Due.

This is a simple solution for hardware, but not for software. This mode uses fairly complex bit mathematics to convert a serial string of bits into a parallel one. But in your case, if you have the same color for the entire strip, perhaps the code can be simplified.

lobis · September 27, 2023, 10:08am

Yes the entire strip would be one color (60 leds) but I cannot connect each strip to a single pin, probably either 36 or 72 strips per pin.

I'm not sure I understood your comment related to complexity of the software.

So far it seems the code is pretty straightforward (https://github.com/FastLED/FastLED/blob/master/examples/Multiple/ParallelOutputDemo/ParallelOutputDemo.ino).

It is basically the same, but addLeds is called differently, with an argument representing a collection of pins instead of a single one.

Unfortunately I only have Arduinos and ESP32 boards, no Teensy to test this. It looks like this should also work with the ESP32, but it may not be as easy (according to a Reddit thread). It's not clear where the pin mappings are for the ESP32.

At the end of the loop, FastLED.show() is called as if it were a single strip, but under the hood the signals are sent in parallel, meaning much faster performance (if I understood correctly).

Deva_Rishi · September 27, 2023, 11:23am

Have a look at the parallel I2S example from NeoPixelBus. Again, memory is the main limitation. You may need a few units, but the whole thing would be way less modular than doing a DMX thing. I did a test with 680 pixels per channel and that came to about 100 KB, which was way more than I could afford on a unit that had many other things to do. I really only needed 1 or 2 channels, so I went with the single I2S mode in the end. You have twice as many pixels, so I guess 2 x ESP32 would do the trick, and you can let them communicate amongst themselves without issue.
Still, the DMX solution would be easier to implement, I think.

Deva_Rishi · September 27, 2023, 11:57am

Hey, hold on. You are lighting up tiles with WS2812B strip, right, 60 LEDs per tile? There is nothing (or hardly anything) stopping you from cutting those 60 LEDs into 2, 4 or 6 sections (6 would be about the limit, I guess), making those sections share the Din pin, and taking Dout from only one of those sections to pass the signal on to the next tile. That will significantly reduce the number of virtual pixels you have to generate a signal for and therefore increase the refresh rate. As long as you don't start using thick cables for the data line, you can easily connect 4x Din to 1 Dout. If required, you can also add a simple TTL logic gate to split the signal even better. A 74HCT04 can be used to split the signal into 5 identical signals. I think your issue is solved regardless of what MCU and library you decide to use.
I'd use the 8x parallel I2S on the ESP32. With 4 strands of 15 LEDs per tile, that is 12960 / 4 = 3240 virtual pixels, and 3240 / 8 channels = 405 pixels per channel. That would mean a refresh rate of more than 60 Hz, and you can connect 27 tiles per channel (405 / 15).
Wiring is going to be another challenge, the wiring from the MCU to the first tile of each channel that is, but there are ways to solve that.
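The numbers in this suggestion can be reproduced with a small calculation. This is a sketch under assumptions: 30 µs per WS2812B pixel (24 bits at 800 kHz), the reset gap and any driver overhead ignored, and tiles split evenly over the channels; `ChannelPlan` and `planChannels` are names invented here:

```cpp
#include <cassert>

constexpr double kPixelTimeUs = 30.0;  // 24 bits x 1.25 us per WS2812B pixel

struct ChannelPlan {
    int pixelsPerChannel;  // virtual pixels clocked out on each data line
    int tilesPerChannel;   // tiles chained on each data line
    double refreshHz;      // upper bound from wire time alone
};

// Each tile's LEDs are cut into `sectionsPerTile` sections sharing one Din,
// so a tile only costs ledsPerTile / sectionsPerTile pixels on the wire.
ChannelPlan planChannels(int tiles, int ledsPerTile, int sectionsPerTile, int channels) {
    assert(tiles > 0 && ledsPerTile > 0 && sectionsPerTile > 0 && channels > 0);
    assert(ledsPerTile % sectionsPerTile == 0);
    const int virtualPerTile = ledsPerTile / sectionsPerTile;
    const int tilesPerChannel = (tiles + channels - 1) / channels;  // ceiling
    const int pixelsPerChannel = tilesPerChannel * virtualPerTile;
    const double frameUs = pixelsPerChannel * kPixelTimeUs;
    return {pixelsPerChannel, tilesPerChannel, 1e6 / frameUs};
}
```

For 216 tiles of 60 LEDs, 4 sections per tile and 8 channels, this gives 405 pixels and 27 tiles per channel, and a wire-time limit of roughly 82 Hz, consistent with the "more than 60 Hz" estimate.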

camsysca · September 27, 2023, 1:16pm

A 74HCT04 will invert the data stream, confusing the issue.
Try any quad or octal non-inverting line driver, though. I'd suggest some suitable, but the purist peanut gallery here would no doubt mock me for suggesting anything not designed in this decade - regardless of it having appropriate functionality. We'll wait for their contributions.

lobis · September 27, 2023, 6:35pm

Update:

I have updated the code to use the parallel output functionality, which appears to reduce the latency significantly. I am using an ESP32 dev board with 6 pins (for segments of 3x12 cells, each cell 60 pixels) for a total of 18x12x60 = 12960 pixels. I am using 6 pins since it's easier for me to connect each segment to a distinct pin. Using more pins should improve the latency but would be more difficult to connect.

I could not test it with the actual leds but I ran some dry run tests (without anything connected). Here are my results:

With 6 segments as described:

Enabling FASTLED_ESP32_I2S: 100 ms latency.
Disabling FASTLED_ESP32_I2S: 170 ms latency.

Using twice as many cells per segment (3 segments total):

Enabling FASTLED_ESP32_I2S: 165 ms latency.
Disabling FASTLED_ESP32_I2S: 170 ms latency.

Using a single segment for all cells:

Enabling FASTLED_ESP32_I2S: 425 ms latency.
Disabling FASTLED_ESP32_I2S: 425 ms latency.

So far the results look coherent. I don't think I can push this solution further but I think I can live with 100 ms latency (would probably increase a bit when I add more stuff, perhaps using threads I can avoid this, we'll see).
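One way to sanity-check these measurements is to compare them with the wire time alone. The sketch below assumes 30 µs per WS2812B pixel and an even split of the 12960 pixels across the data pins; the function names are hypothetical, and the thread does not say what else the loop does each cycle:

```cpp
#include <cassert>

constexpr double kPixelTimeUs = 30.0;  // 24 bits x 1.25 us per WS2812B pixel

// Ideal time on the wire when `totalPixels` are split evenly over `lanes`
// data pins that are driven in parallel, in milliseconds.
double idealWireTimeMs(int totalPixels, int lanes) {
    assert(totalPixels > 0 && lanes > 0);
    const int pixelsPerLane = (totalPixels + lanes - 1) / lanes;  // ceiling
    return pixelsPerLane * kPixelTimeUs / 1000.0;
}

// Part of a measured cycle time that the wire protocol does not explain.
double overheadMs(double measuredMs, int totalPixels, int lanes) {
    return measuredMs - idealWireTimeMs(totalPixels, lanes);
}
```

The ideal wire times are 64.8 ms (6 pins), 129.6 ms (3 pins) and 388.8 ms (1 pin). Against the I2S measurements of 100, 165 and 425 ms, that leaves roughly 35 to 36 ms unexplained in each case, which would be consistent with a fixed per-cycle cost outside the LED output, though that is only a guess from three data points.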

I will do live testing and if it works with around 100 ms latency I will stick with this solution for the time being.

Thank you to everybody who replied to this post. I learned a lot from all the replies, and they may come in handy at some point. I am particularly glad I was able to avoid falling into the rabbit hole of the solution that involved creating this node network. It would have been fun but also a lot of work. Thanks!

By the way, do you think using a beefier MCU such as the Teensy 4.1 would improve the performance? Since I would only need one or two for the whole setup (hopefully), the cost would not be a problem.
