Massive Parallel SPI

Hello,

I've got a bit less than a month left to finish a project I'm working on: an array of 24 five-meter addressable LED strips (LPD8806-based, like the Adafruit model). We've finally gotten to the part where we need to build the controller, but we're a bit stuck on how to implement it. We need 24 parallel SPI ports, each pumping out 24 bits * 32 LEDs/meter * 5 meters * 30 FPS = 115,200 bits per second. We're having a hard time finding a microcontroller that can store even one frame for one strip in RAM, let alone pull it in from USB fast enough.

So far we've looked at:
Arduino (what we're most familiar with)
Teensy
Maple (might work for 8 parallel SPIs?)

And going in the realm of "oh god, where to even begin":
ARM
FPGA
FTDI USB-SPI chips

Is there anything I might be missing? Has anyone worked on a similar problem and could offer some guidance? Thanks!

Why in parallel?

Do you really need 24 bit color depth?

Do you need the LEDs to be individually colored or can you use a color palette?

Thanks for the reply!

Sorry, I wasn't clear. The strips are arranged in a radial pattern, and it would be immensely less work on the setup front to have them all fed from the center.

I don't believe there is any other way. Unless I've missed something, the LPD8806 takes 8 bits per channel per LED.

I'm not sure what you mean by a color palette. I believe the LPD8806 takes 24 bits per LED regardless?

Do you really need 24 bit color depth? ... I don't believe there is any other way. Unless I've missed something, the LPD8806 takes 8 bits per channel per LED.

Do you need the LEDs to be individually colored or can you use a color palette? ... I'm not sure what you mean by a color palette. I believe the LPD8806 takes 24 bits per LED regardless?

Reduce the storage requirements. At the point the data is sent it would have to be "expanded" to 24 bits.
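
A minimal sketch of the palette idea, in plain C++ rather than Arduino code: each LED is stored as a one-byte palette index and expanded to three bytes only when it is sent. The `Rgb` struct, the `kPalette` table, and `expand()` are hypothetical names, the palette contents are made up, and the byte order and bit format the LPD8806 expects on the wire are not handled here.

```cpp
#include <cstdint>

// Hypothetical 3-byte color; channel order on the wire is chip-specific.
struct Rgb {
    std::uint8_t r, g, b;
};

// Hypothetical 4-entry palette; a one-byte index allows up to 256 entries.
constexpr Rgb kPalette[] = {
    {0, 0, 0},      // 0: off
    {255, 0, 0},    // 1: red
    {0, 255, 0},    // 2: green
    {0, 0, 255},    // 3: blue
};

// Expand one stored palette index into the 24-bit color to send.
// The caller must pass an index that exists in kPalette.
constexpr Rgb expand(std::uint8_t index) {
    return kPalette[index];
}

// Storage per strip: one byte per LED instead of three.
constexpr unsigned kLedsPerStrip = 32 * 5;
static_assert(kLedsPerStrip * 1 == 160, "indexed frame is 160 bytes");
static_assert(kLedsPerStrip * 3 == 480, "full 24-bit frame is 480 bytes");
```

With these numbers a stored frame for one strip shrinks from 480 bytes to 160 bytes, at the cost of one table lookup per LED at send time.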

(24 bits/(RGB LED) * 32 RGB LEDs/meter * 5 meters)/frame * 30 FPS = 115,200 bps.

You have 32 RGB LEDs per meter? ~1/inch, okay, guess that makes sense.

Gonna need one big power supply too.

Can you increase the interface speed? The IDE will do 250,000. Not sure how efficient your code is to parse the data across the streams.
Sounds like a 16 MHz 8-bit processor might be a tad underpowered.

Sorry, I wasn't clear. The strips are arranged in a radial pattern, and it would be immensely less work on the setup front to have them all fed from the center.

Isn't it just a question of running one line from one digital pin to the select pin on each LPD8806? Or are you trying to position one microcontroller close to each strip?

CrossRoads:
(24 bits/(RGB LED) * 32 RGB LEDs/meter * 5 meters)/frame * 30 FPS = 115,200 bps.

You have 32 RGB LEDs per meter? ~1/inch, okay, guess that makes sense.

Gonna need one big power supply too.

Can you increase the interface speed? The IDE will do 250,000. Not sure how efficient your code is to parse the data across the streams.
Sounds like a 16 MHz 8-bit processor might be a tad underpowered.

Right, 115,200 bps per strip * 24. I'm not sure I could get an ATmega to work, even if I had one per strip.

Trying to position the controller(s) close to the strips. I suppose it would be feasible to use one SPI port and 24 select lines, but the SPI port would need to run at nearly 3 megabits.
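
For reference, the "nearly 3 megabits" figure can be checked with a few lines of compile-time arithmetic. This is only a worked calculation in plain C++; the constant and function names are made up for illustration, and the numbers are the ones given in this thread.

```cpp
#include <cstdint>

constexpr std::uint32_t kBitsPerLed   = 24;
constexpr std::uint32_t kLedsPerMeter = 32;
constexpr std::uint32_t kMeters       = 5;
constexpr std::uint32_t kFps          = 30;
constexpr std::uint32_t kStrips       = 24;

// Data rate one strip needs to refresh at kFps.
constexpr std::uint32_t bitsPerSecondPerStrip() {
    return kBitsPerLed * kLedsPerMeter * kMeters * kFps;
}

// Data rate a single shared SPI port would need for all strips.
constexpr std::uint32_t bitsPerSecondTotal() {
    return bitsPerSecondPerStrip() * kStrips;
}

static_assert(bitsPerSecondPerStrip() == 115200, "per strip");
static_assert(bitsPerSecondTotal() == 2764800, "all 24 strips");
```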

I don't think it really matters how it is arranged, he needs 115,200 BPS to 24 strings, so 2,764,800 BPS total.
Using a bigger processor, could have the 24 slave select lines generated internally.
Need a message to start up & get in sync, then a tight loop going:

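#include <SPI.h>

const byte CHUNK = 32;  // bytes buffered per strip before sending

void setup() {
  Serial.begin(250000);  // assumed baud rate, see the 250,000 mentioned earlier
  DDRC  |= B00000011;    // select lines for the first two strips as outputs
  PORTC |= B00000011;    // both select lines idle high
  SPI.begin();
  // a start-up message to get in sync with the sender would go here
}

// Send one chunk to the strip whose select line is the given PORTC bit.
void sendChunk(byte mask) {
  while (Serial.available() < CHUNK) {}  // do some serial data buffering
  PORTC = PORTC & ~mask;                 // select line low
  for (byte i = 0; i < CHUNK; i++) {     // then let the data start ripping out
    SPI.transfer(Serial.read());
  }
  PORTC = PORTC | mask;                  // select line high again
}

void loop() {
  sendChunk(B00000001);  // first port
  sendChunk(B00000010);  // next port
  // continue with the 3rd port and so on, until all 24 strings are hit
}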

mhenstell:
Trying to position the controller(s) close to the strips. I suppose it would be feasible to use one SPI port and 24 select lines, but the SPI port would need to run at nearly 3 megabits.

That should be theoretically possible, right? The SPI runs at a maximum frequency of system clock/2, so that's 8 MHz, and you only need 2.7 MHz. For select lines... I think you can build a 5-to-32 decoder out of 4028s.

What content will be displayed? If you go with a multi-microcontroller solution will they need to exchange data (or have a master)?
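
A rough check of the SPI clock argument, as a sketch in plain C++. It assumes the AVR hardware SPI dividers (2, 4, 8, 16, 32, 64, 128) and looks only at the raw clock rate, not at the software overhead between bytes. The function names are hypothetical.

```cpp
#include <cstdint>

// Slowest SPI clock divider (largest power of two from 128 down to 2) that
// still meets the required bit rate. Returns 0 if even clock/2 is too slow.
constexpr std::uint32_t slowestDivider(std::uint32_t cpuHz,
                                       std::uint32_t neededBps,
                                       std::uint32_t divider = 128) {
    return divider < 2 ? 0
         : (cpuHz / divider >= neededBps
                ? divider
                : slowestDivider(cpuHz, neededBps, divider / 2));
}

// Number of address bits a decoder needs to pick one of `lines` select lines.
constexpr unsigned addressBits(unsigned lines, unsigned bits = 0) {
    return (1u << bits) >= lines ? bits : addressBits(lines, bits + 1);
}

static_assert(slowestDivider(16000000, 2764800) == 4, "4 MHz SPI clock");
static_assert(addressBits(24) == 5, "24 select lines need 5 address bits");
```

With a 16 MHz system clock, a divider of 4 (4 MHz) covers the 2,764,800 bps total, while a divider of 8 (2 MHz) falls short.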

I think the issue is more that I need to receive, store, and then send out over SPI 480 bytes, times 24 strips, 30 times a second. I don't think I can even get that amount of data through the FTDI.

Content is coming from Processing. I'll probably have to go with at least three large micros, but they will not need to exchange data (each is a "simple" USB to SPI converter with a buffer). I've looked at USB to SPI converters but can't find anything that looks even remotely easy to get started with.
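
To put numbers on the receive side, here is a small worked calculation in plain C++. It assumes ordinary 8N1 serial framing (10 bits on the wire per data byte), which is an assumption about the link and not something stated in this thread; the names are made up for illustration.

```cpp
#include <cstdint>

constexpr std::uint32_t kBytesPerStripFrame = 480;  // 160 LEDs * 3 bytes
constexpr std::uint32_t kStrips             = 24;
constexpr std::uint32_t kFps                = 30;
constexpr std::uint32_t kWireBitsPerByte    = 10;   // assumed 8N1 framing

// Bytes in one full frame covering every strip.
constexpr std::uint32_t bytesPerFrame() {
    return kBytesPerStripFrame * kStrips;
}

// Baud rate a serial link would need to sustain kFps.
constexpr std::uint32_t baudNeeded() {
    return bytesPerFrame() * kFps * kWireBitsPerByte;
}

// Whole frames per second a serial link of the given baud rate can carry.
constexpr std::uint32_t framesPerSecondAt(std::uint32_t baud) {
    return baud / kWireBitsPerByte / bytesPerFrame();
}

static_assert(bytesPerFrame() == 11520, "bytes per full frame");
static_assert(baudNeeded() == 3456000, "baud needed for 30 FPS");
static_assert(framesPerSecondAt(250000) == 2, "frames per second at 250,000 baud");
```

Under these assumptions a 250,000 baud link carries about two full 24-strip frames per second, far short of the 30 needed.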

I think you will need, at the very least, a microcontroller with native USB, like the Teensy (and ATmega32U4 variants) or the Maple.

This way you can cut out the FTDI latency/bottleneck. I believe from there it should be easy to write a sketch on your microcontroller that reads in a bunch of data from USB and writes it out to the LEDs and buffers the next frame with any spare time it has.
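
One way to read "buffers the next frame with any spare time" is a double buffer: incoming bytes fill one buffer while the other holds the last complete frame for the LEDs. The class below is a hypothetical sketch in plain C++ for a single strip, not code from any of the boards mentioned; reading from USB and clocking data out over SPI are left out.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kFrameBytes = 480;  // one strip: 160 LEDs * 3 bytes

// Hypothetical double buffer: received bytes fill the back buffer while the
// front buffer holds the last complete frame for the sender to clock out.
class FrameBuffers {
public:
    // Store one received byte. Returns true when this byte completed a
    // frame and the two buffers were swapped.
    bool push(std::uint8_t byte) {
        buffers_[back_][fill_++] = byte;
        if (fill_ < kFrameBytes) return false;
        back_ ^= 1;  // the buffer just filled becomes the front buffer
        fill_ = 0;
        return true;
    }

    // Last complete frame (all zeros until the first frame has arrived).
    const std::uint8_t* front() const { return buffers_[back_ ^ 1]; }

private:
    std::uint8_t buffers_[2][kFrameBytes] = {};
    int back_ = 0;
    std::size_t fill_ = 0;
};
```

Two 480-byte buffers take 960 bytes per strip, so how many strips one micro can double-buffer depends directly on its RAM.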

I don't think I can even get that amount of data through the FTDI.

While the serial connection itself is a bottleneck I suspect the one-byte-at-a-time interrupt in the Arduino core will be the biggest problem.

I agree with @orangeLearner. A few Teensy boards should work.

In any case, you will want to size the outbound data to be evenly divisible into full USB frames. I believe not-full-frames are held for a few milliseconds (Nagle's algorithm meets USB).

Excellent point. I believe I remember reading somewhere that the Teensy gets full USB packets of 64 bytes. I'll do some more reading on that tonight.
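
A small sketch of the sizing advice, assuming 64-byte packets (the maximum bulk packet size on full-speed USB). The helper name `paddedSize` is made up for illustration.

```cpp
#include <cstddef>

constexpr std::size_t kPacketBytes = 64;  // assumed USB packet size

// Round a transfer size up to a whole number of packets.
constexpr std::size_t paddedSize(std::size_t bytes,
                                 std::size_t packet = kPacketBytes) {
    return (bytes + packet - 1) / packet * packet;
}

static_assert(paddedSize(480) == 512, "one strip needs 32 bytes of padding");
static_assert(paddedSize(11520) == 11520, "24 strips fill 180 packets exactly");
```

Under that assumption, a single 480-byte strip frame leaves a half-full packet, while a full 24-strip frame of 11,520 bytes divides evenly.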