Controlling 32 MAX7219s in an 8x4 matrix

For a clock I'm building (slowly), I finally got to do a full-scale test of the 32 MAX7219s.

Each MAX7219 module is square and modular, has an 8x8 LED matrix, and costs about A$2.50 fully assembled from eBay via China.

http://www.ebay.com.au/itm/5x-MAX7219-Serial-Dot-Matrix-8x8-Led-Display-Scheda-Matrice-Module-for-Arduino-/400926733329?hash=item5d59187411

I built a MAX7219 library from scratch ages ago and have been refining it so it works not only with small cascades but also with large ones.

I learnt a few things that other people may benefit from, so here goes.

  1. Keep your graphics libraries and hardware-specific code separate. I have a 1-bit graphics library that works on virtual bitmaps, and those virtual bitmaps can be transferred to just about any physical device very easily. I use exactly the same code for MAX7219 LEDs, a Nokia 5100 and a 128x64 OLED display.

  2. Because of the way MAX7219s cascade, each module added to the chain adds overhead, since writing to one module means sending NOP instructions to all the others. Using the Mega 2560, I found the slowdown going from 8 modules to 32 totally unacceptable for updating a screen. The MAX7219 has a maximum serial clock of 10 MHz, and in practice a bit less with long wires and long cascades. With 32 in one chain I was getting the odd bit of garbage even after lowering the clock to 8 MHz and even 4 MHz.
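To see where the NOP overhead comes from, here is a host-side C++ sketch (no Arduino calls) of the bytes needed to write one register of one module in a cascade. `buildSingleWritePacket` and the convention that device 0 is the module wired directly to the microcontroller are assumptions for illustration; register 0x00 is the MAX7219's no-op register.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// MAX7219 "no operation" register address (register 0x00 in the datasheet).
constexpr uint8_t kRegNoOp = 0x00;

// Bytes to shift out, in send order, to write `value` into register `reg`
// of a single module in a chain of `chainLength` modules.
// Assumption: device 0 is wired to the MCU, so its word is sent LAST;
// every other module in the chain receives a no-op word.
std::vector<uint8_t> buildSingleWritePacket(std::size_t chainLength,
                                            std::size_t device,
                                            uint8_t reg, uint8_t value) {
    assert(device < chainLength);
    std::vector<uint8_t> packet(2 * chainLength, kRegNoOp);  // all no-ops
    std::size_t slot = chainLength - 1 - device;  // farthest module goes first
    packet[2 * slot] = reg;        // high byte: register address
    packet[2 * slot + 1] = value;  // low byte: data
    return packet;
}
```

Refreshing all 8 rows of all N modules this way takes 8N such packets of 2N bytes each, so the traffic grows with N squared.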

I thought the project was doomed, but after thinking about it a bit more I added one extra parameter to the initialization that tells the code how many rows the MAX7219s are split into. I chose 4, so instead of 32 MAX7219s all chained together I now have 4 chains of 8x1, each with its own CS line. Although this meant bit-banging in software instead of using hardware SPI, it was still much faster because any update only has to address 8 modules instead of 32.
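The gain from splitting can be estimated by counting the bytes shifted out for one full refresh with the per-module, NOP-padded approach. This is a back-of-the-envelope sketch assuming the modules are split evenly between cascades; `refreshBytesWithNoOps` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Bytes shifted out for one full refresh (8 row registers per module) when
// every module is written individually with no-op padding.
// `chains` is the number of separate cascades, each with its own CS line;
// modules are assumed to be divided evenly between them.
std::size_t refreshBytesWithNoOps(std::size_t totalModules, std::size_t chains) {
    assert(chains > 0 && totalModules % chains == 0);
    std::size_t perChain = totalModules / chains;
    std::size_t writesPerChain = 8 * perChain;  // 8 rows per module
    std::size_t bytesPerWrite = 2 * perChain;   // one 16-bit word per module
    return chains * writesPerChain * bytesPerWrite;  // 16 * perChain^2 * chains
}
```

For 32 modules this gives 16384 bytes as a single cascade versus 4096 bytes as four cascades of 8, a fourfold reduction in shifted bytes.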

The only change I made was right at the end:

Just before transferring to the hardware, the device number (0-31) and the CS line are adjusted depending on which module is being updated. The code assumes all the CS lines are consecutive pins: I used 4 rows and passed in 3 as the first CS pin, so pins 3, 4, 5 and 6 are initialized.
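The remapping can be sketched as follows, assuming devices are numbered cascade by cascade (0-7 on the first CS line, 8-15 on the second, and so on) and the CS pins are consecutive. `ModuleAddress` and `mapDevice` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Where a module lives once the display is split into several cascades.
struct ModuleAddress {
    uint8_t csPin;     // chip-select pin of the cascade holding the module
    uint8_t position;  // position within that cascade (0 = nearest the MCU)
};

// Map a global device number (0..totalDevices-1) to its CS pin and position.
// Assumes devices are numbered cascade by cascade and CS pins are
// consecutive starting at firstCsPin (e.g. 4 cascades from pin 3 -> 3,4,5,6).
ModuleAddress mapDevice(uint8_t device, uint8_t totalDevices,
                        uint8_t cascades, uint8_t firstCsPin) {
    assert(cascades > 0 && totalDevices % cascades == 0);
    assert(device < totalDevices);
    uint8_t perCascade = totalDevices / cascades;
    return ModuleAddress{static_cast<uint8_t>(firstCsPin + device / perCascade),
                         static_cast<uint8_t>(device % perCascade)};
}
```

With 32 devices, 4 cascades and a first CS pin of 3, device 9 maps to pin 4, position 1.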

  3. Although VCC and GND can be daisy-chained from one module to the next, I found that because of the wiring and so on it was better to keep a more modest 8 or fewer modules in each cascade. The same applies to the MOSI/DIN and SCK lines.

  4. Updating values on a MAX7219 is seriously slow, especially in cascades. The first way to combat this is to do all the graphics, text, etc. on a custom-sized virtual bitmap in memory and dump the entire virtual image to the device only when the screen needs refreshing. This means each row of 8 pixels is updated at most once per refresh.
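A minimal sketch of a hardware-independent virtual bitmap, assuming a 64x32 pixel canvas (8 x 4 modules), pixels packed 8 per byte with the MSB as the leftmost pixel, and module rows running the same way as bitmap rows. `VirtualBitmap`, `setPixel` and `tileRow` are hypothetical names, not the author's library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A 1-bit virtual bitmap with no knowledge of any display hardware.
// Pixels are packed 8 per byte, row-major, most significant bit = leftmost.
class VirtualBitmap {
public:
    VirtualBitmap(std::size_t width, std::size_t height)
        : width_(width), height_(height), bytesPerRow_((width + 7) / 8),
          data_(bytesPerRow_ * height, 0) {}

    void setPixel(std::size_t x, std::size_t y, bool on) {
        assert(x < width_ && y < height_);
        uint8_t mask = static_cast<uint8_t>(0x80 >> (x % 8));
        uint8_t& b = data_[y * bytesPerRow_ + x / 8];
        b = on ? static_cast<uint8_t>(b | mask) : static_cast<uint8_t>(b & ~mask);
    }

    // Byte holding row `row` (0..7) of the 8x8 tile at tile column `tx`,
    // tile row `ty`. Assumes the width is a multiple of 8 and that module
    // rows run horizontally, like the bitmap rows.
    uint8_t tileRow(std::size_t tx, std::size_t ty, std::size_t row) const {
        assert(row < 8 && ty * 8 + row < height_ && tx < bytesPerRow_);
        return data_[(ty * 8 + row) * bytesPerRow_ + tx];
    }

private:
    std::size_t width_, height_, bytesPerRow_;
    std::vector<uint8_t> data_;
};
```

Dumping the screen is then a loop over tiles and rows that hands each `tileRow()` byte to whatever device driver is in use; the bitmap never touches the hardware, which is what lets the same drawing code target different displays.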

The other major time saver is keeping the current value of each 8x8 module in memory: if the value to be written is the same, the write is skipped.
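The caching idea can be sketched as a small shadow table holding one byte per row register per module. `ShadowCache` and `needsWrite` are hypothetical names; using -1 as an "unknown" marker so the first write always goes out is a design choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Remembers the last byte written to each row register of each module so
// unchanged rows can be skipped. -1 marks "unknown" (never written), which
// forces the first write after power-up.
class ShadowCache {
public:
    explicit ShadowCache(std::size_t modules) : last_(modules * 8, -1) {}

    // Returns true (and records the value) if this row must be sent.
    bool needsWrite(std::size_t module, std::size_t row, uint8_t value) {
        assert(row < 8 && module * 8 + row < last_.size());
        int16_t& slot = last_[module * 8 + row];
        if (slot == value) return false;  // chip already shows this pattern
        slot = value;
        return true;
    }

    // Forget everything, e.g. after re-initialising the chips.
    void invalidate() { last_.assign(last_.size(), -1); }

private:
    std::vector<int16_t> last_;
};
```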

Hi. Are you using the LedControl library? If so, that is the cause of the slow updating, not the MAX7219. There are other libraries to try, or you can send data to the chips directly, using no library, without too much trouble. You mention sending NOP instructions; these waste SPI bandwidth too and can be avoided entirely with careful coding. Post your code and we can advise.
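Avoiding the NOPs usually means writing the same row (digit) register on every module of a cascade in a single transaction, so each module receives a real command. A minimal sketch, assuming device 0 is the module wired to the MCU (its word is shifted out last); `buildRowPacket` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One transaction that writes the same row register (digit 0..7, i.e.
// MAX7219 registers 0x01..0x08) on every module in a cascade, so no no-op
// words are needed. rowBytes[d] is the pattern for device d; device 0 is
// assumed to be the module wired to the MCU, so it is sent last.
std::vector<uint8_t> buildRowPacket(const std::vector<uint8_t>& rowBytes,
                                    uint8_t row) {
    assert(row < 8);
    std::vector<uint8_t> packet;
    packet.reserve(2 * rowBytes.size());
    for (std::size_t i = rowBytes.size(); i-- > 0;) {  // farthest module first
        packet.push_back(static_cast<uint8_t>(row + 1));  // register address
        packet.push_back(rowBytes[i]);                     // row pattern
    }
    return packet;
}
```

Eight of these per cascade, each framed by one CS low/high pulse, refresh the whole cascade with 8 x 2N bytes instead of 8 x N x 2N.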

If you are getting data corruption, you can use buffer chips on the clock and latch lines. Stick to hardware SPI, because bit-banging will certainly cause performance problems with large numbers of chips.

Even with a 4 MHz SPI clock, you should be able to fully update 32 8x8 matrices hundreds of times per second. Updating every LED means sending a mere 32 x 8 x 2 = 512 bytes, which in theory takes around 1 ms. Even with software overheads, a full update should take less than 10 ms.
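Paul's estimate spelled out, counting only the bits on the wire (no CS toggling or loop overhead):

$$
\begin{aligned}
\text{bytes per full update} &= 32 \times 8 \times 2 = 512 \\
\text{bits} &= 512 \times 8 = 4096 \\
t_{4\,\text{MHz}} &= \frac{4096}{4 \times 10^{6}\ \text{Hz}} \approx 1.02\ \text{ms}
\end{aligned}
$$

At 8 MHz the same transfer takes about 0.51 ms.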

Paul

Hi Paul, thanks for the input. The library I'm using is just a small self-written one.

I assume I can avoid writing the NOP instructions by filling the data in the "correct order" with some careful coding, which may mean not bothering with the caching, since I assume writing data or a NOP to a module takes the same amount of time.
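That reasoning can be made concrete. If each row is sent as one transaction covering every module in the cascade, a module whose row is unchanged still costs a full 16-bit slot, so caching only saves time when no module in the cascade changed that row. A sketch of the row-level check, with `rowsToSend` and the `[device * 8 + row]` frame layout as assumptions:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// With one transaction per row, a NOP and real data cost the same SPI time,
// so only whole rows can be skipped. Frames hold one cascade, indexed
// [device * 8 + row]. Returns which of the 8 rows must be transmitted.
std::array<bool, 8> rowsToSend(const std::vector<uint8_t>& previous,
                               const std::vector<uint8_t>& next) {
    assert(previous.size() == next.size() && next.size() % 8 == 0);
    std::array<bool, 8> dirty{};  // all false
    for (std::size_t i = 0; i < next.size(); ++i) {
        if (previous[i] != next[i]) dirty[i % 8] = true;
    }
    return dirty;
}
```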

On the Mega it already updates all 32 modules very quickly; the main complexity is that the bit orientation of the virtual bitmap and of the hardware is so different that the data has to be reformatted on the fly.
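When a module's digit registers run at 90 degrees to the bitmap rows, the on-the-fly reformatting is essentially an 8x8 bit transpose. A straightforward (not the fastest) version, with `transpose8x8` as a hypothetical name and MSB-first bit order assumed on both sides:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Transpose an 8x8 bit block: bit (7 - c) of in[r] becomes bit (7 - r)
// of out[c], i.e. rows become columns (MSB = leftmost/topmost pixel).
std::array<uint8_t, 8> transpose8x8(const std::array<uint8_t, 8>& in) {
    std::array<uint8_t, 8> out{};
    for (int r = 0; r < 8; ++r) {
        for (int c = 0; c < 8; ++c) {
            if (in[r] & (0x80 >> c)) {
                out[c] |= static_cast<uint8_t>(0x80 >> r);
            }
        }
    }
    return out;
}
```

Transposing twice gives back the original block. Combined with a horizontal flip (reversing the bits in each output byte), the same routine gives a 90-degree rotation.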

I will look into changing the code so it needs fewer cycles. I guess it still has a few hangovers from the original version. :confused:

You may be able to avoid the bit orientation overheads by organising your matrices in the same orientation as the bitmap data. For example rotating them all by 90 degrees and/or connecting them as 8 columns of 4 rather than 4 rows of 8, if the physical connectors allow it.

:o I wrote a small animated test pattern using the optimized updating, and it runs so fast I had to add a delay so it wasn't just a blur.

As for rendering from the virtual screen, at initialization I will fill a few very small arrays that should help rendering greatly.

So far it's showing great promise.

** It now works without any NOP codes and fills all 32 modules from the virtual bitmap.

Are you using SPI.transfer with 4 unique chip select lines, one for each row? That would be fastest, with direct port manipulation for the CS pins.
I don't know why that isn't used more often. I use SPI all the time for hardware transfers. At 8 MHz SCK, it goes really fast!
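The suggested scheme (hardware SPI, one CS line per row of modules) amounts to the loop below. It is a sketch under assumptions: `Bus`, `RecordingBus` and `refreshAll` are hypothetical names, and on an actual board `select`/`deselect` would be direct port writes while `transfer` would wrap `SPI.transfer()` inside `SPI.beginTransaction()`/`SPI.endTransaction()`. The abstract bus lets the logic be checked on a PC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bus interface; CS handling and byte transfer are abstract.
struct Bus {
    virtual void select(uint8_t csPin) = 0;    // CS low
    virtual void deselect(uint8_t csPin) = 0;  // CS high: MAX7219s latch here
    virtual void transfer(uint8_t b) = 0;
    virtual ~Bus() = default;
};

// Records traffic so the sequence can be checked off-target.
struct RecordingBus : Bus {
    std::vector<uint8_t> bytes;
    std::vector<uint8_t> pins;  // CS pin of each completed transaction
    void select(uint8_t) override {}
    void deselect(uint8_t pin) override { pins.push_back(pin); }
    void transfer(uint8_t b) override { bytes.push_back(b); }
};

// Refresh every row of every cascade, one transaction per row per cascade.
// frame is indexed [(cascade * perCascade + device) * 8 + row];
// device 0 is assumed to be nearest the MCU, so it is sent last.
void refreshAll(Bus& bus, const std::vector<uint8_t>& frame,
                uint8_t cascades, uint8_t perCascade, uint8_t firstCsPin) {
    assert(frame.size() == std::size_t(cascades) * perCascade * 8);
    for (uint8_t c = 0; c < cascades; ++c) {
        uint8_t cs = static_cast<uint8_t>(firstCsPin + c);
        for (uint8_t row = 0; row < 8; ++row) {
            bus.select(cs);
            for (int d = perCascade - 1; d >= 0; --d) {  // farthest module first
                bus.transfer(static_cast<uint8_t>(row + 1));  // digit register
                bus.transfer(frame[(c * perCascade + d) * 8 + row]);
            }
            bus.deselect(cs);
        }
    }
}
```

For 4 cascades of 8 this sends exactly the 512 bytes from Paul's estimate, in 32 CS pulses.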

CrossRoads:
Are you using SPI.transfer with 4 unique chip select lines, one for each row? That would be fastest, with direct port manipulation for the CS pins.
I don't know why that isn't used more often. I use SPI all the time for hardware transfers. At 8 MHz SCK, it goes really fast!

Yes, I do all of the above: 4 CS lines, one for each row of 8 modules, plus direct port manipulation where applicable. If ported to the Due, the begin and end transaction calls take care of the CS lines.

Basically, what was a bottleneck is gone, which in turn makes the animations a lot smoother.