SPI - Bitbang VS. SPI library

Hi all,

I've been playing with a new 128 x 64 pixel vacuum fluorescent display (a Noritake GU128X64E-U100) module which has several different interface modes (i.e. parallel and 3 different types of serial).

I have it working using the library code supplied by Noritake, but this code bit-bangs the ports with digitalWrite() and digitalRead() statements.

When trying to send data fast (like a live, runtime bar graph), the transfer rate is slow enough that the display doesn't move smoothly (hard to explain). I can see the updated image "sweep" across the display... imagine an old CRT TV set with a slow scan rate.

Anyway, I wonder if I use the Arduino hardware SPI pins and the SPI.h library (and modify the Noritake library to use it), could I expect to see better results (i.e. faster data transfers)?

Thanks!

-- Roger

Yes. Hardware SPI is much quicker than software SPI, if the receiving device can accept the data at the faster rate.

A digitalWrite() call takes about 15us to execute. There are two pins that need to be controlled meaning that the absolute max speed you can achieve using the wiring library is 2*15us = 30us per clock cycle, or 33kHz.

The Hardware SPI can run at a maximum of half FCPU. For an Arduino, that is 8MHz.

Based on that, using the SPI library should be around 240 times faster.
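To make the arithmetic above easy to re-run with a different digitalWrite() timing, here is a minimal sketch. The function names are made up for illustration, and it assumes exactly two pin writes per clock cycle and nothing else in the loop.

```cpp
// Clock rate (Hz) of a bit-banged SPI clock that costs two pin writes
// per cycle (one to raise SCK, one to lower it).
constexpr double bitBangClockHz(double pinWriteSeconds) {
    return 1.0 / (2.0 * pinWriteSeconds);
}

// Hardware SPI on these AVRs tops out at half the CPU clock.
constexpr double hardwareSpiMaxHz(double cpuHz) {
    return cpuHz / 2.0;
}
```

With a 15us pin write, bitBangClockHz(15e-6) comes out at roughly 33kHz, and hardwareSpiMaxHz(16e6) divided by that gives the factor of about 240.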

There could also be an alternate method: using direct port access in the software SPI, which would land somewhere between hardware SPI and software SPI using Arduino digital pin commands.

Lefty

(whether the faster communications rate will actually improve the way things LOOK is a separate question.)

You could speed up the comm a great deal, without running into the limitations of hardware SPI, by replacing the digitalWrite/digitalRead calls with "port IO" statements.
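As a rough sketch of what "port IO" bit-banging might look like (this is not the Noritake library's code): the function name and the mask parameters are hypothetical, and it assumes MSB-first data that the slave samples on the rising clock edge, which may not match the display's serial mode.

```cpp
#include <stdint.h>

// Shift one byte out MSB-first by writing an 8-bit port register directly.
// `port` is any output register (PORTB on an ATmega328, for example);
// sckMask and mosiMask are hypothetical single-bit masks for the two pins.
static inline void bitBangWrite(volatile uint8_t &port, uint8_t sckMask,
                                uint8_t mosiMask, uint8_t value) {
    for (uint8_t bit = 0; bit < 8; ++bit) {
        if (value & 0x80) {
            port = (uint8_t)(port | mosiMask);    // data line high for a 1 bit
        } else {
            port = (uint8_t)(port & ~mosiMask);   // data line low for a 0 bit
        }
        port = (uint8_t)(port | sckMask);         // rising edge: slave samples MOSI
        port = (uint8_t)(port & ~sckMask);        // falling edge: ready for next bit
        value = (uint8_t)(value << 1);            // move the next bit into the top position
    }
}
```

On an Uno this could be called as bitBangWrite(PORTB, _BV(PB5), _BV(PB3), data) once those two pins are set as outputs. Other bits of the port are left untouched.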

Yup, you can get about 250kHz from a bitbanged software SPI using direct port writes from what I have seen. I can get 125k on an 8MHz ATTiny with the software SPI library I wrote.

I've attached the library. It is not quite finished, but it works, and has all the features of the Hardware SPI library except that you call begin() with the four Arduino pin numbers for your SPI interface:
begin(SCK, MOSI, MISO, SS);

TinySoftwareSPI.h (1.97 KB)

TinySoftwareSPI.cpp (4.55 KB)

Thanks for sharing the software SPI implementation, Tom, it looks damned useful.

No, it's about 4us with constant parameters on a 16MHz Uno, meaning about a 15kB/s rate for bit-banged
SPI using digitalWrite (about 18 calls per byte)

With full speed Hardware SPI on that board you get bytes sent at 8MHz, but some overhead between bytes,
so the practical rate is about 500kB/s so long as you toggle the CS pin with direct port manipulation (using digitalWrite
for that will dominate the time in SPI handling)

Bit-banging with direct port manipulation can get you close to the hardware SPI speeds.

The UART can be programmed to do SPI master mode too I believe.

It isn't that long on a 16MHz AVR. I've measured this many times and just
measured it again earlier today on a 16MHz ATmega328 using a logic analyzer
and it is right at about 4.8us.

Still pretty abysmal compared to what an AVR can do with direct port i/o which is 62.5ns

--- bill

I've used a lot of SPI and I've never really used the SPI library yet. I usually use direct port access to set my slave select pins and then I just write bytes to SPDR and watch SPIF to see when it's time to load another. It can go really, really fast like that.

I would probably use the SPI library if I hadn't already found that method from reading the datasheet and a bit of code. But that's how I do it.

A blocking example:

SPDR = byteToSend;
while(!(SPSR & (1<<SPIF)));

For non-blocking code, you could check SPIF on each pass through your loop to see whether you can send another byte, or I suppose you could even set up an interrupt to handle it.

I haven't tried anything like that but I'm curious to know whether there's a reason for doing the send/wait in that order. Intuitively I'd have guessed it would be more efficient to do it the other way round (prepare the byte to send, wait for the hardware to become idle, send the byte, move on and do something else).

A blocking example:

I would flip the sequence:

while(!(SPSR & (1<<SPIF))) continue; //wait if the prior transmission hasn't ended

SPDR = byteToSend;

This can be considerably faster than your sequence.

I see. I guess you're right. The code I grabbed was actually waiting for a reply. So it had a line after that.

byteRecieved = SPDR;

And in that code I was waiting for the flag so I would know there would be good data in SPDR to read.
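Put together, the send/wait/read pattern might look like the sketch below. The function name is made up, it assumes the SPI peripheral is already enabled in master mode, and the non-AVR branch is only a stand-in so the snippet compiles off-target.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>   // real SPDR, SPSR and SPIF on the target
#else
// Host-side stand-ins (an assumption for illustration only). SPIF is
// permanently set here, so the wait loop returns at once.
#define SPIF 7
static volatile uint8_t SPDR = 0;
static volatile uint8_t SPSR = (1 << SPIF);
#endif

// Send one byte and return the byte the slave clocked back during the
// same eight clock pulses.
static inline uint8_t spiTransfer(uint8_t byteToSend) {
    SPDR = byteToSend;                    // writing SPDR starts the transfer
    while (!(SPSR & (1 << SPIF))) { }     // wait until the transfer has finished
    return SPDR;                          // SPDR now holds the received byte
}
```

Waiting after the write is what makes the read safe: SPDR only holds the slave's reply once SPIF has been set.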

If you want to get the maximum speed out of the hardware SPI, it's even faster if you don't wait for the SPIF bit but instead include just the right number of NOPs between writing one byte and the next to SPDR.

And the right number is ? And is it constant for a given SPI Clock setting ?

Duane B

You stick a 'scope on the SCLK pin and use trial and error - the exact delay might be sensitive to the nature of surrounding
code and the avr-gcc version and optimization flags used, so it's not a perfect technique. The ratio of SPI clock to system
clock is fixed by the prescaler divide ratio so if you get it working at 16MHz it should still work at 8MHz system clock too...

The code I grabbed was actually waiting for a reply.

That's why it is so much faster if you do it via the interrupt: you can just keep loading up the next byte to be sent in the isr, freeing up the mcu.

See http://arduino.cc/forum/index.php/topic,129824.0.html. Yes of course the number of NOPs required will depend on clock frequency; but below the maximum available SPI clock frequency (8MHz on a 16MHz Arduino), the speed gain of not using a busy wait loop will be proportionately lower, so it's probably not worthwhile.

dhenry:

The code I grabbed was actually waiting for a reply.

That's why it is so much faster if you do it via the interrupt: you can just keep loading up the next byte to be sent in the isr, freeing up the mcu.

If you run SPI at 8MHz the overhead of calling the interrupt handler routine is likely more than the SPI transfer (16 clocks)
and busy-waiting will be more efficient? Be interesting to compare at 8MHz/4MHz/2MHz etc.

The math is slightly different.

When polling, you are waiting 16 ticks to transmit and load the next char.

With interrupts, your transmission starts as you load up the char. You then exit the ISR and the MCU is doing other things. 16 ticks later, the ISR fires; with ISR latency (~16 ticks), you load up the next char. So every 32 ticks, you send a char. Of those 32 ticks, the MCU is working on something else for 16 ticks. The net result is that the actual transmission is slower (8MHz SPI running at 4MHz over the long run).

However, if you lower the SPI speed (to 100kHz for example), the interrupt approach offers a considerable advantage, in terms of efficiency and convenience: it is load-and-forget from the programmer's perspective.
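The tick counting above can be written as a small formula. This is only a back-of-the-envelope sketch: the function name is hypothetical, the ~16 tick ISR latency is the estimate from the post, and polling-loop overhead is ignored.

```cpp
// Effective SPI bit rate (Hz) when each byte spends transferTicks CPU
// clocks on the wire, followed by overheadTicks clocks before the next
// byte gets loaded into the data register.
constexpr double effectiveBitRateHz(double cpuHz, unsigned transferTicks,
                                    unsigned overheadTicks) {
    return 8.0 * cpuHz / (transferTicks + overheadTicks);
}
```

At 16MHz, effectiveBitRateHz(16e6, 16, 16) gives the 4MHz long-run figure for the interrupt case, while effectiveBitRateHz(16e6, 16, 0) is the ideal 8MHz polled case. With a slow clock of roughly 100kHz (about 1280 ticks per byte), the same 16 ticks of latency costs only around 1% of the rate.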