Faster than shiftOut

Hello Arduino friends!

I am looking for the fastest possible way to write a byte through a digital out.
Something similar to the shiftOut() function but faster.

On the shiftOut reference page it says there is a hardware implementation in the SPI library but I am not sure if I can use that.
Is there a way to use the SPI registers independently of any other SPI specific stuff?
Would this even be the best way?

Thanks a lot for your help!!
Tom

What do you mean by 'independently of other SPI stuff'? Why not just write your byte out over SPI?

Ideally I wouldn't want to waste resources (pins, instructions) on things I do not need like slave select, addressing, ACKs etc.

I am just looking for the fastest way to shoot out the bit sequence of a byte through a digital out.

Yes, SPI is the way to go. It pretty much is a simple serial clocked output method. You can change the SPI clock speed to try various speeds to get the performance you need.

TomS:
Ideally I wouldn't want to waste resources (pins, instructions) on things I do not need like slave select, addressing, ACKs etc.

I am just looking for the fastest way to shoot out the bit sequence of a byte through a digital out.

Well, SPI does tie up four digital pins. Other than that, you're left with manually shifting out a byte through a digital output pin in code, using direct port access if maximum speed is required.

Lefty

Does it need to be serial? Why not use four pins to send a nibble at a time over a parallel bus? That should be about 4x faster, I think.

Otherwise, use SPI. On a 16 MHz board you can get up to 8 MHz with SPI.setClockDivider(SPI_CLOCK_DIV2).
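
The nibble-at-a-time idea could look roughly like the sketch below. It is only an illustration under assumptions that are not from this thread: the four data lines are taken to be the low four bits of a single port (on an Uno-class board that could be PORTC, i.e. A0 to A3), the names NIBBLE_MASK, writeNibble and sendByteAsNibbles are made up, and the strobe line a real receiver would need to know when a nibble is valid is left out.

```cpp
#include <cstdint>

// Assumption: the four parallel data lines are bits 0..3 of one port.
constexpr uint8_t NIBBLE_MASK = 0x0F;

// Put one nibble on the low four bits, leaving the upper four bits alone.
// `Port` is anything readable as a uint8_t and assignable from one,
// for example an AVR port register such as PORTC.
template <typename Port>
void writeNibble(Port& port, uint8_t nibble)
{
  uint8_t keep = static_cast<uint8_t>(port) & static_cast<uint8_t>(~NIBBLE_MASK);
  port = keep | (nibble & NIBBLE_MASK);
}

// Send a byte as two port writes: high nibble first, then low nibble.
template <typename Port>
void sendByteAsNibbles(Port& port, uint8_t value)
{
  writeNibble(port, value >> 4);
  writeNibble(port, value);
}
```

On the board the call would be something like sendByteAsNibbles(PORTC, data), after the four pins have been configured as outputs. Two port writes per byte instead of eight clocked bits is where the rough 4x estimate comes from.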

retrolefty:
Other than that, you're left with manually shifting out a byte through a digital output pin in code, using direct port access if maximum speed is required.

I just gave that a try, and although it works, it didn't give me the speed I was hoping for.

unsigned long t0, t1;

byte data = 123;

void setup()
{
  Serial.begin(9600);
  DDRD |= B11000000; // pins 7 (data) and 6 (clock) as outputs
}

void loop()
{
  t0 = micros();

  // send the same byte 1000 times so the average is measurable
  for(int i = 0; i < 1000; i++)
  {
    // LSB first; the mask overflows to 0 after bit 7, which ends the loop
    // (B00000001 instead of 00000001, which C++ reads as an octal literal)
    for(byte mask = B00000001; mask > 0; mask <<= 1)
    {
      if(data & mask)
      {
        PORTD = (PORTD & B00111111) | B11000000; // clock high, data high
      }
      else
      {
        PORTD = (PORTD & B00111111) | B01000000; // clock high, data low
      }

      PORTD = (PORTD & B10111111) | B00000000; // clock low
    }
  }

  t1 = micros();

  Serial.print("Time elapsed: ");
  Serial.println(t1 - t0);
  Serial.print("Average per byte: ");
  Serial.println((t1 - t0) / 1000);
  Serial.println();

  delay(2000);
}

This should output a byte on pin 7, using pin 6 as the clock, while leaving the other PORTD pins untouched.

But it only gives me about 100Kb/s.

Any ideas on how to make it faster?

I just tried the SPI library, and it really is a lot faster.
So I guess I will try to adapt my design to use SPI instead.

retrolefty:

TomS:
Ideally I wouldn't want to waste resources (pins, instructions) on things I do not need like slave select, addressing, ACKs etc.

I am just looking for the fastest way to shoot out the bit sequence of a byte through a digital out.

Well, SPI does tie up four digital pins. Other than that, you're left with manually shifting out a byte through a digital output pin in code, using direct port access if maximum speed is required.

SPI only ties up three pins. The SS pin does not have to be used, merely configured as an output. Presumably you need an output pin somewhere, so you can make the SS pin do that as well.

Thus, there is no slave select or addressing required. There are no ACKs in SPI.

I've taken some timings and screen shots, see attachments.

  • Your clocking-out technique took 9 µs per byte.
  • The shiftOut function took 115 µs per byte.
  • SPI took 3 µs per byte.

The SPI code is shorter anyway, see below:

#include <SPI.h>

unsigned long t0, t1;

byte data = 123;

void setup()
{
  Serial.begin(115200);
  SPI.begin ();
}

void loop()
{
  t0 = micros();

  for(int i = 0; i < 1000; i++)
  {
    SPI.transfer (data);
  }

  t1 = micros();

  Serial.print("Time elapsed: ");
  Serial.println(t1 - t0);
  Serial.print("Average per byte: ");
  Serial.println((t1 - t0) / 1000);
  Serial.println();

  delay(2000);
}

In this case clock is pin 13 and data (out) is pin 11. Pin 12 is data in but you can just ignore that. Pin 10 is SS but as long as you leave it configured as an output you can use it for any other output purpose.

By the way, your clocking out won't be totally reliable, depending on how the receiver works. You will notice from the screenshot that the clock is taken high at the exact moment the data is altered, thus the receiver needs to be careful with its timings. Both shiftOut and SPI set up the data first, and then raise the clock line, to provide a clear moment for the data to settle before clocking.
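
To illustrate the "data first, then clock" ordering described above, here is a rough sketch of a bit-banged shift that uses three port writes per bit. It is not the code behind shiftOut or the SPI hardware, just one way to get the same ordering. The bit positions follow the earlier sketch (bit 7 data, bit 6 clock); shiftOutSetupFirst and TracePort are made-up names, and TracePort is only a host-side stand-in for a port register so the write order can be inspected off the board.

```cpp
#include <cstdint>
#include <vector>

// Wiring assumed from the earlier sketch: bit 7 = data, bit 6 = clock.
constexpr uint8_t DATA_BIT  = 0x80;
constexpr uint8_t CLOCK_BIT = 0x40;

// Shift one byte out LSB first, putting the data bit on the line
// before the rising clock edge. `Port` is anything readable as a
// uint8_t and assignable from one, for example PORTD on an AVR.
template <typename Port>
void shiftOutSetupFirst(Port& port, uint8_t value)
{
  for (uint8_t mask = 0x01; mask != 0; mask <<= 1)
  {
    // Clear our two bits, keep the other six pins as they are.
    uint8_t base = static_cast<uint8_t>(port) & static_cast<uint8_t>(~(DATA_BIT | CLOCK_BIT));
    uint8_t bit = (value & mask) ? DATA_BIT : 0;

    port = base | bit;              // 1. data valid, clock still low
    port = base | bit | CLOCK_BIT;  // 2. rising edge, receiver samples
    port = base | bit;              // 3. clock low again, data held
  }
}

// Hypothetical stand-in for a port register: records every write.
struct TracePort
{
  uint8_t value;
  std::vector<uint8_t> writes;

  explicit TracePort(uint8_t initial) : value(initial) {}
  operator uint8_t() const { return value; }
  TracePort& operator=(uint8_t v)
  {
    value = v;
    writes.push_back(v);
    return *this;
  }
};
```

On the board the call would be shiftOutSetupFirst(PORTD, data). The extra write per bit costs some speed compared with the two-write loop, which is the price of giving the receiver a clean setup time.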

Arduino_forum_59369_shift_out.png

Arduino_forum_59369_SPI.png

Hello Nick!

Thanks a lot for giving us some visual comparison of the methods. (Btw, what program / device did you use here?)

We can speed SPI up even more by setting the clock divider:

SPI.setClockDivider(divider);
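
As a back-of-the-envelope check on what the divider buys, the arithmetic can be sketched as below. It assumes a 16 MHz board clock; BOARD_CLOCK_HZ, spiClockHz and shiftTimeNsPerByte are names invented for this illustration, and the figures cover only the time the eight bits spend on the wire, not the loop or library overhead around SPI.transfer().

```cpp
#include <cstdint>

// Assumption: 16 MHz system clock, as on an Uno-class board.
constexpr uint32_t BOARD_CLOCK_HZ = 16000000UL;

// SPI clock frequency for a given divider (2, 4, 8, ... 128).
constexpr uint32_t spiClockHz(uint32_t divider)
{
  return BOARD_CLOCK_HZ / divider;
}

// Nanoseconds needed to clock out the eight bits of one byte.
// Each bit lasts `divider` system clock cycles.
constexpr uint32_t shiftTimeNsPerByte(uint32_t divider)
{
  return 8UL * divider * 1000UL / (BOARD_CLOCK_HZ / 1000000UL);
}
```

With a divider of 4 (which, as far as I know, is the library default on AVR boards) this gives 4 MHz and 2 µs per byte, which fits the roughly 3 µs measured above once overhead is added. With SPI.setClockDivider(SPI_CLOCK_DIV2), called after SPI.begin(), it gives 8 MHz and 1 µs per byte.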

Since I need a multi-master design I had ruled out SPI at first, but now I think I might be able to use the SS line to synchronize the masters in order to avoid data collision.
We'll see how that goes.

Have a great day!
Tom

TomS:
Thanks a lot for giving us some visual comparison of the methods. (Btw, what program / device did you use here?)

Screenshots of my Saleae Logic analyzer.