speed test

Why can’t I make this go faster?

And what’s worse, what is going on with WinVista system that I can’t paste into the forum anymore?

What I want to do is have a bunch of “rows” of 41 bytes sent out as fast as I can.

But it seems anything I try to speed up the SPI.transfers just slows it down.
I’m monitoring the SS line to watch the performance.

This line here in setup

SPI.setClockDivider(4); <<
SPI.begin();

seems to give the best results, with a transfer burst occurring every 100uS.
If I don’t have it, the default SPI speed is slower, and if I change it to any other number, either things don’t work (as in, no transfers):
2, 3, 6
or it runs slower:
default, 8, 16 - takes ~142uS.

Looks to me like there’s 12uS of slack time I could take advantage of, yet I can’t make it run faster.
Any ideas?

simpletest.ino (1.3 KB)

Unlike you I'm no hardware expert, but what would happen if you swapped the crystal for a 20 MHz one?

How much more speed would something like that give you?

CrossRoads:
This line here in setup

SPI.setClockDivider(4); <<
SPI.begin();

You aren’t supposed to use “4”.

See the defines:

#define SPI_CLOCK_DIV4 0x00
#define SPI_CLOCK_DIV16 0x01
#define SPI_CLOCK_DIV64 0x02
#define SPI_CLOCK_DIV128 0x03
#define SPI_CLOCK_DIV2 0x04
#define SPI_CLOCK_DIV8 0x05
#define SPI_CLOCK_DIV32 0x06

When you say “4” you are actually getting SPI_CLOCK_DIV2 which is the fastest the hardware will go. I measured the clock rate at 8 MHz which is the fastest you can get out of it on a 16 MHz chip.

Your line is equivalent to:

  SPI.setClockDivider(SPI_CLOCK_DIV2 );
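
For anyone puzzling over why the other numbers behaved the way they did, here is a rough sketch of how a raw argument would map onto a divider. It assumes the library keeps only the low three bits of the argument (bits 1:0 as SPR1:SPR0, bit 2 as SPI2X), which is consistent with the defines above but is an assumption about the library internals. `dividerFromRaw` and `sckHz` are made-up helper names, not part of the SPI library.

```cpp
#include <stdint.h>

// Hypothetical helper: decode a raw setClockDivider() argument into the
// actual clock divisor. Assumes only the low three bits are used.
unsigned dividerFromRaw(uint8_t raw) {
  // Index = SPI2X:SPR1:SPR0, matching the SPI_CLOCK_DIVx defines.
  static const unsigned table[8] = {4, 16, 64, 128, 2, 8, 32, 64};
  return table[raw & 0x07];
}

// SCK frequency in Hz for a given CPU clock and raw argument.
unsigned long sckHz(unsigned long fCpuHz, uint8_t raw) {
  return fCpuHz / dividerFromRaw(raw);
}
```

Under that assumption, 4 decodes to divide-by-2 (8 MHz on a 16 MHz chip), 8 and 16 both decode to divide-by-4, and 2, 3 and 6 decode to 64, 128 and 32, which would make the clock far slower rather than faster.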

CrossRoads: And what's worse, what is going on with WinVista system that I can't paste into the forum anymore?

I've been having the same problem on-and-off for about six months. Seems to go away with one update then return with the next update. Drives me insane.

CrossRoads: Looks to me like there's 12uS of slack time I could take advantage of, yet I can't make it run faster. Any ideas?

Need to see some code for that one.

@cjdelphi, I want to stick with 16MHz so the IDE & files can remain ‘normal’.

SPI.setClockDivider(4);
vs
SPI.setClockDivider(SPI_CLOCK_DIV2 );

How is one supposed to tell that from the reference section! Geesh, I am glad the default setting works for about everything I do.
So it looks like I can’t do any better then.
Thanks Nick.

@Coding Badly,
Code was attached with initial post. I can’t seem to copy code into here anymore, all I get is 1 line, sometimes 2, most times none. Am sure some WinVista setting is off and preventing me.

Your theoretical fastest speed would be:

 41 bytes * 17 cycles * 62.5 nS = 43.5625 uS

That’s because each byte takes 17 clock cycles of 62.5 nS each: 16 to shift out 8 bits at half the CPU clock, and it seems the hardware needs one more cycle between each byte.

You are getting 99.750 uS which is about twice that, mainly because of the loop overhead.

However you should be timing from when SS goes low to high again (the rest is overhead from all the other stuff) in which case the code now takes 88.000 uS.

Now if you unroll the loop you can speed it up a bit:

    byte * p = testArray;

    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);

    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);

    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);

    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);
    SPI.transfer(*p++);

    SPI.transfer(*p++);

I measured that as taking 69.0625 uS which isn’t too bad.
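
If typing out 41 identical lines is a nuisance, one alternative (not what was measured above) is to let the compiler generate them with a recursive template. This is only a sketch: `transferByte` is a stand-in for `SPI.transfer` so it can be compiled off the board, `Unroll` is a made-up name, and whether the compiler really flattens it into straight-line code depends on the optimization settings, so the timing would need to be checked on a scope.

```cpp
#include <stdint.h>

// Stand-in for SPI.transfer() so the sketch compiles off-target:
// it just records what would have been sent.
static uint8_t sentLog[64];
static unsigned sentCount = 0;
inline void transferByte(uint8_t b) { sentLog[sentCount++] = b; }

// Compile-time unroll: Unroll<N>::send emits N back-to-back calls.
template <unsigned N>
struct Unroll {
  static inline void send(const uint8_t *p) {
    transferByte(*p);
    Unroll<N - 1>::send(p + 1);
  }
};

// Recursion stops at zero bytes.
template <>
struct Unroll<0> {
  static inline void send(const uint8_t *) {}
};
```

On the board the call would look like `Unroll<41>::send(testArray);` with `transferByte` swapped for the real transfer.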

To shave a bit more off you can eliminate the function call like this:

#define nop asm volatile ("nop")

...

    // fake it for timing, with fakestartPoint at 0
    PORTB = PORTB & B11111011;  //digitalWrite(SSpin, LOW);
    // monitor D10, confirm 100uS timing
    SPDR = testArray[0];
    
    for (x=1; x<41; x=x+1){
      nop; nop; nop; nop; nop; nop; nop;
      SPDR = testArray[x];
    }

    nop; nop; nop; nop; nop; nop; nop; nop; nop; nop;
    PORTB = PORTB | B00000100;  //digitalWrite(SSpin, HIGH);

I measured that as taking 46.3333 uS.

CrossRoads: Am sure some WinVista setting is off and preventing me.

Ah, Windows! :P

I've had complaints about my MUD client because people can't "copy and paste" any more. It seems like Windows is tightening up security. Don't want you copying and pasting text, eh?

Stinks about pasting not working, I need a fix for that. Will try other things out. 10 KHz rate might be fast enough - I think I only need 2 KHz. Wouldn't mind the margin tho.

Ok, looks like 41 discrete SPI.transfers saves 12uS, down to 88uS between the start of 41 byte bursts. 11.25 KHz.

Ok, if I combine asm and deleting the for:next loop, I’m seeing 26uS for the burst of 41 and 38uS burst to burst.
Does that seem right?
I’m not getting optimized into a false rate because I’m not sending real data or something?

simpletestR2.ino (4.52 KB)

CrossRoads: Does that seem right?

No. If you are doing it under 43.5625 uS you are not sending real data. It has to take 17 clock cycles per byte. If you look at the data being sent it will be rubbish.

You don't need to use asm because the C loop does it just as well. You can always discard a NOP or two.
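
A quick sanity check along those lines, as a sketch: turn a measured burst time into average CPU cycles per byte and compare it against the 17-cycle figure. The helper names are made up, and the 17 is passed in as a parameter rather than assumed by the code.

```cpp
// Hypothetical helper: average CPU cycles spent per byte in a burst.
double cyclesPerByte(double burstMicros, unsigned bytes, double fCpuMHz) {
  return burstMicros * fCpuMHz / bytes;
}

// True if the burst is shorter than the hardware could manage.
bool tooFastToBeReal(double burstMicros, unsigned bytes, double fCpuMHz,
                     double minCyclesPerByte) {
  return cyclesPerByte(burstMicros, bytes, fCpuMHz) < minCyclesPerByte;
}
```

A 26 uS burst of 41 bytes at 16 MHz works out to roughly 10 cycles per byte, well under 17, while the 46.3333 uS loop version works out to about 18.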

I am pulling the data from a real array. I agree, 1/16,000,000 * 41 * 17 = 43.6uS. What is happening to make it go faster? Do you think more NOPs per line? If I remove some NOPs won't it go even faster?

I tried to look at the SCK line - this scope doesn't cut it, will have to pull out the Saleae analyzer tomorrow.

SPDR = (testArray[fakestartPoint + 0]); nop; nop; nop; nop; nop; nop; nop;
SPDR = (testArray[fakestartPoint + 1]); nop; nop; nop; nop; nop; nop; nop;
:
:
SPDR = (testArray[fakestartPoint + 39]); nop; nop; nop; nop; nop; nop; nop;
SPDR = (testArray[fakestartPoint + 40]); nop; nop; nop; nop; nop; nop; nop;

CrossRoads: Code was attached with initial post.

Sorry about that. I didn't notice there are two attachments. Pretty pictures are just too distracting for me.

I can't seem to copy code into here anymore, all I get is 1 line, sometimes 2, most times none. Am sure some WinVista setting is off and preventing me.

Are you using Internet Explorer?

Yes, IE9

Ack, 3AM! I'm off to bed. Will dig out my Saleae and look at the clocks & data tomorrow afternoon. G'night all!

CrossRoads: If I remove some NOPs won't it go even faster?

Yes, but the hardware won't have finished clocking out the byte. So you will lose data.
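
To make that concrete, here is a toy model (not real hardware, just arithmetic). It assumes a write to SPDR that lands while the previous byte is still shifting is simply ignored, which the datasheet describes as a write collision. The function name and the fixed spacing between writes are assumptions for illustration.

```cpp
// Toy model, not real hardware: bytes are written every `spacing` CPU
// cycles, and the SPI unit is assumed busy for `busyCycles` after each
// accepted write. A write that arrives while busy is dropped.
unsigned bytesActuallySent(unsigned bytes, unsigned spacing,
                           unsigned busyCycles) {
  unsigned long busyUntil = 0;  // cycle at which the shifter is free again
  unsigned sent = 0;
  for (unsigned i = 0; i < bytes; ++i) {
    unsigned long now = (unsigned long)i * spacing;
    if (now >= busyUntil) {     // shifter idle: byte is accepted
      busyUntil = now + busyCycles;
      ++sent;
    }                           // else: write collision, byte lost
  }
  return sent;
}
```

With writes 10 cycles apart and 17 cycles per byte, the model lets through only every other byte of a 41-byte row; at 18 cycles apart all 41 go out.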

CrossRoads: Yes, IE9

That thing causes me fits as well. Turning "Compatibility Mode" on seems to help. The odd thing is I only have trouble with Internet Explorer on this forum. I suggest using Firefox.

G'night all!

Back at ya! I'm headed there as well...

Okay, I got some results that look good.
Used the unlooped asm version of SPI with more NOPs added, see attached. Took 15 to get reliable results. Started with 7, added 7, added 7, then started backing down. Can’t see it in the double digits, but the data does go from 0 to 40 as expected. 46uS for a burst, and 58uS burst to burst. Gives me plenty of time to set up the data in the input registers and wait for an interrupt to occur to pop it to the output registers.
Thanks for the help Nick.

simpletestR2.ino (6.13 KB)

This was my first time using my Saleae 8-channel analyzer, was easy to set up the software and run it. http://www.saleae.com/downloads