Have I made this hardware SPI transfer as fast as possible?

scswift:
Nope, adding a NOP outside the loop doesn't allow me to use only 9 NOPS in the loop.

I still think you should be able to. 17 cycles is right, according to theory.

Try adding a few more after that final WAIT, to allow for the extra instructions that aren't being done (outside the loop).

Yep. The glitch might be in the last byte with the main loop working perfectly.

scswift:
The board I'm using is one of my own design with a 1284p. The problem with updating the LEDs isn't one of the SPI port being too slow. It's that the DAC is on the same port. So I have to output all the LED data in between DAC updates. Which is impossible if I have too many LEDs attached. Hence the warbling that's been bugging me.

What are you doing with the audio when you update the LEDs? Do you skip an audio sample (or two) to compensate or just leave it running? Skipping samples might make it sound better.

scswift:
The ideal solution would be to move the DAC over to one of the two USARTs.

Or leave the DAC where it is and move the LEDs to a different SPI port.

...or move the LEDs onto two ordinary output pins and do data/clock manually.

How many boards have you made? Is it not worth adding two patch wires to move the LEDs?

fungus:

scswift:
The board I'm using is one of my own design with a 1284p. The problem with updating the LEDs isn't one of the SPI port being too slow. It's that the DAC is on the same port. So I have to output all the LED data in between DAC updates. Which is impossible if I have too many LEDs attached. Hence the warbling that's been bugging me.

What are you doing with the audio when you update the LEDs? Do you skip an audio sample (or two) to compensate or just leave it running? Skipping samples might make it sound better.

I'm using a modified version of the WaveHC lib. I didn't really think about what it was doing if the interrupt didn't trigger when it was supposed to, but I don't think it's designed to skip samples if that happens. I'm also not sure exactly what it does when interrupts are re-enabled. It could play the pending sample immediately, and then wait 45us to play the next. Or it could play it immediately, then play the next anywhere from 0us to 45us later. I just don't know.

This is how it sets up the interrupt:

  // Set up timer one
  // Normal operation - no pwm not connected to pins
  TCCR1A = 0;
  // no prescaling, CTC mode
  TCCR1B = _BV(WGM12) | _BV(CS10); 
  // Sample rate - play stereo interleaved
  OCR1A =  F_CPU / (dwSamplesPerSec*Channels); // 16mhz / (22050 hz * 1 channel) = 725 clocks between interrupts
  // SD fill interrupt happens at TCNT1 == 1
  OCR1B = 1;
  // Enable timer interrupt for DAC ISR
  TIMSK1 |= _BV(OCIE1A);

Anyway, I guess skipping the sample entirely might improve the audio quality. Not sure how to best implement that. Depends on what the timer interrupt is doing after interrupts are re-enabled I guess.

I could stop disabling interrupts and set a flag telling it not to send the sample to the dac and have the dac interrupt look for that. But not disabling the interrupts could make the LED update take even longer and miss more samples.

Or, I could try to figure out how many samples I'll miss and update the sample pointer, but that would make things complicated since the code for the double buffering doesn't expect the play position to be incremented by multiple samples and I don't know how it would behave if I did that.

Neither seems like a good option.

How many boards have you made? Is it not worth adding two patch wires to move the LEDs?

I've had 10 boards made so far, with an order for over 150 paid for and pending, but I don't have the money or time to change the PCB design, and get new PCBs and a new stencil made. This project has already run two months longer than I budgeted and it will be another couple weeks before I actually start shipping boards.

As for patch wires, my customers will be handling the boards to install them in replica props. Adding wires would look sloppy and they might end up breaking them off.

The issue with the sound isn't that big of a deal anyway. Almost all the people that pre-ordered these only need two LED modules, and I can run three using the latest code without any noticeable glitches. As for the remainder, they could use one board for sound and one for LEDs if they need more than 96 LEDs. The main reason I want to fix the issue is that these are supposed to be programmable prop controller boards, and the wider market may want to be able to drive more LEDs for more complex props. I'll just have to fix the issue down the road once I do a second run. :confused:

I tried 9 NOPS and a full two waits after and got no LEDs lighting.
I tried 8 NOPS and a full two waits, and the LEDs on the first LED module seemed to light properly but the second just displayed random garbage.

Well, I thought I found a way to use 9 NOPs, but on second look using LEDs set to 50% brightness instead of full brightness, I just found bits were being skipped.

To make matters worse, now that I've switched to using name spaces to store my LED data, suddenly I need to use 11 NOPs in my loop:

/*
This function shifts all the LED data out to the TLC5947's. 
*/

#define NOP __asm__ __volatile__ ("nop\n");
#define WAIT NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP // 11 NOPs 

void updateLEDs() {
 
  const int LEDFrameTime = 1000 / 30; // The number of milliseconds between each update of the LEDs.  1000 / 24 = 24fps.
  
  static unsigned long nextLEDFrameTime = 0;   
  
  //uint8_t *thisLED;
  //uint8_t *lastLED;

  byte *thisLED;
  byte *lastLED;

  if (time >= nextLEDFrameTime) { // If it is time for an LED update...  
  
    //unsigned int time = micros();
      
    thisLED = &led::data[led::modules*36]; // The first operation is to decrement this pointer, so the first time we write thisLED it will be array index LEDMODULES*36-1
    lastLED = &led::data[0]; 
    
    cli(); // Halt interrupts.

    do {       
      SPDR = *--thisLED; WAIT;        
    } while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.

    WAIT; // Wait for last byte to finish transfer. 
    SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
    
    // Move data from shift register into latch register:
    
      LATPORT &= ~(1 << LATPIN); // Set LATCH pin low.  It is important to do this first, just before transitioning high!  I don't know why, but the LED modules won't work right otherwise!
      LATPORT |= (1 << LATPIN); // Set LATCH pin high.  Transitioning from low to high moves all data from the shift register into the latch registers. 

      ENAPORT |= (1 << ENAPIN); // Set BLANK pin high. Turn all outputs off, and reset grayscale timer.  Blanking reduces flicker, but it does not seem to matter if it is before, after, or in the middle of latching.
      ENAPORT &= ~(1 << ENAPIN); // Set BLANK pin low. Turn all outputs on.
      
    sei(); // Reenable interrupts.
    
    //time = micros() - time; 
    //Serial1.println(time,DEC); 
    //Serial1.println(thisLED,DEC);
    
    nextLEDFrameTime = time + LEDFrameTime; // Set the time of the next update.
    
  }
}

It occurs to me though that the number of NOPs don't matter so much as the execution speed, so I will benchmark that now.

Well this is odd...

I finally figured out how to dump the assembly code myself, and here is the disassembly of my updateLEDs function:

00000624 <_Z10updateLEDsv>:
     624:	20 91 c2 01 	lds	r18, 0x01C2
     628:	30 91 c3 01 	lds	r19, 0x01C3
     62c:	40 91 c4 01 	lds	r20, 0x01C4
     630:	50 91 c5 01 	lds	r21, 0x01C5
     634:	80 91 a5 02 	lds	r24, 0x02A5
     638:	90 91 a6 02 	lds	r25, 0x02A6
     63c:	a0 91 a7 02 	lds	r26, 0x02A7
     640:	b0 91 a8 02 	lds	r27, 0x02A8
     644:	28 17       	cp	r18, r24
     646:	39 07       	cpc	r19, r25
     648:	4a 07       	cpc	r20, r26
     64a:	5b 07       	cpc	r21, r27
     64c:	08 f4       	brcc	.+2      	; 0x650 <_Z10updateLEDsv+0x2c>
     64e:	31 c0       	rjmp	.+98     	; 0x6b2 <_Z10updateLEDsv+0x8e>
     650:	f8 94       	cli
     652:	ec e9       	ldi	r30, 0x9C	; 156
     654:	f2 e0       	ldi	r31, 0x02	; 2
     656:	82 91       	ld	r24, -Z
     658:	8e bd       	out	0x2e, r24	; 46
     65a:	00 00       	nop
     65c:	00 00       	nop
     65e:	00 00       	nop
     660:	00 00       	nop
     662:	00 00       	nop
     664:	00 00       	nop
     666:	00 00       	nop
     668:	00 00       	nop
     66a:	00 00       	nop
     66c:	00 00       	nop
     66e:	82 e0       	ldi	r24, 0x02	; 2
     670:	e4 35       	cpi	r30, 0x54	; 84
     672:	f8 07       	cpc	r31, r24
     674:	81 f7       	brne	.-32     	; 0x656 <_Z10updateLEDsv+0x32>
     676:	00 00       	nop
     678:	00 00       	nop
     67a:	00 00       	nop
     67c:	00 00       	nop
     67e:	00 00       	nop
     680:	00 00       	nop
     682:	00 00       	nop
     684:	00 00       	nop
     686:	00 00       	nop
     688:	00 00       	nop
     68a:	8d b5       	in	r24, 0x2d	; 45
     68c:	8f 77       	andi	r24, 0x7F	; 127
     68e:	8d bd       	out	0x2d, r24	; 45
     690:	2a 98       	cbi	0x05, 2	; 5
     692:	2a 9a       	sbi	0x05, 2	; 5
     694:	2b 9a       	sbi	0x05, 3	; 5
     696:	2b 98       	cbi	0x05, 3	; 5
     698:	78 94       	sei
     69a:	2f 5d       	subi	r18, 0xDF	; 223
     69c:	3f 4f       	sbci	r19, 0xFF	; 255
     69e:	4f 4f       	sbci	r20, 0xFF	; 255
     6a0:	5f 4f       	sbci	r21, 0xFF	; 255
     6a2:	20 93 a5 02 	sts	0x02A5, r18
     6a6:	30 93 a6 02 	sts	0x02A6, r19
     6aa:	40 93 a7 02 	sts	0x02A7, r20
     6ae:	50 93 a8 02 	sts	0x02A8, r21
     6b2:	08 95       	ret

Strange, that this shows only 10 NOPs, when there should be 11!

Could it be the whole time I thought I had 10 before, there were actually 9?

Have I done something wrong here? This is how it was written in some sample code I found...

#define NOP asm volatile ("nop\n");

In the code in the zip file you uploaded:

#define NOP __asm__ __volatile__ ("nop\n");
#define WAIT NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP  // 11 NOPs

10 NOPs. Bad comment.

Yes, that may be wrong there, but in the version I am currently compiling it's 11.

Also I am now pulling my hair out over this:

#define WAIT asm volatile ("nop\n\t"
"nop\n\t");

I get the error "unexpected unqulified id before stirng constant".

I have looked at tons of examples of asm used in arduino code and it looks like that should work. I have tried it without the last \n\t. and I have tried it with :: added at the end.
It works with just one nop:

#define WAIT asm volatile ("nop\n\t");

But I can't put more than one with each on a new line!

I'll add the machine cycles for you:

00000624 <_Z10updateLEDsv>:
     624:       20 91 c2 01     lds     r18, 0x01C2   (2)
     628:       30 91 c3 01     lds     r19, 0x01C3   (2)
     62c:       40 91 c4 01     lds     r20, 0x01C4   (2)
     630:       50 91 c5 01     lds     r21, 0x01C5   (2)
     634:       80 91 a5 02     lds     r24, 0x02A5   (2)
     638:       90 91 a6 02     lds     r25, 0x02A6   (2)
     63c:       a0 91 a7 02     lds     r26, 0x02A7   (2)
     640:       b0 91 a8 02     lds     r27, 0x02A8   (2)
     644:       28 17           cp      r18, r24   (1)
     646:       39 07           cpc     r19, r25   (1)
     648:       4a 07           cpc     r20, r26   (1)
     64a:       5b 07           cpc     r21, r27   (1)
     64c:       08 f4           brcc    .+2             ; 0x650 <_Z10updateLEDsv+0x2c>   (1/2)
     64e:       31 c0           rjmp    .+98            ; 0x6b2 <_Z10updateLEDsv+0x8e>   (2)
     650:       f8 94           cli   (1)
     652:       ec e9           ldi     r30, 0x9C       ; 156   (1)
     654:       f2 e0           ldi     r31, 0x02       ; 2   (1)
     656:       82 91           ld      r24, -Z   (2)
     658:       8e bd           out     0x2e, r24       ; 46   (1)
     65a:       00 00           nop   (1)
     65c:       00 00           nop   (1)
     65e:       00 00           nop   (1)
     660:       00 00           nop   (1)
     662:       00 00           nop   (1)
     664:       00 00           nop   (1)
     666:       00 00           nop   (1)
     668:       00 00           nop   (1)
     66a:       00 00           nop   (1)
     66c:       00 00           nop   (1)
     66e:       82 e0           ldi     r24, 0x02       ; 2   (1)
     670:       e4 35           cpi     r30, 0x54       ; 84   (1)
     672:       f8 07           cpc     r31, r24   (1)
     674:       81 f7           brne    .-32            ; 0x656 <_Z10updateLEDsv+0x32>   (1/2)
     676:       00 00           nop   (1)
     678:       00 00           nop   (1)
     67a:       00 00           nop   (1)
     67c:       00 00           nop   (1)
     67e:       00 00           nop   (1)
     680:       00 00           nop   (1)
     682:       00 00           nop   (1)
     684:       00 00           nop   (1)
     686:       00 00           nop   (1)
     688:       00 00           nop   (1)
     68a:       8d b5           in      r24, 0x2d       ; 45   (1)
     68c:       8f 77           andi    r24, 0x7F       ; 127   (1)
     68e:       8d bd           out     0x2d, r24       ; 45   (1)
     690:       2a 98           cbi     0x05, 2 ; 5   (2)
     692:       2a 9a           sbi     0x05, 2 ; 5   (2)
     694:       2b 9a           sbi     0x05, 3 ; 5   (2)
     696:       2b 98           cbi     0x05, 3 ; 5   (2)
     698:       78 94           sei   (1)
     69a:       2f 5d           subi    r18, 0xDF       ; 223   (1)
     69c:       3f 4f           sbci    r19, 0xFF       ; 255   (1)
     69e:       4f 4f           sbci    r20, 0xFF       ; 255   (1)
     6a0:       5f 4f           sbci    r21, 0xFF       ; 255   (1)
     6a2:       20 93 a5 02     sts     0x02A5, r18   (2)
     6a6:       30 93 a6 02     sts     0x02A6, r19   (2)
     6aa:       40 93 a7 02     sts     0x02A7, r20   (2)
     6ae:       50 93 a8 02     sts     0x02A8, r21   (2)
     6b2:       08 95           ret   (4)

You can disassemble the .ELF file you know, that gives you the source as well:

avr-objdump -S -z /pathname/whatever.cpp.elf

scswift:
Also I am now pulling my hair out over this:
...
But I can't put more than one with each on a new line!

Your earlier method was OK. Slow down and check that the source matches the object.

For a define, if you want multiple lines all but the last have to end with a backslash.

For a define, if you want multiple lines all but the last have to end with a backslash.

Did I ever mention that I gave up programming completely for ten years shortly after I first started using C?

Your earlier method was OK. Slow down and check that the source matches the object.

Of course it does, do you take me for an...

...

edits batch file

AHEM. Moving on!

Well this the the asm code being generated now. Looks a bit odd to me but it has the right number of nops:

    cli(); // Halt interrupts.
     844:	f8 94       	cli

    do {       
      SPDR = *--thisLED; WAIT;        
     846:	82 91       	ld	r24, -Z
     848:	8e bd       	out	0x2e, r24	; 46
     84a:	00 00       	nop
     84c:	00 00       	nop
     84e:	00 00       	nop
     850:	00 00       	nop
     852:	00 00       	nop
     854:	00 00       	nop
     856:	00 00       	nop
     858:	00 00       	nop
     85a:	00 00       	nop
     85c:	00 00       	nop
     85e:	00 00       	nop
    thisLED = &led::data[led::modules*36]; // The first operation is to decrement this pointer, so the first time we write thisLED it will be array index LEDMODULES*36-1
    lastLED = &led::data[0]; 
    
    cli(); // Halt interrupts.

    do {       
     860:	e4 17       	cp	r30, r20
     862:	f5 07       	cpc	r31, r21
     864:	81 f7       	brne	.-32     	; 0x846 <_Z10updateLEDsv+0x6c>
      SPDR = *--thisLED; WAIT;        
    } while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.

    WAIT; // Wait for last byte to finish transfer. 
     866:	00 00       	nop
     868:	00 00       	nop
     86a:	00 00       	nop
     86c:	00 00       	nop
     86e:	00 00       	nop
     870:	00 00       	nop
     872:	00 00       	nop
     874:	00 00       	nop
     876:	00 00       	nop
     878:	00 00       	nop
     87a:	00 00       	nop
    SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
     87c:	8d b5       	in	r24, 0x2d	; 45
     87e:	8f 77       	andi	r24, 0x7F	; 127
     880:	8d bd       	out	0x2d, r24	; 45
    
    // Move data from shift register into latch register:
    
      LATPORT &= ~(1 << LATPIN); // Set LATCH pin low.  It is important to do this first, just before transitioning high!  I don't know why, but the LED modules won't work right otherwise!
     882:	2a 98       	cbi	0x05, 2	; 5
      LATPORT |= (1 << LATPIN); // Set LATCH pin high.  Transitioning from low to high moves all data from the shift register into the latch registers. 
     884:	2a 9a       	sbi	0x05, 2	; 5

      ENAPORT |= (1 << ENAPIN); // Set BLANK pin high. Turn all outputs off, and reset grayscale timer.  Blanking reduces flicker, but it does not seem to matter if it is before, after, or in the middle of latching.
     886:	2b 9a       	sbi	0x05, 3	; 5
      ENAPORT &= ~(1 << ENAPIN); // Set BLANK pin low. Turn all outputs on.
     888:	2b 98       	cbi	0x05, 3	; 5
      
    sei(); // Reenable interrupts.
     88a:	78 94       	sei

"#define WAIT NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP // 11 NOPs"

That is a very dangerous way to write macros.

dhenry:
"#define WAIT NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP // 11 NOPs"

That is a very dangerous way to write macros.

Why?

Apparently I missed Nick's post where he disassembled it and added the cycle count. :confused:

So I did it manually myself:

    do {       
      SPDR = *--thisLED; WAIT;        
     846:	82 91       	ld	r24, -Z								2
     848:	8e bd       	out	0x2e, r24	; 46							1
     84a:	00 00       	nop										1
     84c:	00 00       	nop										1
     84e:	00 00       	nop										1
     850:	00 00       	nop										1
     852:	00 00       	nop										1
     854:	00 00       	nop										1
     856:	00 00       	nop										1
     858:	00 00       	nop										1
     85a:	00 00       	nop										1
     85c:	00 00       	nop										1
     85e:	00 00       	nop										1
     860:	e4 17       	cp	r30, r20								1
     862:	f5 07       	cpc	r31, r21								1
     864:	81 f7       	brne	.-32     	; 0x846 <_Z10updateLEDsv+0x6c> 		1/2
      SPDR = *--thisLED; WAIT;        
    } while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.

    WAIT; // Wait for last byte to finish transfer. 
     866:	00 00       	nop										1
     868:	00 00       	nop										1
     86a:	00 00       	nop										1
     86c:	00 00       	nop										1
     86e:	00 00       	nop										1
     870:	00 00       	nop										1
     872:	00 00       	nop										1
     874:	00 00       	nop										1
     876:	00 00       	nop										1
     878:	00 00       	nop										1
     87a:	00 00       	nop										1
    SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
     87c:	8d b5       	in	r24, 0x2d	; 45							1
     87e:	8f 77       	andi	r24, 0x7F	; 127							1
     880:	8d bd       	out	0x2d, r24	; 45							1

Obviously I can get rid of some nop's after the loop there but I gotta go back over the posts and see what was said should be the theoretical minimum for the SPI transfer cycles.

It would be at least 16 because SPI is running at half the clock speed and you have to clock out 8 bits. Someone above suggested the 17th cycle to handle the final clock transition before starting again.

Okay so apparently the theoretical minimum folks think is 17 cycles. And I've got 18 there in the loop, and 17 doesn't work.

In fact 17 doesn't just not work. 17 results in no LEDs illuminating at all. If the data were being corrupted I should see LEDs illuminating. Since I don't, and I am not sending 0's, it seems that somehow writing to the SPI register a moment before the data has finished sending aborts the transfer. Because if I was even getting the occasional stray 1 in there, I would see the leds at various brightness levels randomly lighting up.

Very strange.

OK, well running this exact code:

#include <SPI.h>

#define NOP __asm__ __volatile__ ("nop\n");
#define WAIT NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP // 11 NOPs 

namespace myLeds {
   
    const int LEDBYTES = 36;
    const int SCALEBYTES = 24;
    
    int modules;   // Number of LED modules connected.  Defined in config.txt.
  
    byte * data;    // LED data is stored as a string of 12 bit LED values.  There are 3 bytes for every two LEDs.  We store it this way because the data does not need to be altered when we are shifting it out to the TLC5947s.  Modifying the data with bit shifts would be very expensive and there is a lot of data to move.
    byte * scale;   // Scaling factor for LED brightness.  Used when say, you want to be able to adjust the brightness of the powercell without changing all the code that assumes the max desired brightness is 1.0.  Value stored here is scaling_factor*255, where scaling factor is 0..1.
      
}  // end of namespace myLeds

void setup ()
  {
  myLeds::modules = 2;

  myLeds::data =  (byte *) malloc (myLeds::modules * myLeds::LEDBYTES * sizeof(byte));  
  myLeds::scale = (byte *) malloc (myLeds::modules * myLeds::SCALEBYTES * sizeof(byte));
  
  SPI.begin ();
  digitalWrite (SS, HIGH);
  SPI.setClockDivider (SPI_CLOCK_DIV2);
  Serial.begin (115200);
  
  byte *thisLED = &myLeds::data [myLeds::modules * myLeds::LEDBYTES]; 
  byte *lastLED = &myLeds::data [0]; 
  
  // populate the array
  byte val = 0;
   do 
     {       
      *--thisLED = val++;
      } while (thisLED != lastLED); 
  
  }  // end of setup
  
void loop ()
  {
      
  byte *thisLED = &myLeds::data [myLeds::modules * myLeds::LEDBYTES]; 
  byte *lastLED = &myLeds::data [0]; 
     
  digitalWrite (SS, LOW);
  
  noInterrupts ();
  unsigned long startTime = micros();
  
   do 
     {       
     SPDR = *--thisLED; WAIT;
      } while (thisLED != lastLED); 
  
  WAIT; // Wait for last byte to finish transfer. 
  SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
  unsigned long endTime = micros();
  digitalWrite (SS, HIGH);

  interrupts ();
  Serial.println (endTime - startTime);
  delay (1000);
  }  // end of loop

And checking with the logic analyzer:

Notice the 125 nS gap between bytes? (The 17th and 18th clock cycle).

That required the 11 NOPs (added to the other stuff in the loop).

If I remove a NOP it doesn't work. So you need a cycle of 18. Put it another way, the time between the leading clock edges of each byte was 1.1250 uS (18 cycles).

The output in the serial monitor was "84" which sounds about right (36 * 2 * 1.1250 = 81, plus a bit of overhead for that last byte etc.)

That's neat. I wish I had a logic analyzer. Or even an oscilliscope. :slight_smile:

So what happens when you don't have 11 nops? There's no gap there? Or is the gap smaller?