What am doing wrong here with my class?

Something like this:

namespace myLeds {
   
    int modules;   // Number of LED modules connected.  Defined in config.txt.
  
    byte * data;    
    byte * scale;   
 
    void set(const int which, const float a, const float b)
      {
      // whatever
      }
      
    float get(const int which)
      {
      return 42.0;
      }
    
};  // end of namespace myLeds

void setup ()
  {
  myLeds::modules = 2;

  myLeds::data =  (byte *) malloc (myLeds::modules * 36 * sizeof (byte));  
  myLeds::scale = (byte *) malloc (myLeds::modules * 24 * sizeof (byte));
  }  // end of setup
  
void loop ()
  {
  myLeds::set (1, 2, 3);
  }  // end of loop

This has reduced namespace clutter because the only "global" name now is myLeds, and the other functions and data need to be qualified to get at them.

That's all it has done, but it does mean that if you happen to use "data" or "scale" elsewhere (not qualified) then they don't get confused with the ones in the namespace.

Strictly speaking I would have MODULES as modules, because all caps is for constants (another reason not to use the name LED).

A thing, like modules, that you are planning to change, is clearly not a constant. Unless perhaps it is just something you change at compile time.

It was a constant at one point, and it still is, but eventually it will be loaded from a file, so yeah I will change that.

namespaces don't have semicolons at the end; might want to fix that (although it still compiles)

This compiles. It appears to work:

#include <SPI.h>

namespace myLeds 
    {
   
    const int LEDBYTES = 36;
    const int SCALEBYTES = 24;
    
    int modules;   // Number of LED modules connected.  Defined in config.txt.
  
    byte * data;    // LED data is stored as a string of 12 bit LED values.  There are 3 bytes for every two LEDs.  We store it this way because the data does not need to be altered when we are shifting it out to the TLC5947s.  Modifying the data with bit shifts would be very expensive and there is a lot of data to move.
    byte * scale;   // Scaling factor for LED brightness.  Used when say, you want to be able to adjust the brightness of the powercell without changing all the code that assumes the max desired brightness is 1.0.  Value stored here is scaling_factor*255, where scaling factor is 0..1.
 
    void set(const int which, const float a, const float b)
      {
      // whatever
      }
      
    float get(const int which)
      {
      return 42.0;
      }
    
    
}  // end of namespace myLeds

void setup ()
  {
  myLeds::modules = 2;

  myLeds::data =  (byte *) malloc (myLeds::modules * myLeds::LEDBYTES * sizeof(byte));  
  myLeds::scale = (byte *) malloc (myLeds::modules * myLeds::SCALEBYTES * sizeof(byte));
  
  SPI.begin ();
  Serial.begin (115200);
  
  }  // end of setup
  
void loop ()
  {
  myLeds::set (1, 2, 3);
 
  unsigned long startTime = micros();
      
  byte *thisLED = &myLeds::data [myLeds::modules * myLeds::LEDBYTES]; 
  byte *lastLED = &myLeds::data [0]; 
     
  noInterrupts ();
  
  do
    {       
    SPI.transfer (*--thisLED);
    } while (thisLED != lastLED);

  interrupts ();

  unsigned long endTime = micros();
  Serial.println (endTime - startTime);
  delay (1000);
  }  // end of loop

The serial monitor displays:

216
216
216
220
220
220
220
220
220

That was the predicted time (216 uS) for the fastest SPI loop. So not only does it work, it works nice and fast without mucking around with the SPI registers. Throw in one NOP if necessary if the LEDs require it.

(edit) Changed loop to post-test.

I just discovered that it was the NOP's which were the problem. Somehow switching to the use of a namespace has changed the timing so 10 NOP's is no longer sufficient. I threw 20 in there just to see if that was the issue (WAIT; WAIT;) , and sure enough the LEDs light up once more.

As for your suggestion Nick, I'm afraid there's one problem with your benchmark. The benchmarks I was doing before were for 6 LED modules, not 2. So that SPI function you're calling there is in fact 3x as slow as the theoretical max speed.

I guess I'll have to add or remove some NOPs and see what happens. I probably only need to add one. I had noticed with my earlier testing that removing just one would result in no LEDs lighting, and I could not figure out why that would be, when removing 2 resulted in garbage.

Well thanks for all your help. I guess I must have had the class stuff right in the first place. Ugh. :slight_smile:

Oh well at least I learned about namespaces.

Perhaps if you post all your code? All that namespaces do is change the compiler symbol table management.

All the code wouldn't fit into one post and it uses a modified wavehc and sd card reader lib and 1284 pin definitions and serial output on uart1 with a special bootloader... I don't think you'd want to go to all the trouble to compile it. :slight_smile: Anyway I'm gonna post something in the optimization thread for this spi stuff, I just came across something interesting.

You can make attachments you know. Including .zip files. Click on the Additional Options button below.

I don't think you'd want to go to all the trouble to compile it. :slight_smile:

I've gone to the trouble of writing, compiling and testing the code above with a view to helping you.

Okay, if you really want to take a look at it, here's the code:
http://mightymicrocontroller.com/backups/Mightycode.zip

The board is a 1284p with an SD card on usart0, DAC and TLC5947s on SPI, and serial upload is done via USART1 using a special bootloader. I use the avr-developers variant.
(Crappy setup I know... too late to fix it though.)

The wavehc lib has been modified to use hardware SPI, and the sd lib that comes with it has been modfied to use usart0.

I've got a video of the thing in action too if you want to see what the code is doing. This was with the old slow version of the LED SPI function though:

scswift:
I just discovered that it was the NOP's which were the problem. Somehow switching to the use of a namespace has changed the timing so 10 NOP's is no longer sufficient.

I just tested the code with the NOPs. You are right, it's faster (84 uS). I was wrong about the SPI.transfer.

However looking at the generated code you still got 11 NOPs. The namespace didn't change that at all.

However looking at the generated code you still got 11 NOPs. The namespace didn't change that at all.

I don't understand. The generated code for which version?

With the code just using standard global arrays, 10 NOPs was sufficient. When I added the namespace, 10 stopped working (no leds lit) but 11 works.

I tried adding an extra WAIT before and after the loop as well, but 10 still does not work. Nor does 9.

This version (I believe in making small test cases):

#include <SPI.h>

#define NOP __asm__ __volatile__ ("nop\n");
#define WAIT NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP // 11 NOPs 


namespace myLeds {
   
    const int LEDBYTES = 36;
    const int SCALEBYTES = 24;
    
    int modules;   // Number of LED modules connected.  Defined in config.txt.
  
    byte * data;    // LED data is stored as a string of 12 bit LED values.  There are 3 bytes for every two LEDs.  We store it this way because the data does not need to be altered when we are shifting it out to the TLC5947s.  Modifying the data with bit shifts would be very expensive and there is a lot of data to move.
    byte * scale;   // Scaling factor for LED brightness.  Used when say, you want to be able to adjust the brightness of the powercell without changing all the code that assumes the max desired brightness is 1.0.  Value stored here is scaling_factor*255, where scaling factor is 0..1.
 
    void set(const int which, const float a, const float b)
      {
      // whatever
      }
      
    float get(const int which)
      {
      return 42.0;
      }
      
}  // end of namespace myLeds

void setup ()
  {
  myLeds::modules = 2;

  myLeds::data =  (byte *) malloc (myLeds::modules * myLeds::LEDBYTES * sizeof(byte));  
  myLeds::scale = (byte *) malloc (myLeds::modules * myLeds::SCALEBYTES * sizeof(byte));
  
  SPI.begin ();
  Serial.begin (115200);
  
  }  // end of setup
  
void loop ()
  {
  myLeds::set (1, 2, 3);
 
      
  byte *thisLED = &myLeds::data [myLeds::modules * myLeds::LEDBYTES]; 
  byte *lastLED = &myLeds::data [0]; 
     
  noInterrupts ();
  unsigned long startTime = micros();
  
   do 
     {       
      SPDR = *--thisLED; WAIT;        
      } while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.
  
  WAIT; // Wait for last byte to finish transfer. 
  SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
  unsigned long endTime = micros();

  interrupts ();
  
  
  Serial.println (endTime - startTime);
  
  delay (1000);
  
  }  // end of loop=

That uses your method of assigning to SPDR and doing a WAIT.

Generated code (machine cycles in brackets):

void loop ()
 31c:   cf 92           push    r12   (2)
 31e:   df 92           push    r13   (2)
 320:   ef 92           push    r14   (2)
 322:   ff 92           push    r15   (2)
 324:   0f 93           push    r16   (2)
 326:   1f 93           push    r17   (2)
 328:   cf 93           push    r28   (2)
 32a:   df 93           push    r29   (2)
  {
  myLeds::set (1, 2, 3);
 
      
  byte *thisLED = &myLeds::data [myLeds::modules * myLeds::LEDBYTES]; 
 32c:   00 91 18 01     lds     r16, 0x0118   (2)
 330:   10 91 19 01     lds     r17, 0x0119   (2)
 334:   20 91 16 01     lds     r18, 0x0116   (2)
 338:   30 91 17 01     lds     r19, 0x0117   (2)
 33c:   84 e2           ldi     r24, 0x24       ; 36   (1)
 33e:   90 e0           ldi     r25, 0x00       ; 0   (1)
 340:   28 9f           mul     r18, r24   (2)
 342:   e0 01           movw    r28, r0   (1)
 344:   29 9f           mul     r18, r25   (2)
 346:   d0 0d           add     r29, r0   (1)
 348:   38 9f           mul     r19, r24   (2)
 34a:   d0 0d           add     r29, r0   (1)
 34c:   11 24           eor     r1, r1   (1)
 34e:   c0 0f           add     r28, r16   (1)
 350:   d1 1f           adc     r29, r17   (1)
  byte *lastLED = &myLeds::data [0]; 
     
  noInterrupts ();
 352:   f8 94           cli   (1)
  unsigned long startTime = micros();
 354:   0e 94 7e 02     call    0x4fc   ; 0x4fc <micros>   (4)
 358:   6b 01           movw    r12, r22   (1)
 35a:   7c 01           movw    r14, r24   (1)
  
   do 
     {       
      SPDR = *--thisLED; WAIT;        
 35c:   8a 91           ld      r24, -Y   (2)
 35e:   8e bd           out     0x2e, r24       ; 46   (1)
 360:   00 00           nop   (1)
 362:   00 00           nop   (1)
 364:   00 00           nop   (1)
 366:   00 00           nop   (1)
 368:   00 00           nop   (1)
 36a:   00 00           nop   (1)
 36c:   00 00           nop   (1)
 36e:   00 00           nop   (1)
 370:   00 00           nop   (1)
 372:   00 00           nop   (1)
 374:   00 00           nop   (1)
  byte *lastLED = &myLeds::data [0]; 
     
  noInterrupts ();
  unsigned long startTime = micros();
  
   do 
 376:   c0 17           cp      r28, r16   (1)
 378:   d1 07           cpc     r29, r17   (1)
 37a:   81 f7           brne    .-32            ; 0x35c <loop+0x40>   (1/2)
     {       
      SPDR = *--thisLED; WAIT;        
      } while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.
  
  WAIT; // Wait for last byte to finish transfer. 
 37c:   00 00           nop   (1)
 37e:   00 00           nop   (1)
 380:   00 00           nop   (1)
 382:   00 00           nop   (1)
 384:   00 00           nop   (1)
 386:   00 00           nop   (1)
 388:   00 00           nop   (1)
 38a:   00 00           nop   (1)
 38c:   00 00           nop   (1)
 38e:   00 00           nop   (1)
 390:   00 00           nop   (1)
  SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
 392:   8d b5           in      r24, 0x2d       ; 45   (1)
 394:   8f 77           andi    r24, 0x7F       ; 127   (1)
 396:   8d bd           out     0x2d, r24       ; 45   (1)
  unsigned long endTime = micros();
 398:   0e 94 7e 02     call    0x4fc   ; 0x4fc <micros>   (4)
 39c:   ab 01           movw    r20, r22   (1)
 39e:   bc 01           movw    r22, r24   (1)

  interrupts ();
 3a0:   78 94           sei   (1)

Between addresses 35C and 37A you can see 18 cycles if the branch is taken.

If I take out the namespace it generates identical code. It's not the use of namespace. You must have changed something else.

I can't have changed anything else. I mean what is there that I could change which would change how many nops are needed between bytes in that loop, by one?

Well, I guess I could have changed something... somewhere.. that resulted in that loop being optimized differently. I will disassemble an older 10 nop version of the code and see how the loop differs.

Okay so here's the OLD code that I am running right now, and it's working with 10 nops:

00000624 <_Z10updateLEDsv>:
  static unsigned long nextLEDFrameTime = 0;   
  
  uint8_t *thisLED;
  uint8_t *lastLED;

  if (time >= nextLEDFrameTime) { // If it is time for an LED update...  
     624:	20 91 c4 01 	lds	r18, 0x01C4
     628:	30 91 c5 01 	lds	r19, 0x01C5
     62c:	40 91 c6 01 	lds	r20, 0x01C6
     630:	50 91 c7 01 	lds	r21, 0x01C7
     634:	80 91 a4 02 	lds	r24, 0x02A4
     638:	90 91 a5 02 	lds	r25, 0x02A5
     63c:	a0 91 a6 02 	lds	r26, 0x02A6
     640:	b0 91 a7 02 	lds	r27, 0x02A7
     644:	28 17       	cp	r18, r24
     646:	39 07       	cpc	r19, r25
     648:	4a 07       	cpc	r20, r26
     64a:	5b 07       	cpc	r21, r27
     64c:	08 f4       	brcc	.+2      	; 0x650 <_Z10updateLEDsv+0x2c>
     64e:	31 c0       	rjmp	.+98     	; 0x6b2 <_Z10updateLEDsv+0x8e>
    //unsigned int time = micros();
      
    thisLED = &LEDData[LEDMODULES*36]; // The first operation is to decrement this pointer, so the first time we write thisLED it will be array index LEDMODULES*36-1
    lastLED = &LEDData[0]; 
   
    cli(); // Halt interrupts.
     650:	f8 94       	cli
     652:	ee e9       	ldi	r30, 0x9E	; 158
     654:	f2 e0       	ldi	r31, 0x02	; 2
        
    do {       
    
      SPDR = *--thisLED; WAIT;            
     656:	82 91       	ld	r24, -Z
     658:	8e bd       	out	0x2e, r24	; 46
     65a:	00 00       	nop
     65c:	00 00       	nop
     65e:	00 00       	nop
     660:	00 00       	nop
     662:	00 00       	nop
     664:	00 00       	nop
     666:	00 00       	nop
     668:	00 00       	nop
     66a:	00 00       	nop
     66c:	00 00       	nop
    thisLED = &LEDData[LEDMODULES*36]; // The first operation is to decrement this pointer, so the first time we write thisLED it will be array index LEDMODULES*36-1
    lastLED = &LEDData[0]; 
   
    cli(); // Halt interrupts.
        
    do {       
     66e:	82 e0       	ldi	r24, 0x02	; 2
     670:	e6 35       	cpi	r30, 0x56	; 86
     672:	f8 07       	cpc	r31, r24
     674:	81 f7       	brne	.-32     	; 0x656 <_Z10updateLEDsv+0x32>
      SPDR = *--thisLED; WAIT;            
      //while (!(SPSR & _BV(SPIF))); SPDR = *thisLED--; // Wait for transfer of byte over SPI bus to complete, then transfer *thisLED into SPDR register, and decrement address of *thisLED. 
          
    } while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.

    WAIT; // Wait for last byte to finish transfer. 
     676:	00 00       	nop
     678:	00 00       	nop
     67a:	00 00       	nop
     67c:	00 00       	nop
     67e:	00 00       	nop
     680:	00 00       	nop
     682:	00 00       	nop
     684:	00 00       	nop
     686:	00 00       	nop
     688:	00 00       	nop
    SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
     68a:	8d b5       	in	r24, 0x2d	; 45
     68c:	8f 77       	andi	r24, 0x7F	; 127
     68e:	8d bd       	out	0x2d, r24	; 45

And here is the new code that requires 11:

000007da <_Z10updateLEDsv>:

*/



void updateLEDs() {
     7da:	af 92       	push	r10
     7dc:	bf 92       	push	r11
     7de:	cf 92       	push	r12
     7e0:	df 92       	push	r13
     7e2:	ef 92       	push	r14
     7e4:	ff 92       	push	r15
     7e6:	0f 93       	push	r16
     7e8:	1f 93       	push	r17
  //uint8_t *lastLED;

  byte *thisLED;
  byte *lastLED;

  if (time >= nextLEDFrameTime) { // If it is time for an LED update...  
     7ea:	20 91 c2 01 	lds	r18, 0x01C2
     7ee:	30 91 c3 01 	lds	r19, 0x01C3
     7f2:	40 91 c4 01 	lds	r20, 0x01C4
     7f6:	50 91 c5 01 	lds	r21, 0x01C5
     7fa:	80 91 5e 02 	lds	r24, 0x025E
     7fe:	90 91 5f 02 	lds	r25, 0x025F
     802:	a0 91 60 02 	lds	r26, 0x0260
     806:	b0 91 61 02 	lds	r27, 0x0261
     80a:	28 17       	cp	r18, r24
     80c:	39 07       	cpc	r19, r25
     80e:	4a 07       	cpc	r20, r26
     810:	5b 07       	cpc	r21, r27
     812:	08 f4       	brcc	.+2      	; 0x816 <_Z10updateLEDsv+0x3c>
     814:	57 c0       	rjmp	.+174    	; 0x8c4 <_Z10updateLEDsv+0xea>
  
    unsigned int time = micros();
     816:	0e 94 0b 12 	call	0x2416	; 0x2416 <micros>
     81a:	5b 01       	movw	r10, r22
     81c:	6c 01       	movw	r12, r24
      
    thisLED = &led::data[led::modules*36]; // The first operation is to decrement this pointer, so the first time we write thisLED it will be array index LEDMODULES*36-1
     81e:	40 91 56 02 	lds	r20, 0x0256
     822:	50 91 57 02 	lds	r21, 0x0257
     826:	20 91 54 02 	lds	r18, 0x0254
     82a:	30 91 55 02 	lds	r19, 0x0255
     82e:	84 e2       	ldi	r24, 0x24	; 36
     830:	90 e0       	ldi	r25, 0x00	; 0
     832:	28 9f       	mul	r18, r24
     834:	f0 01       	movw	r30, r0
     836:	29 9f       	mul	r18, r25
     838:	f0 0d       	add	r31, r0
     83a:	38 9f       	mul	r19, r24
     83c:	f0 0d       	add	r31, r0
     83e:	11 24       	eor	r1, r1
     840:	e4 0f       	add	r30, r20
     842:	f5 1f       	adc	r31, r21
    lastLED = &led::data[0]; 
    
    cli(); // Halt interrupts.
     844:	f8 94       	cli

    do {       
      SPDR = *--thisLED; WAIT;        
     846:	82 91       	ld	r24, -Z								2
     848:	8e bd       	out	0x2e, r24	; 46							1
     84a:	00 00       	nop										1
     84c:	00 00       	nop										1
     84e:	00 00       	nop										1
     850:	00 00       	nop										1
     852:	00 00       	nop										1
     854:	00 00       	nop										1
     856:	00 00       	nop										1
     858:	00 00       	nop										1
     85a:	00 00       	nop										1
     85c:	00 00       	nop										1
     85e:	00 00       	nop										1
     860:	e4 17       	cp	r30, r20								1
     862:	f5 07       	cpc	r31, r21								1
     864:	81 f7       	brne	.-32     	; 0x846 <_Z10updateLEDsv+0x6c> 		1/2
      SPDR = *--thisLED; WAIT;        
    } while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.

    WAIT; // Wait for last byte to finish transfer. 
     866:	00 00       	nop										1
     868:	00 00       	nop										1
     86a:	00 00       	nop										1
     86c:	00 00       	nop										1
     86e:	00 00       	nop										1
     870:	00 00       	nop										1
     872:	00 00       	nop										1
     874:	00 00       	nop										1
     876:	00 00       	nop										1
     878:	00 00       	nop										1
     87a:	00 00       	nop										1
    SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
     87c:	8d b5       	in	r24, 0x2d	; 45							1
     87e:	8f 77       	andi	r24, 0x7F	; 127							1
     880:	8d bd       	out	0x2d, r24	; 45							1

Hm, I see one difference right off the bat. But I don't see how it could affect the loop. In the original code LEDMODULES was a constant. Now it is not. So there seems to be a lot more code to set up the pointers.

Can I merge these threads? It's really confusing having the same code discussed in both of them.

OK your 10 NOP one with cycle counts:

    do {       
    
      SPDR = *--thisLED; WAIT;            
     656:       82 91           ld      r24, -Z   (2)
     658:       8e bd           out     0x2e, r24       ; 46   (1)
     65a:       00 00           nop   (1)
     65c:       00 00           nop   (1)
     65e:       00 00           nop   (1)
     660:       00 00           nop   (1)
     662:       00 00           nop   (1)
     664:       00 00           nop   (1)
     666:       00 00           nop   (1)
     668:       00 00           nop   (1)
     66a:       00 00           nop   (1)
     66c:       00 00           nop   (1)
    thisLED = &LEDData[LEDMODULES*36]; // The first operation is to decrement this pointer, so the first time we write thisLED it will be array index LEDMODULES*36-1
    lastLED = &LEDData[0]; 
   
    cli(); // Halt interrupts.
        
    do {       
     66e:       82 e0           ldi     r24, 0x02       ; 2   (1)
     670:       e6 35           cpi     r30, 0x56       ; 86   (1)
     672:       f8 07           cpc     r31, r24   (1)
     674:       81 f7           brne    .-32            ; 0x656 <_Z10updateLEDsv+0x32>   (1/2)

18 when it takes the branch.

And the one with 11 NOPs:

    do {       
      SPDR = *--thisLED; WAIT;        
     846:       82 91           ld      r24, -Z                                                         2   (2)
     848:       8e bd           out     0x2e, r24       ; 46                                               1   (1)
     84a:       00 00           nop                                                                        1   (1)
     84c:       00 00           nop                                                                        1   (1)
     84e:       00 00           nop                                                                        1   (1)
     850:       00 00           nop                                                                        1   (1)
     852:       00 00           nop                                                                        1   (1)
     854:       00 00           nop                                                                        1   (1)
     856:       00 00           nop                                                                        1   (1)
     858:       00 00           nop                                                                        1   (1)
     85a:       00 00           nop                                                                        1   (1)
     85c:       00 00           nop                                                                        1   (1)
     85e:       00 00           nop                                                                        1   (1)
     860:       e4 17           cp      r30, r20                                                           1   (1)
     862:       f5 07           cpc     r31, r21                                                           1   (1)
     864:       81 f7           brne    .-32            ; 0x846 <_Z10updateLEDsv+0x6c>          1/2   (1/2)
      SPDR = *--thisLED; WAIT;        
    } while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.

18 cycles again. So that totally agrees with what I observed with the logic analyzer in the other thread (or is it this thread?) You need 18 cycles.

I don't mind, though if you can move just some of the posts and leave the class thread stuff alone that would be best for people who have questions about that in the future.