It was a constant at one point, and it still is, but eventually it will be loaded from a file, so yeah I will change that.
namespaces don't have semicolons at the end; might want to fix that (although it still compiles)
This compiles. It appears to work:
#include <SPI.h>
namespace myLeds
{
const int LEDBYTES = 36;
const int SCALEBYTES = 24;
int modules; // Number of LED modules connected. Defined in config.txt.
byte * data; // LED data is stored as a string of 12 bit LED values. There are 3 bytes for every two LEDs. We store it this way because the data does not need to be altered when we are shifting it out to the TLC5947s. Modifying the data with bit shifts would be very expensive and there is a lot of data to move.
byte * scale; // Scaling factor for LED brightness. Used when say, you want to be able to adjust the brightness of the powercell without changing all the code that assumes the max desired brightness is 1.0. Value stored here is scaling_factor*255, where scaling factor is 0..1.
void set(const int which, const float a, const float b)
{
// whatever
}
float get(const int which)
{
return 42.0;
}
} // end of namespace myLeds
void setup ()
{
myLeds::modules = 2;
myLeds::data = (byte *) malloc (myLeds::modules * myLeds::LEDBYTES * sizeof(byte));
myLeds::scale = (byte *) malloc (myLeds::modules * myLeds::SCALEBYTES * sizeof(byte));
SPI.begin ();
Serial.begin (115200);
} // end of setup
void loop ()
{
myLeds::set (1, 2, 3);
unsigned long startTime = micros();
byte *thisLED = &myLeds::data [myLeds::modules * myLeds::LEDBYTES];
byte *lastLED = &myLeds::data [0];
noInterrupts ();
do
{
SPI.transfer (*--thisLED);
} while (thisLED != lastLED);
interrupts ();
unsigned long endTime = micros();
Serial.println (endTime - startTime);
delay (1000);
} // end of loop
The serial monitor displays:
216
216
216
220
220
220
220
220
220
That was the predicted time (216 uS) for the fastest SPI loop. So not only does it work, it works nice and fast without mucking around with the SPI registers. Throw in one NOP if necessary if the LEDs require it.
(edit) Changed loop to post-test.
I just discovered that it was the NOP's which were the problem. Somehow switching to the use of a namespace has changed the timing so 10 NOP's is no longer sufficient. I threw 20 in there just to see if that was the issue (WAIT; WAIT;) , and sure enough the LEDs light up once more.
As for your suggestion Nick, I'm afraid there's one problem with your benchmark. The benchmarks I was doing before were for 6 LED modules, not 2. So that SPI function you're calling there is in fact 3x as slow as the theoretical max speed.
I guess I'll have to add or remove some NOPs and see what happens. I probably only need to add one. I had noticed with my earlier testing that removing just one would result in no LEDs lighting, and I could not figure out why that would be, when removing 2 resulted in garbage.
Well thanks for all your help. I guess I must have had the class stuff right in the first place. Ugh.
Oh well at least I learned about namespaces.
Perhaps if you post all your code? All that namespaces do is change the compiler symbol table management.
All the code wouldn't fit into one post and it uses a modified wavehc and sd card reader lib and 1284 pin definitions and serial output on uart1 with a special bootloader... I don't think you'd want to go to all the trouble to compile it. Anyway I'm gonna post something in the optimization thread for this spi stuff, I just came across something interesting.
You can make attachments you know. Including .zip files. Click on the Additional Options button below.
I don't think you'd want to go to all the trouble to compile it.
I've gone to the trouble of writing, compiling and testing the code above with a view to helping you.
Okay, if you really want to take a look at it, here's the code:
http://mightymicrocontroller.com/backups/Mightycode.zip
The board is a 1284p with an SD card on usart0, DAC and TLC5947s on SPI, and serial upload is done via USART1 using a special bootloader. I use the avr-developers variant.
(Crappy setup I know... too late to fix it though.)
The wavehc lib has been modified to use hardware SPI, and the sd lib that comes with it has been modfied to use usart0.
I've got a video of the thing in action too if you want to see what the code is doing. This was with the old slow version of the LED SPI function though:
scswift:
I just discovered that it was the NOP's which were the problem. Somehow switching to the use of a namespace has changed the timing so 10 NOP's is no longer sufficient.
I just tested the code with the NOPs. You are right, it's faster (84 uS). I was wrong about the SPI.transfer.
However looking at the generated code you still got 11 NOPs. The namespace didn't change that at all.
However looking at the generated code you still got 11 NOPs. The namespace didn't change that at all.
I don't understand. The generated code for which version?
With the code just using standard global arrays, 10 NOPs was sufficient. When I added the namespace, 10 stopped working (no leds lit) but 11 works.
I tried adding an extra WAIT before and after the loop as well, but 10 still does not work. Nor does 9.
This version (I believe in making small test cases):
#include <SPI.h>
#define NOP __asm__ __volatile__ ("nop\n");
#define WAIT NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP NOP // 11 NOPs
namespace myLeds {
const int LEDBYTES = 36;
const int SCALEBYTES = 24;
int modules; // Number of LED modules connected. Defined in config.txt.
byte * data; // LED data is stored as a string of 12 bit LED values. There are 3 bytes for every two LEDs. We store it this way because the data does not need to be altered when we are shifting it out to the TLC5947s. Modifying the data with bit shifts would be very expensive and there is a lot of data to move.
byte * scale; // Scaling factor for LED brightness. Used when say, you want to be able to adjust the brightness of the powercell without changing all the code that assumes the max desired brightness is 1.0. Value stored here is scaling_factor*255, where scaling factor is 0..1.
void set(const int which, const float a, const float b)
{
// whatever
}
float get(const int which)
{
return 42.0;
}
} // end of namespace myLeds
void setup ()
{
myLeds::modules = 2;
myLeds::data = (byte *) malloc (myLeds::modules * myLeds::LEDBYTES * sizeof(byte));
myLeds::scale = (byte *) malloc (myLeds::modules * myLeds::SCALEBYTES * sizeof(byte));
SPI.begin ();
Serial.begin (115200);
} // end of setup
void loop ()
{
myLeds::set (1, 2, 3);
byte *thisLED = &myLeds::data [myLeds::modules * myLeds::LEDBYTES];
byte *lastLED = &myLeds::data [0];
noInterrupts ();
unsigned long startTime = micros();
do
{
SPDR = *--thisLED; WAIT;
} while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.
WAIT; // Wait for last byte to finish transfer.
SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
unsigned long endTime = micros();
interrupts ();
Serial.println (endTime - startTime);
delay (1000);
} // end of loop=
That uses your method of assigning to SPDR and doing a WAIT.
Generated code (machine cycles in brackets):
void loop ()
31c: cf 92 push r12 (2)
31e: df 92 push r13 (2)
320: ef 92 push r14 (2)
322: ff 92 push r15 (2)
324: 0f 93 push r16 (2)
326: 1f 93 push r17 (2)
328: cf 93 push r28 (2)
32a: df 93 push r29 (2)
{
myLeds::set (1, 2, 3);
byte *thisLED = &myLeds::data [myLeds::modules * myLeds::LEDBYTES];
32c: 00 91 18 01 lds r16, 0x0118 (2)
330: 10 91 19 01 lds r17, 0x0119 (2)
334: 20 91 16 01 lds r18, 0x0116 (2)
338: 30 91 17 01 lds r19, 0x0117 (2)
33c: 84 e2 ldi r24, 0x24 ; 36 (1)
33e: 90 e0 ldi r25, 0x00 ; 0 (1)
340: 28 9f mul r18, r24 (2)
342: e0 01 movw r28, r0 (1)
344: 29 9f mul r18, r25 (2)
346: d0 0d add r29, r0 (1)
348: 38 9f mul r19, r24 (2)
34a: d0 0d add r29, r0 (1)
34c: 11 24 eor r1, r1 (1)
34e: c0 0f add r28, r16 (1)
350: d1 1f adc r29, r17 (1)
byte *lastLED = &myLeds::data [0];
noInterrupts ();
352: f8 94 cli (1)
unsigned long startTime = micros();
354: 0e 94 7e 02 call 0x4fc ; 0x4fc <micros> (4)
358: 6b 01 movw r12, r22 (1)
35a: 7c 01 movw r14, r24 (1)
do
{
SPDR = *--thisLED; WAIT;
35c: 8a 91 ld r24, -Y (2)
35e: 8e bd out 0x2e, r24 ; 46 (1)
360: 00 00 nop (1)
362: 00 00 nop (1)
364: 00 00 nop (1)
366: 00 00 nop (1)
368: 00 00 nop (1)
36a: 00 00 nop (1)
36c: 00 00 nop (1)
36e: 00 00 nop (1)
370: 00 00 nop (1)
372: 00 00 nop (1)
374: 00 00 nop (1)
byte *lastLED = &myLeds::data [0];
noInterrupts ();
unsigned long startTime = micros();
do
376: c0 17 cp r28, r16 (1)
378: d1 07 cpc r29, r17 (1)
37a: 81 f7 brne .-32 ; 0x35c <loop+0x40> (1/2)
{
SPDR = *--thisLED; WAIT;
} while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.
WAIT; // Wait for last byte to finish transfer.
37c: 00 00 nop (1)
37e: 00 00 nop (1)
380: 00 00 nop (1)
382: 00 00 nop (1)
384: 00 00 nop (1)
386: 00 00 nop (1)
388: 00 00 nop (1)
38a: 00 00 nop (1)
38c: 00 00 nop (1)
38e: 00 00 nop (1)
390: 00 00 nop (1)
SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
392: 8d b5 in r24, 0x2d ; 45 (1)
394: 8f 77 andi r24, 0x7F ; 127 (1)
396: 8d bd out 0x2d, r24 ; 45 (1)
unsigned long endTime = micros();
398: 0e 94 7e 02 call 0x4fc ; 0x4fc <micros> (4)
39c: ab 01 movw r20, r22 (1)
39e: bc 01 movw r22, r24 (1)
interrupts ();
3a0: 78 94 sei (1)
Between addresses 35C and 37A you can see 18 cycles if the branch is taken.
If I take out the namespace it generates identical code. It's not the use of namespace. You must have changed something else.
I can't have changed anything else. I mean what is there that I could change which would change how many nops are needed between bytes in that loop, by one?
Well, I guess I could have changed something... somewhere.. that resulted in that loop being optimized differently. I will disassemble an older 10 nop version of the code and see how the loop differs.
Okay so here's the OLD code that I am running right now, and it's working with 10 nops:
00000624 <_Z10updateLEDsv>:
static unsigned long nextLEDFrameTime = 0;
uint8_t *thisLED;
uint8_t *lastLED;
if (time >= nextLEDFrameTime) { // If it is time for an LED update...
624: 20 91 c4 01 lds r18, 0x01C4
628: 30 91 c5 01 lds r19, 0x01C5
62c: 40 91 c6 01 lds r20, 0x01C6
630: 50 91 c7 01 lds r21, 0x01C7
634: 80 91 a4 02 lds r24, 0x02A4
638: 90 91 a5 02 lds r25, 0x02A5
63c: a0 91 a6 02 lds r26, 0x02A6
640: b0 91 a7 02 lds r27, 0x02A7
644: 28 17 cp r18, r24
646: 39 07 cpc r19, r25
648: 4a 07 cpc r20, r26
64a: 5b 07 cpc r21, r27
64c: 08 f4 brcc .+2 ; 0x650 <_Z10updateLEDsv+0x2c>
64e: 31 c0 rjmp .+98 ; 0x6b2 <_Z10updateLEDsv+0x8e>
//unsigned int time = micros();
thisLED = &LEDData[LEDMODULES*36]; // The first operation is to decrement this pointer, so the first time we write thisLED it will be array index LEDMODULES*36-1
lastLED = &LEDData[0];
cli(); // Halt interrupts.
650: f8 94 cli
652: ee e9 ldi r30, 0x9E ; 158
654: f2 e0 ldi r31, 0x02 ; 2
do {
SPDR = *--thisLED; WAIT;
656: 82 91 ld r24, -Z
658: 8e bd out 0x2e, r24 ; 46
65a: 00 00 nop
65c: 00 00 nop
65e: 00 00 nop
660: 00 00 nop
662: 00 00 nop
664: 00 00 nop
666: 00 00 nop
668: 00 00 nop
66a: 00 00 nop
66c: 00 00 nop
thisLED = &LEDData[LEDMODULES*36]; // The first operation is to decrement this pointer, so the first time we write thisLED it will be array index LEDMODULES*36-1
lastLED = &LEDData[0];
cli(); // Halt interrupts.
do {
66e: 82 e0 ldi r24, 0x02 ; 2
670: e6 35 cpi r30, 0x56 ; 86
672: f8 07 cpc r31, r24
674: 81 f7 brne .-32 ; 0x656 <_Z10updateLEDsv+0x32>
SPDR = *--thisLED; WAIT;
//while (!(SPSR & _BV(SPIF))); SPDR = *thisLED--; // Wait for transfer of byte over SPI bus to complete, then transfer *thisLED into SPDR register, and decrement address of *thisLED.
} while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.
WAIT; // Wait for last byte to finish transfer.
676: 00 00 nop
678: 00 00 nop
67a: 00 00 nop
67c: 00 00 nop
67e: 00 00 nop
680: 00 00 nop
682: 00 00 nop
684: 00 00 nop
686: 00 00 nop
688: 00 00 nop
SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
68a: 8d b5 in r24, 0x2d ; 45
68c: 8f 77 andi r24, 0x7F ; 127
68e: 8d bd out 0x2d, r24 ; 45
And here is the new code that requires 11:
000007da <_Z10updateLEDsv>:
*/
void updateLEDs() {
7da: af 92 push r10
7dc: bf 92 push r11
7de: cf 92 push r12
7e0: df 92 push r13
7e2: ef 92 push r14
7e4: ff 92 push r15
7e6: 0f 93 push r16
7e8: 1f 93 push r17
//uint8_t *lastLED;
byte *thisLED;
byte *lastLED;
if (time >= nextLEDFrameTime) { // If it is time for an LED update...
7ea: 20 91 c2 01 lds r18, 0x01C2
7ee: 30 91 c3 01 lds r19, 0x01C3
7f2: 40 91 c4 01 lds r20, 0x01C4
7f6: 50 91 c5 01 lds r21, 0x01C5
7fa: 80 91 5e 02 lds r24, 0x025E
7fe: 90 91 5f 02 lds r25, 0x025F
802: a0 91 60 02 lds r26, 0x0260
806: b0 91 61 02 lds r27, 0x0261
80a: 28 17 cp r18, r24
80c: 39 07 cpc r19, r25
80e: 4a 07 cpc r20, r26
810: 5b 07 cpc r21, r27
812: 08 f4 brcc .+2 ; 0x816 <_Z10updateLEDsv+0x3c>
814: 57 c0 rjmp .+174 ; 0x8c4 <_Z10updateLEDsv+0xea>
unsigned int time = micros();
816: 0e 94 0b 12 call 0x2416 ; 0x2416 <micros>
81a: 5b 01 movw r10, r22
81c: 6c 01 movw r12, r24
thisLED = &led::data[led::modules*36]; // The first operation is to decrement this pointer, so the first time we write thisLED it will be array index LEDMODULES*36-1
81e: 40 91 56 02 lds r20, 0x0256
822: 50 91 57 02 lds r21, 0x0257
826: 20 91 54 02 lds r18, 0x0254
82a: 30 91 55 02 lds r19, 0x0255
82e: 84 e2 ldi r24, 0x24 ; 36
830: 90 e0 ldi r25, 0x00 ; 0
832: 28 9f mul r18, r24
834: f0 01 movw r30, r0
836: 29 9f mul r18, r25
838: f0 0d add r31, r0
83a: 38 9f mul r19, r24
83c: f0 0d add r31, r0
83e: 11 24 eor r1, r1
840: e4 0f add r30, r20
842: f5 1f adc r31, r21
lastLED = &led::data[0];
cli(); // Halt interrupts.
844: f8 94 cli
do {
SPDR = *--thisLED; WAIT;
846: 82 91 ld r24, -Z 2
848: 8e bd out 0x2e, r24 ; 46 1
84a: 00 00 nop 1
84c: 00 00 nop 1
84e: 00 00 nop 1
850: 00 00 nop 1
852: 00 00 nop 1
854: 00 00 nop 1
856: 00 00 nop 1
858: 00 00 nop 1
85a: 00 00 nop 1
85c: 00 00 nop 1
85e: 00 00 nop 1
860: e4 17 cp r30, r20 1
862: f5 07 cpc r31, r21 1
864: 81 f7 brne .-32 ; 0x846 <_Z10updateLEDsv+0x6c> 1/2
SPDR = *--thisLED; WAIT;
} while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.
WAIT; // Wait for last byte to finish transfer.
866: 00 00 nop 1
868: 00 00 nop 1
86a: 00 00 nop 1
86c: 00 00 nop 1
86e: 00 00 nop 1
870: 00 00 nop 1
872: 00 00 nop 1
874: 00 00 nop 1
876: 00 00 nop 1
878: 00 00 nop 1
87a: 00 00 nop 1
SPSR = SPSR & ~_BV(SPIF); // Clear transfer flag.
87c: 8d b5 in r24, 0x2d ; 45 1
87e: 8f 77 andi r24, 0x7F ; 127 1
880: 8d bd out 0x2d, r24 ; 45 1
Hm, I see one difference right off the bat. But I don't see how it could affect the loop. In the original code LEDMODULES was a constant. Now it is not. So there seems to be a lot more code to set up the pointers.
Can I merge these threads? It's really confusing having the same code discussed in both of them.
OK your 10 NOP one with cycle counts:
do {
SPDR = *--thisLED; WAIT;
656: 82 91 ld r24, -Z (2)
658: 8e bd out 0x2e, r24 ; 46 (1)
65a: 00 00 nop (1)
65c: 00 00 nop (1)
65e: 00 00 nop (1)
660: 00 00 nop (1)
662: 00 00 nop (1)
664: 00 00 nop (1)
666: 00 00 nop (1)
668: 00 00 nop (1)
66a: 00 00 nop (1)
66c: 00 00 nop (1)
thisLED = &LEDData[LEDMODULES*36]; // The first operation is to decrement this pointer, so the first time we write thisLED it will be array index LEDMODULES*36-1
lastLED = &LEDData[0];
cli(); // Halt interrupts.
do {
66e: 82 e0 ldi r24, 0x02 ; 2 (1)
670: e6 35 cpi r30, 0x56 ; 86 (1)
672: f8 07 cpc r31, r24 (1)
674: 81 f7 brne .-32 ; 0x656 <_Z10updateLEDsv+0x32> (1/2)
18 when it takes the branch.
And the one with 11 NOPs:
do {
SPDR = *--thisLED; WAIT;
846: 82 91 ld r24, -Z 2 (2)
848: 8e bd out 0x2e, r24 ; 46 1 (1)
84a: 00 00 nop 1 (1)
84c: 00 00 nop 1 (1)
84e: 00 00 nop 1 (1)
850: 00 00 nop 1 (1)
852: 00 00 nop 1 (1)
854: 00 00 nop 1 (1)
856: 00 00 nop 1 (1)
858: 00 00 nop 1 (1)
85a: 00 00 nop 1 (1)
85c: 00 00 nop 1 (1)
85e: 00 00 nop 1 (1)
860: e4 17 cp r30, r20 1 (1)
862: f5 07 cpc r31, r21 1 (1)
864: 81 f7 brne .-32 ; 0x846 <_Z10updateLEDsv+0x6c> 1/2 (1/2)
SPDR = *--thisLED; WAIT;
} while (thisLED != lastLED); // thisLED is decremented one last time before we hit the end of the loop, so after byte 0 transfers, the loop exits.
18 cycles again. So that totally agrees with what I observed with the logic analyzer in the other thread (or is it this thread?) You need 18 cycles.
I don't mind, though if you can move just some of the posts and leave the class thread stuff alone that would be best for people who have questions about that in the future.