Faster Routine

Hello,

While writing code for an LED matrix I tried a few approaches (ISR, ...). The following code works and is the fastest version I have, but it is still too slow, so I need some advice on how to speed it up. SCLK and GCLK are clock signals; the rest are either data pins or registers I have to set up, so I don't think I can change much in the main code, but maybe there is a way to speed up the whole thing.

void loop() {
  while (true) {
    //**** GCLK pulse once the data has been shifted in, otherwise SCLK pulse ****//
    if (j < 577 ) {
      PIOC->PIO_ODSR ^= 0x1 << 21;
      //REG_PIOC_SODR = 0x1 << 21;  //SCLK TLC
    } else {
      PIOD->PIO_ODSR ^= 0x1 << 7;
      //REG_PIOD_SODR = 0x1 << 7;    //GCLK TLC
    }

    //**** DATA OUT ****//
    if (j < 577) {
      //****Data for the upper half****/
      if (((Tdata[767 - a] >> count) & 0x1)) {
        REG_PIOD_SODR = 0x1 << 8;
      } else {
        REG_PIOD_CODR = 0x1 << 8;
      }
      //****Data for the lower half****//
      if (((Tdata[1535 - a] >> count) & 0x1)) {
        REG_PIOB_SODR = 0x1 << 27;
      } else {
        REG_PIOB_CODR = 0x1 << 27;
      }
      if (count == 0) {
        count = 11;
        a++;
      } else {
        count--;
      }
    }
    if (j > 545 && j < 578) {
      //****Switch row ****/
      REG_PIOC_CODR = 0x1 << 25;
      if (Zeilen[Zeile][i]) {
        REG_PIOB_SODR = 0x1 << 25; //SIN 1
      } else {
        REG_PIOB_CODR = 0x1 << 25;  //SIN 0
      }
      i++;
      REG_PIOC_SODR = 0x1 << 25;
    }


    //**** XLAT, GCLK, BLANK & switch rows ****//
    switch (j) {
      case 576:
        REG_PIOC_SODR = 0x1 << 22;  //XLAT HIGH (Data Latch)
        break;
      case 577:
        REG_PIOB_SODR = 0x1 << 25;       // CLEAR SIN row driver
        REG_PIOC_CODR = 0x1 << 25;    //Clear SCLK
        REG_PIOC_SODR = 0x1 << 26;   //RCLK pulse after each new row
        i = 0;
        Zeile ++;
        if (Zeile == 16) {
          Zeile = 0;
          a = 0;
        }
        REG_PIOC_CODR = 0x1 << 22;  //XLAT LOW
        REG_PIOC_CODR = 0x1 << 23;  //BLANK LOW (GCLK Counting)
        break;
      case 578:
        REG_PIOC_CODR = 0x1 << 26;       //Clear RCLK
        break;
      //**** 690 - 578 = number of GCLK pulses ****//
      case 690:
        REG_PIOC_SODR = 0x1 << 23; //BLANK HIGH
        j = 0;
        break;
      default: break;
    }
    j++;
    /********** Clock edges for a faster routine ****************/
    REG_PIOC_CODR = 0x1 << 21;      //SCLK column & row drivers
    REG_PIOD_CODR = 0x1 << 7;       //GCLK
  }

}

I now get a clock frequency of roughly 700 kHz, but the Due I am using has an 84 MHz MCU, so I hope someone has a solution.

EDIT: On the mount I was given there is no option to use SPI, so if possible I need a solution without SPI that uses the given pins. If there is no other option, is SPI the faster one?

Sorry, I can't understand your code...

If you want to output a square wave, use PWM (up to 84 MHz) or output PMC_PCK (up to 480 MHz).

What do you measure as a clock signal in your code?

My serial clock and my grayscale clock are pins C21 and D7.
My other serial clock is pin C25.
B27, D8 and C25 are data pins.
The rest are register pins like the latch etc.

The problem is that with all of this in my while(1) (I toggle the pin twice per pass, so low-high), the code between the pin toggles is too slow. I guess PWM is not the solution, because I need to do something between the edges.
So for example:
SCLK is LOW
Set data pin
SCLK goes HIGH
Clear data pin

This is how 1 bit needs to be shifted out. My LED drivers take the data on every rising edge of SCLK.
So with PWM or PMC, as far as I know, I can't shift data at the same speed, am I right?
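The four steps above, repeated for bits 11 down to 0, can be sketched on a PC. This is only a host-side simulation under stated assumptions: `SimPins`, `setSclk`, `setData` and `shiftOut12` are hypothetical names standing in for the PIO_SODR/PIO_CODR writes, and the LED driver is modelled as sampling the data pin on each SCLK rising edge.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical host-side stand-in for the SCLK and data GPIO lines.
// On the Due these would be PIO_SODR/PIO_CODR register writes.
struct SimPins {
  bool sclk = false;
  bool data = false;
  std::vector<int> sampled;  // bits the driver sees on each SCLK rising edge

  void setSclk(bool level) {
    if (level && !sclk) sampled.push_back(data ? 1 : 0);  // rising edge
    sclk = level;
  }
  void setData(bool level) { data = level; }
};

// Shift one 12-bit grayscale value out MSB first (bit 11 down to bit 0),
// the same order as the "count = 11 ... count--" logic in the loop above.
void shiftOut12(SimPins &pins, uint16_t value) {
  for (int bit = 11; bit >= 0; --bit) {
    pins.setSclk(false);                 // 1. SCLK low
    pins.setData((value >> bit) & 0x1);  // 2. put the bit on the data pin
    pins.setSclk(true);                  // 3. rising edge: driver samples it
    pins.setData(false);                 // 4. clear the data pin again
  }
  pins.setSclk(false);                   // leave the clock idle low
}
```

Every bit costs at least four pin writes plus the shift and mask, which is why the per-bit budget matters more than the clock source.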

You can set or clear a pin inside the Timer Counter interrupt handler used to output a PWM signal, but don't set a PWM frequency much higher than 1.5 MHz, because the interrupt takes some time to process, e.g.:

/*******************************************************************************/
/*                TIOA7 Frequency = 1.5 MHz over pin 3                          */
/*******************************************************************************/

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  PMC->PMC_PCER1 |= PMC_PCER1_PID34;                     // TC7 power ON - Timer Counter 2 channel 1 IS TC7 - See page 38

  PIOC->PIO_PDR |= PIO_PDR_P28;                          // The pin is no more driven by GPIO
  PIOC->PIO_ABSR |= PIO_PC28B_TIOA7;                     // Peripheral type B  - See page 859

  TC2->TC_CHANNEL[1].TC_CMR = TC_CMR_TCCLKS_TIMER_CLOCK1  // MCK/2, clk on rising edge
                              | TC_CMR_WAVE               // Waveform mode
                              | TC_CMR_WAVSEL_UP_RC        // UP mode with automatic trigger on RC Compare
                              | TC_CMR_ACPA_CLEAR          // Clear TIOA7 on RA compare match  -- See page 883
                              | TC_CMR_ACPC_SET;           // Set TIOA7 on RC compare match

  TC2->TC_CHANNEL[1].TC_RC = 28;  //<*********************  Frequency = (Mck/2)/TC_RC  Hz = 1.5 MHz
  TC2->TC_CHANNEL[1].TC_RA = 14;  //<********************   Duty cycle = (TC_RA/TC_RC) * 100  %

  TC2->TC_CHANNEL[1].TC_IER = TC_IER_CPAS | TC_IER_CPCS;   // Interrupt on RA and RC compare matches
  NVIC_EnableIRQ(TC7_IRQn);

  TC2->TC_CHANNEL[1].TC_CCR = TC_CCR_SWTRG | TC_CCR_CLKEN; // Software trigger TC7 counter and enable

}

void TC7_Handler() {
  static uint32_t Count;

  uint32_t status = TC2->TC_CHANNEL[1].TC_SR;

  if (status & TC_SR_CPCS) {
    // Todo : Clear a data pin
  }
  else { // status & TC_SR_CPAS: RA compare match
    // Todo: Set  a data pin
  }

  Count++;
  if (Count == 1500000) {
    Count = 0;
    digitalWrite(LED_BUILTIN, ! digitalRead(LED_BUILTIN));
  }
}
void loop() {

}
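A quick check of the numbers in this example, assuming MCK = 84 MHz and TIMER_CLOCK1 = MCK/2, and using the frequency formula from the TC_RC comment. It also hints at why the interrupt approach tops out around here: with interrupts on both RA and RC at 1.5 MHz, only about 28 CPU cycles separate two handler entries. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Assumed Due master clock; TIMER_CLOCK1 runs at MCK/2.
constexpr uint32_t kMck = 84000000UL;

// Output frequency and duty cycle as given in the example's comments:
// f = (MCK/2) / TC_RC, duty = TC_RA / TC_RC.
constexpr uint32_t tcFrequencyHz(uint32_t rc) { return (kMck / 2) / rc; }
constexpr uint32_t tcDutyPercent(uint32_t ra, uint32_t rc) { return ra * 100 / rc; }

// CPU cycles between two interrupts when the handler fires on both
// RA and RC compare, i.e. twice per output period.
constexpr uint32_t cpuCyclesPerInterrupt(uint32_t rc) {
  return kMck / tcFrequencyHz(rc) / 2;
}
```

Interrupt entry and exit on a Cortex-M3 alone take a noticeable part of such a budget, so the handler body has very little room left.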

I already tried this with the minimum reload value in RA and a 50% duty cycle, but it still isn't faster than 700 kHz. I guess that is because the inner routine takes too long.
Also, when trying the ISR, the overhead of entering and exiting it slows things down as well, so I don't get any faster.
And I can't just toggle my data, because one data packet is 12 bits long, so I need to shift it out. Plus I don't need only one data pin but several more pins to set/clear registers, and another data pin to select the current line.
So IMO a timer does not help at all when my processor can't set/clear the registers any faster than in the code above.
I tried toggling a single pin in loop like this:

void loop(){
   while(true){
       //Toggle PIN
       PIOC->PIO_ODSR ^= PIO_PC21;  // bit 21 of port C (P21 on its own is not a defined mask)
    }

}

And I still only got something like 4 MHz, so the controller spends so much time on other things that one period of a single toggled pin takes 21 processor clock cycles (84 MHz / 4 MHz)!
So with the amount of code and registers I use, it is surely possible that I only get 700 kHz, but what I'm asking myself is where the rest of the processor clock gets lost.
Or do you get a faster response when doing this? Maybe I just need to set some other registers?
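Where the clock "gets lost" is easier to see as a cycle budget. A minimal sketch, assuming the Due's 84 MHz core clock: a 4 MHz square wave leaves 21 CPU cycles per period, about 10.5 per toggle, and that has to cover the read-modify-write of PIO_ODSR (load, XOR, store) plus the loop branch. Since each pass of the LED loop above produces one SCLK pulse, 700 kHz means roughly 120 cycles per pass. The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kCpuHz = 84000000UL;  // Arduino Due core clock (assumed)

// CPU cycles available for one full period of a pin toggled in software.
constexpr uint32_t cyclesPerPeriod(uint32_t pinHz) { return kCpuHz / pinHz; }

// Each period has two edges, so one toggle gets half of that budget.
constexpr double cyclesPerEdge(uint32_t pinHz) { return kCpuHz / (2.0 * pinHz); }
```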
As another option I thought about using SPI to transfer the data and a timer to toggle its pin without an ISR. For the other signals I need some timing, but I guess that should work with some global variables/software triggers. But it would require a new mount for the Due.

If you can output your data via SPI, it is much faster (see the TurboSpi library):

See reply #36 of the Arduino Forum thread "Ethernet2 (UDP) SPI transfers have a lot of dead time" (Arduino Due). This is with the SPI header.

Or you can use USART0 or USART1 in SPI mode.

So I saw in the SPI reference that it is possible to use 3 different pins as outputs (4, 10, 52). So I wondered: is it possible to write 1152 bits to different pins from an array with 12 bits per int, like this:

for(int i=0;i<48;i++){ //48*2*12 = 1152

SPI.transfer(4,array[i],sizeof(array[i])); //size is 12 bit
SPI.transfer(10,array[576+i],sizeof(array[i])); //size is 12 bit

}

and using the last SPI pin as a clock like this:

for(int i = 0;i<Clockticks;i++)
{
SPI.transfer(52, 0b01); //I don't know which type is right, I need the buffer size here
}

So, just in theory, is this possible, or is it better to just use PMC or a timer without an ISR to get a simple square wave?
The most important thing would be the parallel data shift.
And when using TurboSpi, can I also use different pins?

EDIT: I already see the problem with the first one: it is not really parallel data output. There might also be the option to output data on only one pin, but this needs to happen fast enough, so maybe I'll use TurboSPI for this.

Sigh. Lengthy reply wiped out by web page :(

  • Please provide example code that compiles and runs.
  • Use separate code for each possible value that has special actions. Right now, j=576 gets tested 3 times and executes 3 code segments.
  • Check whether "bit-banding" can speed up IO.
      if (((Tdata[767 - a] >> count) & 0x1)) {
        REG_PIOD_SODR = 0x1 << 8;
      } else {
        REG_PIOD_CODR = 0x1 << 8;
      }

Might become:

  PIOBANDEDBITS[8] = Tdata[767 - a] >> count;

with less masking and fewer loads of bit-shifted constants. I haven't actually ever used bit-banding, because I've never quite seen a case where it would obviously work better, but your code, where you're trying to manipulate multiple bits in one port, might be such a case...
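The second suggestion (separate code per phase instead of testing j several times per pass) could look roughly like this. It is a structural sketch only: `FrameCounters`, `shiftDataBit`, `shiftRowBit`, `latchRow`, `pulseGclk` and `runRow` are hypothetical and just count calls so the layout can be checked on a PC. On the Due the hooks would contain the SODR/CODR writes from the original loop, and the phase lengths are read off that loop rather than being cycle-exact.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical call counters standing in for the real pin writes.
struct FrameCounters {
  uint32_t dataBits = 0;    // one SCLK pulse + the two data pins (D8, B27)
  uint32_t rowBits = 0;     // one row-driver bit (B25 data, C25 clock)
  uint32_t latches = 0;     // XLAT / RCLK / BLANK handling, once per row
  uint32_t gclkPulses = 0;  // grayscale clock pulses
};

void shiftDataBit(FrameCounters &c) { ++c.dataBits; }
void shiftRowBit(FrameCounters &c) { ++c.rowBits; }
void latchRow(FrameCounters &c) { ++c.latches; }
void pulseGclk(FrameCounters &c) { ++c.gclkPulses; }

// One row as straight-line phases: no range checks or switch per pulse.
void runRow(FrameCounters &c) {
  const uint32_t kDataBits = 576;      // 48 values x 12 bits
  const uint32_t kRowBits = 32;        // overlapped with the last data bits
  const uint32_t kGclk = 690 - 578;    // GCLK pulses per row

  for (uint32_t n = 0; n < kDataBits - kRowBits; ++n) shiftDataBit(c);
  for (uint32_t n = 0; n < kRowBits; ++n) {  // tail: data and row bits together
    shiftDataBit(c);
    shiftRowBit(c);
  }
  latchRow(c);                                // former cases 576/577/578
  for (uint32_t n = 0; n < kGclk; ++n) pulseGclk(c);
}
```

Each inner loop then only does the work of its own phase, so the compiler can keep the per-pulse path short.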

Thanks westfw, I will definitely try this!
EDIT: Where do I see which port I use with bit-banding? I only see the name PIOBANDEDBITS, so when accessing different ports, where do I declare this?

So yesterday I was reading through the SPI part of the SAM3X datasheet. Is it possible to do some port manipulation with SPI?
Something like:

SPI0 ->SPI_CR =  SPI_CR_SPIEN ;
SPI0 -> SPI_MR = SPI_MR_MSTR

and so on? Because I have seen on page 703 that I can select how many bits I want to transfer, this would definitely make things easier if I could do SPI_CSR1 = SPI_CSR_BITS_12_BIT.

When I try to compile this code it doesn't work, so how do I program it the right way?

An example of SPI_CSR programming:

// Data transfer parameters
SPI0->SPI_CSR[0] = SPI_CSR_CPOL          // Inactive state value of SPCK is logic level one
                    | SPI_CSR_NCPHA       // Data is captured on the leading edge of SPCK and changed on the following edge
                    | SPI_CSR_CSNAAT       // Chip select active after transfer
                    | SPI_CSR_BITS_12_BIT // Bits per transfer
                    | SPI_CSR_SCBR(100)   // bit rate
                    | SPI_CSR_DLYBS(100); // delay from NPCS falling edge (activation)
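The bit-rate and bit-count fields in this example can be checked with a little arithmetic. A sketch, assuming MCK = 84 MHz and the SAM3X rule SPCK = MCK / SCBR; the BITS field (bits 7:4 of SPI_CSR) holds the number of bits per transfer minus 8. With that, SCBR(100) gives 840 kHz, and for a target of e.g. 12.5 MHz the fastest setting that does not exceed it is SCBR = 7 (12 MHz). The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// BITS field of SPI_CSR: (bits per transfer - 8) in bits 7:4,
// so 12 bits -> 4 -> 0x40 (assumed to match SPI_CSR_BITS_12_BIT).
constexpr uint32_t csrBitsField(uint32_t bitsPerTransfer) {
  return ((bitsPerTransfer - 8) & 0xF) << 4;
}

// Serial clock for a given SCBR divider (SCBR must be non-zero).
constexpr uint32_t spckHz(uint32_t mck, uint32_t scbr) { return mck / scbr; }

// Smallest SCBR whose SPCK does not exceed the target rate (ceiling division).
constexpr uint32_t scbrFor(uint32_t mck, uint32_t targetHz) {
  return (mck + targetHz - 1) / targetHz;
}
```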

Where do I see which Port I use while bit-banding?

Ah. I didn't notice that you had bits on separate PORTs. That will make bit-banding less attractive :(
if you wanted to access "xxxx_bits[bitnumber]", you'd need a separate definition for each port.

This sample looks like it creates the correct object code; it's not actually tested, and I'll caution again that I haven't actually tried to use bit-banding before...

#include <sam.h>
// create an address in the bitband region for a single register in IO space.
// This can then be accessed as an array indexed by bit number (for example)
#define BITBAND_REG(addr) ((volatile int*)(((unsigned int)addr - 0x40000000)*32 + 0x42000000))

// define bit-accessible aliases for the PIO Output data register.
// (don't forget that you need to fiddle with "PIO Output Write Enable Register")
#define PORTA_BB_BITS  BITBAND_REG(&PIOA->PIO_ODSR)
#define PORTB_BB_BITS  BITBAND_REG(&PIOB->PIO_ODSR)
#define PORTC_BB_BITS  BITBAND_REG(&PIOC->PIO_ODSR)

int main() {
    PORTA_BB_BITS[17] = 1;
    if (PORTB_BB_BITS[5]) {
        PORTC_BB_BITS[3] = 1;
    }
}
    PORTA_BB_BITS[17] = 1;
   0:   4a05            ldr     r2, [pc, #20]   ; (18 <main+0x18>)
   2:   2301            movs    r3, #1
   4:   6013            str     r3, [r2, #0]
    if (PORTB_BB_BITS[5]) {
   6:   f502 527f       add.w   r2, r2, #16320  ; 0x3fc0
   a:   3210            adds    r2, #16
   c:   6812            ldr     r2, [r2, #0]
   e:   b10a            cbz     r2, 14 <main+0x14>
        PORTC_BB_BITS[3] = 1;
  10:   4a02            ldr     r2, [pc, #8]    ; (1c <main+0x1c>)
  12:   6013            str     r3, [r2, #0]
    }
}
  14:   2000            movs    r0, #0
  16:   4770            bx      lr
  18:   43c1c744        .word   0x43c1c744
  1c:   43c2470c        .word   0x43c2470c
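The alias addresses in the disassembly can be reproduced by hand, which also answers the "which port" question: every port's PIO_ODSR has its own alias base, so D8 and B27 from the original loop need separate definitions. A host-side sketch using the Cortex-M3 peripheral bit-band formula; the SAM3X8E PIO base addresses and the 0x38 ODSR offset are taken from the device header and are worth double-checking there.

```cpp
#include <cassert>
#include <cstdint>

// Cortex-M3 peripheral bit-band alias for one bit of a register in
// the 0x40000000 peripheral region:
// alias = 0x42000000 + (reg - 0x40000000) * 32 + bit * 4
constexpr uint32_t bitbandAlias(uint32_t regAddr, uint32_t bit) {
  return 0x42000000u + (regAddr - 0x40000000u) * 32u + bit * 4u;
}

// SAM3X8E PIO controller base addresses and the PIO_ODSR offset (assumed).
constexpr uint32_t kPioA = 0x400E0E00u;
constexpr uint32_t kPioB = 0x400E1000u;
constexpr uint32_t kPioC = 0x400E1200u;
constexpr uint32_t kPioD = 0x400E1400u;
constexpr uint32_t kOdsrOffset = 0x38u;
```

The first and third checks below match the `.word` constants 0x43c1c744 and 0x43c2470c in the listing, and the PORTB address matches what the add.w/adds pair computes from the first one.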

People on the ARM community forums note that bit-banding doesn't seem to have been very successful. Assorted compiler extensions haven't gotten implemented and the feature is not present on M0 or M7 chips :(
https://community.arm.com/tools/f/discussions/7142/ld-scripts-is-the-best-approach-for-bitbanding-with-gcc
https://community.arm.com/tools/f/discussions/8856/arm-compiler-6-clang-and-bit-banding

You can toggle a pin a bit faster than you did in reply #4:

void setup() {
  
  PIOC->PIO_OWER = PIO_OWER_P21;
}

void loop() {
  while (true) {
    //Toggle PIN
    PIOC->PIO_ODSR = 1 << 21;
    PIOC->PIO_ODSR = 0 << 21;
  }

}

In addition to SPI, the SAM3X also has I2S via its SSC peripheral (and other synchronous protocols). I know almost nothing about I2S and SSC, but the ESP8266/ESP32 folks are doing "interesting" high-speed stuff with those chips' I2S interface (see the Hackaday article "Color TV Broadcasts Are ESP8266’s Newest Trick").

So I took a look over my program and did some calculations.
If I get a 12.5 MHz or higher square wave on pin 11 (PD7), the program would work fine.
I looked at the PWM and PMC registers, but unfortunately they don't support PD7 as far as I understand.
So is there a way to still toggle that pin at a frequency > 12.5 MHz?

PD7 is TIOA8 (arduino pin 11).

Set Timer Counter 2 channel 2 (TC8) to the desired frequency and TIOA8 will toggle at up to 21 MHz.
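A sketch of that setup, mirroring the TC7 example earlier in the thread but for TC8 (TC2 channel 2, peripheral ID 35) and without any interrupt. `startTioa8`, `rcForAtLeast` and `outputHz` are hypothetical names. The setup function only compiles on the Due (guarded by ARDUINO_SAM_DUE); the RC arithmetic can be checked on a PC. It reuses the (MCK/2)/TC_RC formula from the TC7 example, so RC = 2 gives 21 MHz and RC = 3 gives 14 MHz, the nearest step above 12.5 MHz. Whether the period is exactly RC or RC+1 counter ticks should be confirmed in the datasheet.

```cpp
#include <cassert>
#include <cstdint>

// TIMER_CLOCK1 = MCK/2 = 42 MHz on the Due (assumed 84 MHz MCK).
constexpr uint32_t kTimerClock1Hz = 84000000UL / 2;

// Largest RC whose output frequency is still >= targetHz (floor division),
// using f = (MCK/2) / TC_RC as in the TC7 example.
constexpr uint32_t rcForAtLeast(uint32_t targetHz) { return kTimerClock1Hz / targetHz; }
constexpr uint32_t outputHz(uint32_t rc) { return kTimerClock1Hz / rc; }

#if defined(ARDUINO_SAM_DUE)
// Due only: hardware square wave on PD7 / TIOA8 (Arduino pin 11), no ISR.
void startTioa8(uint32_t rc) {
  PMC->PMC_PCER1 |= PMC_PCER1_PID35;                 // TC8 power on
  PIOD->PIO_PDR |= PIO_PDR_P7;                       // PD7 no longer driven by GPIO
  PIOD->PIO_ABSR |= PIO_PD7B_TIOA8;                  // peripheral B = TIOA8
  TC2->TC_CHANNEL[2].TC_CMR = TC_CMR_TCCLKS_TIMER_CLOCK1  // MCK/2
                              | TC_CMR_WAVE               // waveform mode
                              | TC_CMR_WAVSEL_UP_RC       // reset on RC compare
                              | TC_CMR_ACPA_CLEAR         // clear TIOA8 on RA
                              | TC_CMR_ACPC_SET;          // set TIOA8 on RC
  TC2->TC_CHANNEL[2].TC_RC = rc;
  TC2->TC_CHANNEL[2].TC_RA = rc / 2;                 // roughly 50 % duty
  TC2->TC_CHANNEL[2].TC_CCR = TC_CCR_SWTRG | TC_CCR_CLKEN;  // start counter
}
#endif
```

Because the timer drives the pin in hardware, GCLK no longer costs any CPU cycles; the loop only has to count or time how many pulses have gone out.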