Due SAM3X8E SysTick Problem

I read SysTick->VAL before and after reading the port 8000 times.
The count for 8000 reads is 24008 ticks (one tick is 11.9 ns), which means 28M reads per second.
After the reading process finished,
I added a for loop to send the values of the array.
But this time the number of ticks increases to approximately 80000.
Why does the number of SysTick ticks increase?
The client.write line comes after the reading process!

But it affects the clock cycles of the readings,
and decreases the reading speed from 28M to 10.5M reads per second.
I don't understand this behaviour of the MCU.
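The arithmetic behind these figures can be checked on a PC with a small C sketch. It assumes SysTick is clocked from the 84 MHz core clock (which matches the 11.9 ns per tick quoted above); the function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the Due's SysTick runs from the 84 MHz core clock. */
#define F_CPU_HZ 84000000.0

/* Average SysTick ticks spent per port read. */
static double ticks_per_read(uint32_t ticks, uint32_t n_reads)
{
    assert(n_reads > 0);
    return (double)ticks / (double)n_reads;
}

/* Reads per second implied by a given number of ticks per read. */
static double reads_per_second(double ticks_per_read_value)
{
    assert(ticks_per_read_value > 0.0);
    return F_CPU_HZ / ticks_per_read_value;
}
```

ticks_per_read(24008, 8000) comes out at about 3.0, i.e. 28M reads per second; 8 ticks per read would correspond to the slower 10.5M reads per second.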

 EthernetClient client = server.available();
uint32_t X[8000]; // If X is declared in the global-variable section (next to the library definitions), one read takes 24 clk. Declared here, it takes 3 clk. :)
   uint32_t i,t0,t1,*a=&X[0];

  elapsed_Time = 0; // reset the elapsed time before entering the loop



 
  noInterrupts();
  // PIO_PDSR&0xFF =Digital ports of ARDUINO DUE :(MSB)D11 D14 D15 D25 D26 D27 D28 D29(LSB) 
  
   t0=SysTick->VAL; 
     

 
  T(T(T(*a++ = PIOD->PIO_PDSR&0XFF)))// 1000 times reading
 



   t1=SysTick->VAL; //  t0-t1=24008ticks without T(T(T( client.write(*a++))))

   a=&X[0];//pointer takes the address of first member of array
   T(T(T( client.write(*a++))))// with sending number of ticks approximately 80000 >:( 


  interrupts();
   
  
  Serial.print("n_ticks: ");
  Serial.println( ((t0<t1)?84000+t0:t0)-t1 );
  Serial.println();

  
  delay(30);
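The T() macro is not shown in the post. Given the "1000 times reading" comment and the missing semicolons after T(T(T(...))), a plausible (hypothetical) definition pastes its argument ten times, each followed by ';', so three nested levels give 10 × 10 × 10 = 1000 unrolled statements. A minimal host-side C sketch of that assumption:

```c
#include <assert.h>

/* Hypothetical reconstruction of T(): repeat the statement ten times.
   The real definition in the sketch above may differ. */
#define T(x) x; x; x; x; x; x; x; x; x; x;

/* Counts how many copies of the statement T(T(T(...))) produces. */
static int count_unrolled_copies(void)
{
    int n = 0;
    T(T(T(n++)))        /* preprocessor emits 1000 "n++;" statements (plus empty ones) */
    assert(n == 1000);  /* 10 * 10 * 10 copies */
    return n;
}
```

Because everything is unrolled at compile time, there is no loop counter or branch in the timed section, only the repeated read/store.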

By default, the SysTick counter counts down from 83999 to 0 (84000 ticks, i.e. 1 ms at 84 MHz), then reloads, and so on.

I bet your readings take more than 1 ms, and by default SysTick can't measure intervals longer than 1 ms.
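Why more than 1 ms breaks the measurement can be simulated on a PC. This sketch assumes the default 84000-tick reload period; elapsed_ticks() uses the same formula as the sketches in this thread, and systick_after() is a hypothetical helper that models the down-counting register.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default Due SysTick period: reload value 83999, i.e. 84000 ticks = 1 ms. */
#define SYSTICK_PERIOD 84000u

/* Elapsed ticks between two reads of the down-counting SysTick->VAL,
   correct only if at most one reload happened in between. */
static uint32_t elapsed_ticks(uint32_t t0, uint32_t t1)
{
    assert(t0 < SYSTICK_PERIOD && t1 < SYSTICK_PERIOD);
    return ((t0 < t1) ? SYSTICK_PERIOD + t0 : t0) - t1;
}

/* Simulated counter value `ticks` ticks after it read `start`. */
static uint32_t systick_after(uint32_t start, uint32_t ticks)
{
    assert(start < SYSTICK_PERIOD);
    return (start + SYSTICK_PERIOD - (ticks % SYSTICK_PERIOD)) % SYSTICK_PERIOD;
}
```

An interval of 24008 ticks is reported correctly, but a 90000-tick interval (just over 1 ms) wraps and is reported as only 6000 ticks.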

Anyway, the right way to read SysTick->VAL is this:

volatile uint32_t t0, t1, t3;
void setup() {

  Serial.begin(250000);
  
}
void loop() {
  noInterrupts();

  t0 = SysTick->VAL;

  // Do something that takes less than 1 ms, e.g.:
  delayMicroseconds(5);

  t1 = SysTick->VAL;
  t3 = ((t0 < t1) ? 84000 + t0 : t0) - t1 - 2 ;

  interrupts();

  Serial.print("number of ticks: ");
  Serial.println(t3  );
  delay(1000);
}

Thanks.
I don't evaluate SysTick in terms of 1 ms.
I tested SysTick with 15 reads.
The result made sense:
first read 5 ticks + 14*3 ticks = 47 ticks,
and the elapsed time for the reads is well under 1 ms.
Then I added Serial.println after the reads finished.
The same reading process increased to 121 ticks??

volatile uint32_t t0, t1, t3;
void setup() {

  Serial.begin(250000);
  
}
void loop() {
   uint32_t i,A[2000],t0,t1,*a=&A[0];
  noInterrupts();

  t0 = SysTick->VAL;

  // Do something that takes less than 1 ms, e.g.:
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//5ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
  //total=5+3*14=47ticks
  t1 = SysTick->VAL;
  t3 = ((t0 < t1) ? 84000 + t0 : t0) - t1 - 2 ;

  interrupts();
  for (i = 0; i < 15; i++) { Serial.println(A[i]); } // print all 15 values read above
  Serial.print("number of ticks: ");
  Serial.println(t3  );
  delay(1000);
}

This issue is the result of compiler optimization; you can see it in the assembly listing of your sketch.

A workaround that gives the same timing with or without Serial-printing the final array would be to write an assembler routine to log PIOD->PIO_PDSR, or to use a DMA linked-list item to copy from &PIOD->PIO_PDSR into a uint32_t array. The second solution is slower if you log only a few values.

With the sketch below, the result is 47 or 91 ticks whether or not you Serial-print the final array:

void setup() {

  Serial.begin(250000);

}
void loop() {

  uint32_t t0, t1, t3;
  uint32_t A[2000];
  uint8_t Index;
  Index = 0;
  noInterrupts();
  t0 = SysTick->VAL;

  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;

  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;

  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;
  A[Index++] = PIOD->PIO_PDSR;

  t1 = SysTick->VAL;
  t3 = ((t0 < t1) ? 84000 + t0 : t0) - t1 - 2 ;
  interrupts();
  Serial.print("number of ticks: ");
  Serial.println(t3);
  
  for (int i = 0; i < 15; i++) {

    Serial.println((uint8_t)A[i]);
  }

  delay(1000);

}

I assigned the read values to a byte array
and printed the new byte array of read values.
But the number of ticks still increased.


As newbie said, this is the compiler "optimizing" your code. When you do not actually use the contents of the array, the compiler notices and does not bother storing the data read from the port (it still has to READ the port, as required by the "volatile" definitions somewhere deep inside the CMSIS .h files):
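The same principle can be shown in portable C. In this sketch, fake_pdsr is a hypothetical volatile variable standing in for PIOD->PIO_PDSR, and capture()/checksum() are made-up names. Because the source is volatile, every read must happen; the stores into buf only have to survive if something later reads buf, which is what checksum() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PIOD->PIO_PDSR: volatile, so every read is kept. */
static volatile uint32_t fake_pdsr = 0x12345678u;

/* Stores the low 8 bits of each read. If buf were never read afterwards,
   an optimizing compiler would be free to drop these stores. */
static void capture(uint8_t *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint8_t)(fake_pdsr & 0xFFu);
}

/* Consuming the buffer makes the stores observable, so they must stay. */
static uint32_t checksum(const uint8_t *buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```

Comparing the -O2 assembly with and without the checksum() call is one way to observe the difference described above, although the exact output depends on the compiler version.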

00080148 <loop>:
   80148:       b508            push    {r3, lr}
  This function disables IRQ interrupts by setting the I-bit in the CPSR.
  Can only be executed in Privileged modes.
 */
__attribute__( ( always_inline ) ) static __INLINE void __disable_irq(void)
{
  __ASM volatile ("cpsid i");
   8014a:       b672            cpsid   i
   8014c:       4b18            ldr     r3, [pc, #96]   ; (801b0 <loop+0x68>)
   8014e:       4919            ldr     r1, [pc, #100]  ; (801b4 <loop+0x6c>)
   80150:       688a            ldr     r2, [r1, #8]
   80152:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80154:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80156:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80158:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   8015a:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   8015c:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   8015e:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80160:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80162:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80164:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80166:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80168:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   8016a:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   8016c:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   8016e:       6bdb            ldr     r3, [r3, #60]   ; 0x3c
   80170:       6889            ldr     r1, [r1, #8]
   80172:       428a            cmp     r2, r1
   80174:       bf34            ite     cc

When you print the array, it can't optimize any more:

00080148 <loop>:
   80148:       b510            push    {r4, lr}
   8014a:       f5ad 5dfa       sub.w   sp, sp, #8000   ; 0x1f40
  This function disables IRQ interrupts by setting the I-bit in the CPSR.
  Can only be executed in Privileged modes.
 */
__attribute__( ( always_inline ) ) static __INLINE void __disable_irq(void)
{
  __ASM volatile ("cpsid i");
   8014e:       b672            cpsid   i
   80150:       4b2d            ldr     r3, [pc, #180]  ; (80208 <loop+0xc0>)
   80152:       492e            ldr     r1, [pc, #184]  ; (8020c <loop+0xc4>)
   80154:       688a            ldr     r2, [r1, #8]
   80156:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80158:       b2c0            uxtb    r0, r0
   8015a:       9000            str     r0, [sp, #0]
   8015c:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   8015e:       b2c0            uxtb    r0, r0
   80160:       9001            str     r0, [sp, #4]
   80162:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80164:       b2c0            uxtb    r0, r0
   80166:       9002            str     r0, [sp, #8]
   80168:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   8016a:       b2c0            uxtb    r0, r0
   8016c:       9003            str     r0, [sp, #12]
   8016e:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80170:       b2c0            uxtb    r0, r0
   80172:       9004            str     r0, [sp, #16]
   80174:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80176:       b2c0            uxtb    r0, r0
   80178:       9005            str     r0, [sp, #20]
   8017a:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   8017c:       b2c0            uxtb    r0, r0
   8017e:       9006            str     r0, [sp, #24]
   80180:       6bd8            ldr     r0, [r3, #60]   ; 0x3c
   80182:       b2c0            uxtb    r0, r0

(This is using the code from Post #2, so it has the extra uxtb instructions to isolate the low 8 bits.)

  *a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks

3 cycles for that statement is a very optimistic guess/result, given a slow peripheral bus, wait states on the flash memory (complicated by "flash acceleration"), and who knows what sort of synchronization issues. It's very difficult to predict ARM timing with any certainty.

Thanks for the reply.
Frankly, I am not experienced with assembly language.
I tried to test the assembly listing above in the Arduino IDE, but I couldn't get it to work.

Although I do not know assembly language, I want the reading process to run as fast as possible.
This is my readValues() block:

void readValues() {
  EthernetClient client = server.available();
  byte X[8000]; 
  uint32_t i, t0, t1,t[8000];
  byte *a = &X[0];
  
  elapsed_Time = 0; 
  noInterrupts();
  //PIO_PDSR&0B00000000000000000000000011111111  ARDUINO DUE  (MSB)D11 D14 D15 D25 D26 D27 D28 D29(LSB) status register of these ports

  t0 = SysTick->VAL; //

  // first read takes 7 clk, later ones 3 clk

  T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //1) 3008tick // fills X[0] through X[999] in a single line
  T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //2) 3000tick
  T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //3) 3000tick //PIO_PDSR= Pin Data Status Register
  T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //4) 3000tick
  T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //5) 3000tick
  T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //6) 3000tick
  T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //7) 3000tick
  T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //8) 3000tick



  t1 = SysTick->VAL; //  t0-t1=24008ticks
   interrupts();
    a=&X[0];// move the pointer back to the start of the array
    T(T(T( client.write(*a++))))//1)
    T(T(T( client.write(*a++))))//2)
    T(T(T( client.write(*a++))))//3)
    T(T(T( client.write(*a++))))//4)
    T(T(T( client.write(*a++))))//5)
    T(T(T( client.write(*a++))))//6)
    T(T(T( client.write(*a++))))//7)
    T(T(T( client.write(*a++))))//8)


  Serial.print("n_ticks: ");
  Serial.println( ((t0 < t1) ? 84000 + t0 : t0) - t1 );
  Serial.println();
  
  delay(30);


  
}

My goal is to send the read values without increasing the number of ticks.
So how do I insert your assembly solution into my readValues() block?

I have tried to test [assembly language re-attached with __ASM]...

You're not understanding me correctly. That assembly code was not "suggested" as a faster solution than your C code, that was the code that your C code actually produces.

  A[Index++] = PIOD->PIO_PDSR;

Becomes:
   80156:       6bd8            ldr     r0, [r3, #60]   ;  load from PIOD->PIO_PDSR
   80158:       b2c0            uxtb    r0, r0          ; zero-extend the low byte to 32 bits
   8015a:       9000            str     r0, [sp, #0]    ; store into local array "A"

That's pretty close to optimal. The compiler is even "unrolling" the "Index++" into increasing static offsets, because that's quicker, smaller, and uses fewer registers than actually keeping and incrementing a counter or pointer.
But the exact code isn't as important as the other example, which shows:

   80152:       6bd8            ldr     r0, [r3, #60]   ;  load from PIOD->PIO_PDSR
   80154:       6bd8            ldr     r0, [r3, #60]   ;  load from PIOD->PIO_PDSR
   80156:       6bd8            ldr     r0, [r3, #60]   ;  load from PIOD->PIO_PDSR
   80158:       6bd8            ldr     r0, [r3, #60]   ;  load from PIOD->PIO_PDSR

Just a set of consecutive reads. The compiler has "figured out" that you were storing the values in an array that was never actually used, and so it just avoids storing them at all. It doesn't create the array either. The only reason that it bothers to read the registers is that they're declared as "volatile", which means that the compiler MUST access them when you say so.
Now, your timing says that each of those "ldr" instructions takes 4 cycles. That's a bit annoying, since supposedly "ARM architecture executes most instructions in a single cycle", but you have flash memory with wait states, and you have a slow "peripheral bus" involved, and you have memory accessed every instruction (which might stall pipelines?), so it's not really that surprising that it apparently takes 4 cycles.
That also means that the 8 cycles per read/store that you're seeing is as fast as you can expect it to go if you actually save the data. You MIGHT get slightly faster by moving the code to RAM, but it's pretty uncertain (fewer wait states, but more contention.)
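At about 8 ticks per stored read, it is worth checking that a capture still fits inside one SysTick period, because the elapsed-time formula used in this thread only handles a single reload. A small C sketch, assuming the default 84000-tick period; the helper names are hypothetical, and the check ignores the few ticks of measurement overhead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed default Due SysTick period: 84000 ticks (1 ms at 84 MHz). */
#define SYSTICK_PERIOD 84000u

/* True if n_reads at ticks_per_read fit inside one SysTick period, i.e. the
   single-reload formula ((t0 < t1) ? 84000 + t0 : t0) - t1 stays valid. */
static bool fits_in_one_period(uint32_t n_reads, uint32_t ticks_per_read)
{
    assert(ticks_per_read > 0);
    return (uint64_t)n_reads * ticks_per_read < SYSTICK_PERIOD;
}

/* Largest number of reads that can be timed within one period. */
static uint32_t max_reads_per_period(uint32_t ticks_per_read)
{
    assert(ticks_per_read > 0);
    return (SYSTICK_PERIOD - 1u) / ticks_per_read;
}
```

8000 stored reads at 8 ticks each come to 64000 ticks, so they fit; at that rate roughly 10499 reads is the most that could be timed this way.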

I understand that the compiler knows whether or not a variable will be used later in the code.

I wonder how to find out which assembly code corresponds to a specific line of C code.
I found a website where you upload the .elf file and it disassembles it. The output was very long, and I couldn't tell which lines of assembly belong to which lines of C code.
Is there a program or method to find this out?

Secondly, about "Arduino as a 5M Sample Osciloscope | digibirds side":
I tried this code with an Uno.
Then I set up an ADC circuit board and measured a 440 kHz analog signal.
I read 349 kHz, but anyway that showed me the Uno can capture ones and zeros on its digital port at 5M samples per second from the ADC's digital outputs.
The result was that the Uno takes 1600 samples from its digital port at roughly 0.19 µs per sample. Dividing 0.19 µs by the clock period (1/16 MHz) gives roughly 3.04.
So I thought the Uno is capable of reading a digital port in 3 clock cycles.

Now, why does the Due read a digital port in 8 clock cycles?
I may be wrong in my comments about clock cycles, but I checked the FFT result of the measured signal, so I attached its image.
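The comparison is easier if cycles are converted back to time. This minimal C sketch uses only the numbers in this thread (0.19 µs per Uno sample at 16 MHz, 8 cycles per Due read at 84 MHz); the function names are illustrative.

```c
#include <assert.h>

/* Clock cycles spent per sample, given the sample period and CPU clock. */
static double cycles_per_sample(double sample_period_s, double f_cpu_hz)
{
    assert(sample_period_s > 0.0 && f_cpu_hz > 0.0);
    return sample_period_s * f_cpu_hz;
}

/* Wall-clock time per sample, given cycles per sample and CPU clock. */
static double seconds_per_sample(double cycles, double f_cpu_hz)
{
    assert(cycles > 0.0 && f_cpu_hz > 0.0);
    return cycles / f_cpu_hz;
}
```

3 cycles at 16 MHz is 187.5 ns, while 8 cycles at 84 MHz is about 95 ns, so even at 8 cycles per read the Due samples roughly twice as fast in wall-clock terms.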

This speed problem made me realize that I should have taken a microprocessor course at university.
These days I'm working to understand your replies.

I'm watching Crash Course Computer Science (Advanced CPU Designs: Crash Course Computer Science #9 - YouTube); it helps me.

Anyway, I'm working to keep making progress with MCUs.