I timed 8000 port reads with SysTick.
The 8000 reads take 24008 ticks (one tick is 11.9 ns), which works out to about 28 M reads per second.
After the read block finishes, I added a loop that sends the values of the array with client.write.
With that loop in place, the tick count for the same reads rises to approximately 80 000.
Why does the SysTick count increase?
The client.write lines come after the read block, yet they affect the clock cycles measured for the reads, and the read rate drops from 28 M to 10.5 M reads per second.
I don't understand this behaviour of the MCU.
EthernetClient client = server.available();
uint32_t X[8000]; // When X is declared in the global variables section, alongside the library definitions, one read takes 24 clk. Declared here, it takes 3 clk. :)
uint32_t i,t0,t1,*a=&X[0];
elapsed_Time = 0; // reset the elapsed time before entering the loop
noInterrupts();
// PIO_PDSR&0xFF = digital pins of the Arduino Due: (MSB) D11 D14 D15 D25 D26 D27 D28 D29 (LSB)
t0=SysTick->VAL;
T(T(T(*a++ = PIOD->PIO_PDSR&0XFF))) // 1000 reads
t1=SysTick->VAL; // t0-t1 = 24008 ticks without T(T(T( client.write(*a++))))
a=&X[0]; // point back at the first element of the array
T(T(T( client.write(*a++)))) // with sending, the tick count is approximately 80000
interrupts();
Serial.print("n_ticks: ");
Serial.println( ((t0<t1)?84000+t0:t0)-t1 );
Serial.println();
delay(30);
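The T(...) macro is used in the sketch above but its definition is not shown in the post. A definition along the following lines would give the 1000 repetitions mentioned in the comment. This is only an assumption about what the original macro looks like, and count_repeats is a hypothetical helper added here just to check the repeat count:

```cpp
#include <cstdint>

// Assumed definition (not shown in the post): repeat the statement x ten times.
// Nesting three levels, T(T(T(x))), repeats x 10 * 10 * 10 = 1000 times.
#define T(x) x; x; x; x; x; x; x; x; x; x;

// Hypothetical helper: counts how often the nested macro runs its argument.
uint32_t count_repeats() {
  uint32_t n = 0;
  T(T(T(n++)))  // no trailing semicolon needed, the macro supplies it
  return n;
}
```

Because the macro expands to straight-line code with no loop counter, each repetition is just the statement itself, which is presumably why it was used for the timed reads.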
By default, the SysTick counter counts down from 84000 to 0 in 1 ms, then reloads, and so on.
I bet that your reads take more than 1 ms, and by default a single SysTick->VAL difference can't measure more than 1 ms.
Anyway, the right way to read SysTick->VAL is this:
volatile uint32_t t0, t1, t3;
void setup() {
Serial.begin(250000);
}
void loop() {
noInterrupts();
t0 = SysTick->VAL;
// Do some stuff during less than 1 ms, e.g. :
delayMicroseconds(5);
t1 = SysTick->VAL;
t3 = ((t0 < t1) ? 84000 + t0 : t0) - t1 - 2 ;
interrupts();
Serial.print("number of ticks: ");
Serial.println(t3 );
delay(1000);
}
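The wrap-around handling in the t3 line can be pulled out into a small function, which makes the one-period limit explicit. This is a minimal sketch: elapsed_ticks is a hypothetical name, the default period of 84000 is taken from the description above, and the 2-tick correction the sketch subtracts (presumably for the overhead of the read itself) is left out:

```cpp
#include <cassert>
#include <cstdint>

// Ticks elapsed between two SysTick->VAL samples of a down-counter.
// period is the number of ticks per reload (84000 in the posts above).
// Valid only if less than one full period passed between the two samples.
uint32_t elapsed_ticks(uint32_t t0, uint32_t t1, uint32_t period = 84000) {
  assert(t0 < period && t1 < period);
  // The counter counts down, so normally t0 >= t1.
  // If t0 < t1 the counter reloaded once between the samples.
  return (t0 < t1 ? period + t0 : t0) - t1;
}
```

For example, elapsed_ticks(1000, 953) gives 47, and elapsed_ticks(10, 83980) gives 30 because the counter wrapped once. If the measured code runs longer than one period, the result silently loses whole multiples of the period.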
Thanks.
I am not measuring intervals anywhere near 1 ms with SysTick.
I tested SysTick with 15 reads.
The result made sense:
first read 5 ticks + 14 * 3 ticks = 47 ticks,
so the elapsed time for the reads is far below 1 ms.
Then I added Serial.println calls after the reads had finished.
The same read block now increases to 121 ticks. Why?
volatile uint32_t t0, t1, t3;
void setup() {
Serial.begin(250000);
}
void loop() {
uint32_t i,A[2000],t0,t1,*a=&A[0];
noInterrupts();
t0 = SysTick->VAL;
// Do some stuff during less than 1 ms, e.g. :
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//5ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
//total=5+3*14=47ticks
t1 = SysTick->VAL;
t3 = ((t0 < t1) ? 84000 + t0 : t0) - t1 - 2 ;
interrupts();
for(i=0;i<14;i++){Serial.println(A[i] );}
Serial.print("number of ticks: ");
Serial.println(t3 );
delay(1000);
}
This issue is the result of compiler optimization; you can see that in the assembler file of your sketch.
A workaround to get no difference with or without the Serial printing of the final array would be to write an assembler routine to log PIOD->PIO_PDSR, or to use a DMA linked list item from &PIOD->PIO_PDSR to an array of uint32_t. The second solution is slower if you log only a few values.
With the sketch below the result is 47 or 91 ticks, whether or not you Serial print the final array:
void setup() {
Serial.begin(250000);
}
void loop() {
uint32_t t0, t1, t3;
uint32_t A[2000];
uint8_t Index;
Index = 0;
noInterrupts();
t0 = SysTick->VAL;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
A[Index++] = PIOD->PIO_PDSR;
t1 = SysTick->VAL;
t3 = ((t0 < t1) ? 84000 + t0 : t0) - t1 - 2 ;
interrupts();
Serial.print("number of ticks: ");
Serial.println(t3);
for (int i = 0; i < 15; i++) {
Serial.println((uint8_t)A[i]);
}
delay(1000);
}
I assigned the values read to a byte array.
I printed the new byte array of values.
But the result is an increased number of ticks. :(
This issue is the result of the compiler optimization, you could see that in the assembler file of your sketch.
As newbie said, this is the compiler "optimizing" your code. When you do not actually use the contents of the array, the compiler notices and does not bother storing the data read from the port. (It still has to READ from the port, as required by the "volatile" definitions somewhere deep inside the CMSIS .h files.) Without the printing:
00080148 <loop>:
80148: b508 push {r3, lr}
This function disables IRQ interrupts by setting the I-bit in the CPSR.
Can only be executed in Privileged modes.
*/
__attribute__( ( always_inline ) ) static __INLINE void __disable_irq(void)
{
__ASM volatile ("cpsid i");
8014a: b672 cpsid i
8014c: 4b18 ldr r3, [pc, #96] ; (801b0 <loop+0x68>)
8014e: 4919 ldr r1, [pc, #100] ; (801b4 <loop+0x6c>)
80150: 688a ldr r2, [r1, #8]
80152: 6bd8 ldr r0, [r3, #60] ; 0x3c
80154: 6bd8 ldr r0, [r3, #60] ; 0x3c
80156: 6bd8 ldr r0, [r3, #60] ; 0x3c
80158: 6bd8 ldr r0, [r3, #60] ; 0x3c
8015a: 6bd8 ldr r0, [r3, #60] ; 0x3c
8015c: 6bd8 ldr r0, [r3, #60] ; 0x3c
8015e: 6bd8 ldr r0, [r3, #60] ; 0x3c
80160: 6bd8 ldr r0, [r3, #60] ; 0x3c
80162: 6bd8 ldr r0, [r3, #60] ; 0x3c
80164: 6bd8 ldr r0, [r3, #60] ; 0x3c
80166: 6bd8 ldr r0, [r3, #60] ; 0x3c
80168: 6bd8 ldr r0, [r3, #60] ; 0x3c
8016a: 6bd8 ldr r0, [r3, #60] ; 0x3c
8016c: 6bd8 ldr r0, [r3, #60] ; 0x3c
8016e: 6bdb ldr r3, [r3, #60] ; 0x3c
80170: 6889 ldr r1, [r1, #8]
80172: 428a cmp r2, r1
80174: bf34 ite cc
When you print the array, it can't optimize any more:
00080148 <loop>:
80148: b510 push {r4, lr}
8014a: f5ad 5dfa sub.w sp, sp, #8000 ; 0x1f40
This function disables IRQ interrupts by setting the I-bit in the CPSR.
Can only be executed in Privileged modes.
*/
__attribute__( ( always_inline ) ) static __INLINE void __disable_irq(void)
{
__ASM volatile ("cpsid i");
8014e: b672 cpsid i
80150: 4b2d ldr r3, [pc, #180] ; (80208 <loop+0xc0>)
80152: 492e ldr r1, [pc, #184] ; (8020c <loop+0xc4>)
80154: 688a ldr r2, [r1, #8]
80156: 6bd8 ldr r0, [r3, #60] ; 0x3c
80158: b2c0 uxtb r0, r0
8015a: 9000 str r0, [sp, #0]
8015c: 6bd8 ldr r0, [r3, #60] ; 0x3c
8015e: b2c0 uxtb r0, r0
80160: 9001 str r0, [sp, #4]
80162: 6bd8 ldr r0, [r3, #60] ; 0x3c
80164: b2c0 uxtb r0, r0
80166: 9002 str r0, [sp, #8]
80168: 6bd8 ldr r0, [r3, #60] ; 0x3c
8016a: b2c0 uxtb r0, r0
8016c: 9003 str r0, [sp, #12]
8016e: 6bd8 ldr r0, [r3, #60] ; 0x3c
80170: b2c0 uxtb r0, r0
80172: 9004 str r0, [sp, #16]
80174: 6bd8 ldr r0, [r3, #60] ; 0x3c
80176: b2c0 uxtb r0, r0
80178: 9005 str r0, [sp, #20]
8017a: 6bd8 ldr r0, [r3, #60] ; 0x3c
8017c: b2c0 uxtb r0, r0
8017e: 9006 str r0, [sp, #24]
80180: 6bd8 ldr r0, [r3, #60] ; 0x3c
80182: b2c0 uxtb r0, r0
(This is using the code from Post #2 (http://forum.arduino.cc/index.php?topic=511643.msg3487506#msg3487506), so it has the extra uxtb instructions to isolate the low 8 bits.)
*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111;//3ticks
3 cycles for that statement is a very optimistic guess/result, given a slow peripheral bus, wait states on the flash memory (complicated by "flash acceleration"), and who knows what sort of synchronization issues. It's very difficult to predict ARM timing with any certainty :-(
Thanks for the reply.
Frankly, I am not experienced in assembly language.
I tried to test the second listing above (the one that begins with push {r4, lr} and contains the uxtb and str instructions) with the Arduino IDE, but I couldn't get it to work.
Although I do not know assembly language, I want the reads to run as fast as possible.
This is my read-values block:
void readValues() {
EthernetClient client = server.available();
byte X[8000];
uint32_t i, t0, t1,t[8000];
byte *a = &X[0];
elapsed_Time = 0;
noInterrupts();
// PIO_PDSR & 0B00000000000000000000000011111111: status register of the Arduino Due ports (MSB) D11 D14 D15 D25 D26 D27 D28 D29 (LSB)
t0 = SysTick->VAL;
// the first read takes 7 clk, later ones 3 clk
T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //1) 3008 ticks // assigns 1000 array elements in a single line
T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //2) 3000 ticks
T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //3) 3000 ticks // PIO_PDSR = Pin Data Status Register
T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //4) 3000tick
T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //5) 3000tick
T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //6) 3000tick
T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //7) 3000tick
T(T(T(*a++ = PIOD->PIO_PDSR & 0B00000000000000000000000011111111))) //8) 3000tick
t1 = SysTick->VAL; // t0-t1 = 24008 ticks
interrupts();
a=&X[0]; // move the pointer back to the start of the array
T(T(T( client.write(*a++))))//1)
T(T(T( client.write(*a++))))//2)
T(T(T( client.write(*a++))))//3)
T(T(T( client.write(*a++))))//4)
T(T(T( client.write(*a++))))//5)
T(T(T( client.write(*a++))))//6)
T(T(T( client.write(*a++))))//7)
T(T(T( client.write(*a++))))//8)
Serial.print("n_ticks: ");
Serial.println( ((t0 < t1) ? 84000 + t0 : t0) - t1 );
Serial.println();
delay(30);
}
My goal is to send the values read without increasing the number of ticks.
So how do I insert your assembly solution into my readValues() block?
I have tried to test [assembly language re-attached with __ASM]...
You're not understanding me correctly. That assembly code was not "suggested" as a faster solution than your C code; that was the code that your C code actually produces.
A[Index++] = PIOD->PIO_PDSR;
Becomes:
80156: 6bd8 ldr r0, [r3, #60] ; load from PIOD->PIO_PDSR
80158: b2c0 uxtb r0, r0 ; keep the low byte, zero-extended to 32 bits
8015a: 9000 str r0, [sp, #0] ; store into the local array "A"
That's pretty close to optimal. The compiler is even "unrolling" the "Index++" into increasing static offsets, because that's quicker/smaller/uses less registers, than actually keeping and incrementing a counter or pointer.
But the exact code isn't as important as the other example, which shows:
80152: 6bd8 ldr r0, [r3, #60] ; load from PIOD->PIO_PDSR
80154: 6bd8 ldr r0, [r3, #60] ; load from PIOD->PIO_PDSR
80156: 6bd8 ldr r0, [r3, #60] ; load from PIOD->PIO_PDSR
80158: 6bd8 ldr r0, [r3, #60] ; load from PIOD->PIO_PDSR
Just a set of consecutive reads. The compiler has "figured out" that you were storing the values in an array that was never actually used, and so it just avoids storing them at all. It doesn't create the array either. The only reason that it bothers to read the registers is that they're declared as "volatile", which means that the compiler MUST access them when you say so.
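To make the timed block contain the stores whether or not the array is printed afterwards, one option is to tell the compiler that the buffer is observed. The following is a minimal sketch under stated assumptions, not something from the thread: fake_port is a hypothetical stand-in for PIOD->PIO_PDSR so the code can be tried off the board, keep_alive and capture are made-up names, and the inline asm uses GCC/Clang syntax:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a memory-mapped register such as PIOD->PIO_PDSR.
static volatile uint32_t fake_port = 0xA5;

// Empty inline asm (GCC/Clang syntax) that claims to access memory through p.
// The optimizer must then assume the buffer contents are observed,
// so the stores made before this call cannot be discarded.
static inline void keep_alive(const void* p) {
  asm volatile("" : : "r"(p) : "memory");
}

// Reads the port n times into buf. The stores should survive optimization
// even if the caller never looks at buf afterwards.
void capture(uint8_t* buf, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    buf[i] = static_cast<uint8_t>(fake_port & 0xFF);
  }
  keep_alive(buf);
}
```

This does not make the capture any faster. It only aims to keep the measurement honest, so that the tick count already includes the cost of the stores that will be needed once the data is actually sent or printed.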
Now, your timing says that each of those "ldr" instructions takes 4 cycles. That's a bit annoying, since supposedly "ARM architecture executes most instructions in a single cycle", but you have flash memory with wait states, and you have a slow "peripheral bus" involved, and you have memory accessed every instruction (which might stall pipelines?), so it's not really that surprising that it apparently takes 4 cycles.
That also means that the 8 cycles per read/store that you're seeing is as fast as you can expect it to go if you actually save the data. You MIGHT get slightly faster by moving the code to RAM, but it's pretty uncertain (fewer wait states, but more contention.)
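The two read rates quoted earlier in the thread follow directly from the cycle counts, assuming the 84 MHz clock implied by the 11.9 ns tick. A small sketch of that arithmetic (samples_per_second is a hypothetical helper name):

```cpp
#include <cassert>
#include <cstdint>

// Samples per second for code that spends cycles_per_sample
// CPU cycles on each sample at a clock of f_cpu_hz.
uint32_t samples_per_second(uint32_t f_cpu_hz, uint32_t cycles_per_sample) {
  assert(cycles_per_sample > 0);
  return f_cpu_hz / cycles_per_sample;
}
```

With 84000000 Hz, 3 cycles per sample gives 28000000 (the 28 M figure measured when the stores were optimized away), and 8 cycles per sample gives 10500000 (the 10.5 M figure measured once the data is actually stored).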
I understand that the compiler is aware of whether or not a variable will be used later in the code.
I wonder how to find out which lines of assembly correspond to a specific line of C code.
I found a web site where you upload the .elf file and it disassembles it. The output was so long that I couldn't work out which lines of assembly belong to which lines of C code.
Is there a program or method to find this out?
Secondly, https://digibird1.wordpress.com/arduino-as-a-5m-sample-osciloscope/
I tried this code with an Uno.
Later I built an ADC circuit board and measured a 440 kHz analog signal.
I read 349 kHz; anyway, that showed me that the Uno is capable of capturing, on its digital port, the ones and zeros coming from the digital outputs of the ADC at a rate of 5 M samples per second.
The result was that the Uno takes 1600 samples from its digital port at nearly 0.19 µs per sample. Dividing 0.19 µs by the clock period (1/16 MHz) gives roughly 3.04.
So I concluded that the Uno is capable of reading a digital port in 3 clock cycles.
Now, why does the Due read a digital port in 8 clock cycles?
I may be wrong in my interpretation of the clock cycles, but I saw the FFT result of the measured signal, so I have attached its image.
This speed problem made me realize that I should have taken a microprocessor course at university.
These days I am working to understand and respond to your replies.
I am watching Computer Science from Crash Course (https://youtu.be/rtAlC5J1U40); it helps me.
Anyway, I am working to keep making progress with MCUs.