Long delay between external interrupt edge and ISR execution

Hi everyone.

I am using an Arduino Due.

1- I configure TIOA0 to generate a 3.5MHz square wave.

2- I configure digital pin 12 to capture an external event via an interrupt on the SAM3X.
(Here, the external event is the 3.5MHz signal on TIOA0.)
(In fact, the Due receives an external 3.5MHz signal from a peripheral, but because the complete code for configuring that peripheral is very long, I simply generate the signal with TIOA0.)

3- On every rising edge on digital pin 12, PD.0 is switched on and off in the ISR.

My code:

void setup() {
  
  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  // Configure PD.0 as output
  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  REG_PIOD_PER = PIO_PER_P0;
  REG_PIOD_OER = PIO_OER_P0;

  
  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  // Configure TC0, channel A (TIOA0) to generate a 3.5MHz output.
  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  
  REG_PMC_PCER0 = PMC_PCER0_PID27  ;            // Enable the clock of TC0 (Peripheral with ID 27)
  REG_PIOB_PDR  = PIO_PDR_P25;                  // Disable PIO control of pin PB.25 and enable peripheral control of the pin (TC0)
  REG_PIOB_ABSR = PIO_PB25B_TIOA0;              // Assign the I/O line PB25 to peripheral B function: TIOA0
  REG_TC0_CMR0  =   TC_CMR_TCCLKS_TIMER_CLOCK1  // MCK/2 clock is selected for TC0 operation (42MHz)
                  | TC_CMR_WAVE                 // Waveform mode
                  | TC_CMR_WAVSEL_UP_RC         // UP mode with automatic trigger on RC compare
                  | TC_CMR_ACPA_CLEAR           // Clear TIOA0 on RA0 compare match
                  | TC_CMR_ACPC_SET;            // Set TIOA0 on RC0 compare match
  REG_TC0_RA0   = 6;                            // Duty Cycle = (RA0 / RC0)  = 50%
  REG_TC0_RC0   = 12;                           // Frequency = (42MHz / RC0) = 3.5MHz
  REG_TC0_CCR0  = TC_CCR_SWTRG | TC_CCR_CLKEN;  // A software trigger is performed, and the clock of TC0 is enabled.


  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  // Configure Digital Pin 12 as an interrupt input pin, to detect every rising edge of the 3.5MHz signal.
  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  pinMode(12,INPUT_PULLUP);
  attachInterrupt(digitalPinToInterrupt(12), myISR, RISING);

  
 
}

void myISR() {

  REG_PIOD_SODR = PIO_SODR_P0; // Write One to PD.0
  REG_PIOD_SODR = PIO_SODR_P0; // Same write again (keeps PD.0 high a little longer)

  REG_PIOD_CODR = PIO_CODR_P0; // Write Zero to PD.0
 
}



void loop() {


}

The problem is that, in my measurements, PD.0 is switched on/off only once every 5 pulses of the 3.5MHz input.
When I reduce the TIOA0 frequency to 350kHz, the problem disappears, and there is a delay of about 850ns between the rising edge of the external event and the narrow pulse on PD.0.

Based on information about the Cortex-M3, the save-jump-return sequence involved in any ISR execution takes about 12 cycles, which is about 143ns on the Due (84MHz).
The three commands in the ISR each take 2 clock cycles, for a total of about 6 cycles (about 72ns).

So, theoretically, the program should be able to capture a 3.5MHz external event (period of about 286ns).

Please help me to solve my problem.

While you guys are thinking, I tried another way to capture the external interrupt.
I use a PIO pin (PD.8) as an input and configure it to detect rising edges.
I use the PIOD_Handler() function as the ISR. Because it conflicts with the PIOD_Handler function in the WInterrupts.c file in the AppData\Local\Arduino15\packages\arduino\hardware\sam\1.6.12\cores\arduino directory, I commented out PIOD_Handler in that file.

The result improved: the delay dropped to 570ns, and this way I can capture an input signal of up to 1.75MHz, which still does not meet my project requirement.

My code:

volatile unsigned int isr;


void setup() {
  
  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  // Configure PD.0 as output
  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  REG_PIOD_PER = PIO_PER_P0;
  REG_PIOD_OER = PIO_OER_P0;

  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  // Configure TC0, channel A (TIOA0) to generate a 3.5MHz output.
  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  
  REG_PMC_PCER0 = PMC_PCER0_PID27  ;            // Enable the clock of TC0 (Peripheral with ID 27)
  REG_PIOB_PDR  = PIO_PDR_P25;                  // Disable PIO control of pin PB.25 and enable peripheral control of the pin (TC0)
  REG_PIOB_ABSR = PIO_PB25B_TIOA0;              // Assign the I/O line PB25 to peripheral B function: TIOA0
  REG_TC0_CMR0  =   TC_CMR_TCCLKS_TIMER_CLOCK1  // MCK/2 clock is selected for TC0 operation (42MHz)
                  | TC_CMR_WAVE                 // Waveform mode
                  | TC_CMR_WAVSEL_UP_RC         // UP mode with automatic trigger on RC compare
                  | TC_CMR_ACPA_CLEAR           // Clear TIOA0 on RA0 compare match
                  | TC_CMR_ACPC_SET;            // Set TIOA0 on RC0 compare match
  REG_TC0_RA0   = 6;                            // Duty Cycle = (RA0 / RC0)  = 50%
  REG_TC0_RC0   = 12;                           // Frequency = (42MHz / RC0) = 3.5MHz
  REG_TC0_CCR0  = TC_CCR_SWTRG | TC_CCR_CLKEN;  // A software trigger is performed, and the clock of TC0 is enabled.

  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
  // Configure PD.8 as an interrupt input pin, to detect every rising edge of the 3.5MHz signal.
  /////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
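  REG_PIOD_PER    = PIO_PER_P8;     // PIO controls PD.8
  REG_PIOD_ODR    = PIO_ODR_P8;     // PD.8 is an input

  REG_PIOD_PUER   = PIO_PUER_P8;    // Enable the internal pull-up on PD.8
  REG_PIOD_AIMER  = PIO_AIMER_P8;   // Enable additional interrupt modes (edge/level selection)
  REG_PIOD_ESR    = PIO_ESR_P8;     // Edge detection (not level)
  REG_PIOD_REHLSR = PIO_REHLSR_P8;  // Rising edge
  REG_PIOD_IER    = PIO_IER_P8;     // Enable the PD.8 interrupt once its mode is configured

  REG_PMC_PCER0   = PMC_PCER0_PID14; // Enable the clock of PIOD (Peripheral with ID 14)
  NVIC_EnableIRQ(PIOD_IRQn);         // Enable PIOD interrupts in the NVIC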
 
}

void PIOD_Handler(){

  
  isr = REG_PIOD_ISR;          // Reading PIO_ISR clears the pending PIOD interrupt flags
  
  REG_PIOD_SODR = PIO_SODR_P0; // Write One to PD.0
  REG_PIOD_CODR = PIO_CODR_P0; // Write Zero to PD.0
  
}

void loop() {
  
}

Based on my knowledge and information about the Cortex-M3, the save-jump-return sequence involved in any ISR execution takes about 12 cycles, which is about 143ns on the Due (84MHz).

At least 12 cycles to enter the interrupt (saving the PC and registers), and at least another 12 cycles to exit again (restoring them). That's for the CM3 NVIC interrupt processing. Arduino's attachInterrupt() (which dispatches pin interrupts based on which pin actually changed) will add additional overhead. (I think you got rid of that in your second example.)

The three commands in the ISR each take 2 clock cycles, for a total of about 6 cycles (about 72ns).

Those "commands" will NOT compile to single ARM instructions, and since you're executing from flash memory, each instruction will probably have some wait states associated with it (hard to predict exactly, because of assorted "acceleration" capabilities that may or may not come into play).

00080148 <myISR()>:
   80148:       4a02            ldr     r2, [pc, #8]    ; (80154 <myISR()+0xc>)
   8014a:       2301            movs    r3, #1
   8014c:       6013            str     r3, [r2, #0]
   8014e:       6013            str     r3, [r2, #0]
   80150:       6053            str     r3, [r2, #4]
   80152:       4770            bx      lr
   80154:       400e1430        .word   0x400e1430

To detect rising edges on a PIO at the maximum frequency, don't use an interrupt; instead, use blocking code in loop() that tests PIO_SR.

westfw:
At least 12 cycles to enter the interrupt (saving the PC and registers), and at least another 12 cycles to exit again (restoring them). That's for the CM3 NVIC interrupt processing. Arduino's attachInterrupt() (which dispatches pin interrupts based on which pin actually changed) will add additional overhead. (I think you got rid of that in your second example.)

Those "commands" will NOT compile to single ARM instructions, and since you're executing from flash memory, each instruction will probably have some wait states associated with it (hard to predict exactly, because of assorted "acceleration" capabilities that may or may not come into play).

Thanks.

As you said, it takes (at least) 24 cycles to execute an ISR.

About acceleration: In a previous experience with ARM, I was surprised that ten or twenty instructions (which are expected to take at least 10 or 20 cycles) were executed in a single cycle. But I didn't know anything about the acceleration capabilities of ARM processors.

Thank you so much for your precise and helpful reply.

ard_newbie:
To detect rising edges on a PIO at the maximum frequency, don't use an interrupt; instead, use blocking code in loop() that tests PIO_SR.

Since my first post in this thread, I have tried the approach you mentioned: polling the registers. (You probably mean PIO_ISR, not PIO_SR.)

I used the PIO_PDSR and PIO_ISR registers in two different versions of the code to detect the rising edge. The PIO_ISR version gives the best performance, but the speed improvement is modest, nowhere near a factor of 2: the maximum frequency increased from 1.75MHz to 2.34MHz.

Thanks for your reply and good suggestion.

In a previous experience with ARM, I was surprised that ten or twenty instructions (which are expected to take at least 10 or 20 cycles) were executed in a single cycle. But I didn't know anything about the acceleration capabilities of ARM processors.

Some processors are capable of executing "several" instructions in parallel (even on a single core, and many also have multiple cores), but not the Cortex-M processors used in most Arduino-class boards. (10 or 20 instructions per cycle seems unlikely, though.)

Maybe a further improvement can be achieved by continuously copying PIO_PDSR into an SRAM variable with the AHB DMA and a linked list item, and then testing this variable in loop() with blocking code. The DMA transfer does not use core cycles; it runs in parallel with the core processor.

Thanks, westfw and ard_newbie, for the tips and useful information.

I will study the AHB DMA and copying the PIO register into SRAM, and will post the results here.

During my tests, I found that loop() in the Arduino IDE does not behave like a simple while(1) loop. If I measured correctly, it takes about 1us to jump from the end of loop() back to its beginning.
But when I put a while(1) loop inside loop() and place my code in it, the time drops substantially, to about 3 or 4 processor cycles (I am not sure about the exact number).

Is this observation about loop() correct?

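Yes, it is. After each call to loop(), the core's main() also calls serialEventRun() before calling loop() again, which adds overhead on every pass; a while(1) inside loop() avoids that.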

For AHB DMA implementation, you may see this thread:

https://forum.arduino.cc/index.php?topic=564007.0