Bad digital IO speed?

I'm trying to interface with a ~1MHz digital signal using a Due, but I'm having trouble finding a way to actually read and write digital IO at that speed. I've seen various posts here and there about the speed of the Due's IO, but nothing really concrete and nothing that's in line with the results I'm getting.

As far as I'm aware, direct manipulation of the IO registers (PIO_PDSR, PIO_ODSR, etc.) is the way to go, but I'm still getting pretty abysmal performance. For example, a very simple test of copying the state of pin 2 to pin 5:

void setup()
{
  pinMode(2, INPUT);
  pinMode(5, OUTPUT);
}

void loop()
{
  REG_PIOC_ODSR = REG_PIOB_PDSR;
}

This can't keep up with even a 250kHz input signal, let alone 1MHz. (Strangely, using interrupts to change the output pin's level on edges of the input pin's level works up to a few tens of kHz more - despite the supposed overhead of ISRs - but still nowhere near 1MHz.) I'm pretty sure even a Uno can do this task!

This seems rather strange to me considering the Due runs at 84MHz, literally hundreds of times faster than the 250kHz signal. If my math is correct then it's taking over 150 clock cycles seemingly just to access 2 registers and jump. Surely this cannot be correct, surely I am missing something? I'm pretty sure I've seen posts with people saying they are getting at least several MHz read speed, which is not even close to what I'm observing.
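
To put numbers on that, here is a minimal sketch of the cycle budget in plain C++ (meant to be compiled on a PC, not on the Due). The helper names are made up for illustration, and 84 MHz is the nominal core clock. Since the output has to follow both edges of the input, the half period is the relevant figure:

```cpp
#include <cstdint>

// Hypothetical helper: CPU cycles available during one full period of the input.
constexpr uint32_t cycles_per_period(uint32_t cpu_hz, uint32_t signal_hz) {
  return cpu_hz / signal_hz;
}

// Hypothetical helper: CPU cycles between two consecutive edges
// (assumes a 50% duty cycle on the input).
constexpr uint32_t cycles_per_half_period(uint32_t cpu_hz, uint32_t signal_hz) {
  return cpu_hz / (2u * signal_hz);
}

constexpr uint32_t CPU_HZ = 84000000u;  // Due core clock

// 250 kHz input: 336 cycles per period, 168 between edges.
// 1 MHz input:    84 cycles per period,  42 between edges.
static_assert(cycles_per_period(CPU_HZ, 250000u) == 336, "250 kHz period");
static_assert(cycles_per_half_period(CPU_HZ, 250000u) == 168, "250 kHz half period");
static_assert(cycles_per_period(CPU_HZ, 1000000u) == 84, "1 MHz period");
static_assert(cycles_per_half_period(CPU_HZ, 1000000u) == 42, "1 MHz half period");
```

So failing at 250kHz means each pass through loop() costs more than roughly 168 cycles.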

If anyone has any insight or suggestions on this that'd be great, I'm quite stumped!

There are several ways to read a digital pin. One of them is attachInterrupt(), but this Arduino function can only keep up with low frequencies. Another way is to write your own PIO handler, but in that case you need to add the weak attribute before each PIO handler in WInterrupts.c.

To do that, in an Arduino IDE window, select File > Preferences, click the link at the bottom of the window and follow this path:

...package/arduino/hardware/sam/1.6.x/cores/arduino/winterrupts

Copy and paste WInterrupts.c into a completely blank Arduino sketch (no setup(), no loop()), and modify the code as shown in the comment at the top of the sketch below. Then copy the modified version back into the WInterrupts.c file and save.

In this example sketch, PIOA_Handler() is triggered by the output of a Timer Counter at 1 MHz, and in turn blinks an LED.

/*******************************************************************/
/*                       Test PIO Interrupts                       */
/*    Hook a jumper between pin 2 (PB25) and pin 24 (PA15)         */
/*    Do the below modifications in winterrupts.c                  */
/*******************************************************************/

// WInterrupts.c has to be modified since attachInterrupt() is too slow
// ...package/arduino/hardware/sam/1.6.6/cores/arduino/Winterrupts
/*
  void PIOA_Handler(void) __attribute__((weak));  // <***** Add this attribute before each PIO Handler
  void PIOA_Handler(void) {
    uint32_t isr = PIOA->PIO_ISR;
    uint32_t i;
    for (i=0; i<32; i++, isr>>=1) {
      if ((isr & 0x1) == 0)
        continue;
      if (callbacksPioA[i])
        callbacksPioA[i]();
    }
  }

*/
#define INT_MASK   (PIO_PA15)

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);

  pio_setup();
  tc_setup();
}
void loop() {
}
/*****************************************************************/
void tc_setup() {

  PMC->PMC_PCER0 = PMC_PCER0_PID27;                       // Enable the TC0 clock (PID27 = Timer Counter 0, channel 0); PCER0 is write-only, so no |=

  PIOB->PIO_PDR = PIO_PDR_P25;                            // Set the GPIO to the peripheral
  PIOB->PIO_ABSR |= PIO_PB25B_TIOA0;

  TC0->TC_CHANNEL[0].TC_CMR = TC_CMR_TCCLKS_TIMER_CLOCK1  // MCK/2, clk on rising edge
                              | TC_CMR_WAVE               // Waveform mode
                              | TC_CMR_WAVSEL_UP_RC       // UP mode with automatic trigger on RC Compare
                              | TC_CMR_ACPA_CLEAR         // Clear TIOA0 on RA compare match
                              | TC_CMR_ACPC_SET;          // Set TIOA0 on RC compare match

  TC0->TC_CHANNEL[0].TC_RC = 42;  //<*********************  Frequency = (Mck/2)/TC_RC  Hz = 1 MHz
  TC0->TC_CHANNEL[0].TC_RA = 21;  //<********************   Any Duty cycle in between 1 and TC_RC

  TC0->TC_CHANNEL[0].TC_CCR = TC_CCR_SWTRG | TC_CCR_CLKEN; // Software trigger TC0 counter and enable

}

/*******************************************************************/
void pio_setup()
{
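  PMC->PMC_PCER0 = PMC_PCER0_PID11;   // Enable the PIOA peripheral clock

  PIOA->PIO_PER = INT_MASK;         // PIO controller (not a peripheral) controls the pin
  PIOA->PIO_PUER = INT_MASK;        // enable the internal pull-up
  PIOA->PIO_IFER = INT_MASK;        // enable glitch filter (glitches shorter than 1/2 clock cycle are discarded)
  PIOA->PIO_AIMER = INT_MASK;       // additional interrupt modes: edge/level and polarity are selected below
  PIOA->PIO_ESR = INT_MASK;         // edge detection (PIO_ELSR is only the read-only status register)
  PIOA->PIO_FELLSR = INT_MASK;      // falling edge: one interrupt per period of the input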

  PIOA->PIO_IER = INT_MASK;         // enable interrupt trigger from INT_MASK pin
  NVIC_EnableIRQ(PIOA_IRQn);
}

void PIOA_Handler(void)
{
  static uint32_t Count;
  uint32_t status = PIOA->PIO_ISR;

  if (status & INT_MASK)
  {
    if (Count++ == 1000000)
    {
      Count = 0;
      PIOB->PIO_ODSR ^= PIO_ODSR_P27;
      // do something...
    }
  }
}
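
The TC_RC and TC_RA values in tc_setup() follow from Frequency = (MCK/2)/TC_RC. As a rough sketch of that arithmetic in plain C++ (to be run on a PC; the helper names are invented for illustration and are not part of the Arduino core or the SAM headers):

```cpp
#include <cstdint>

constexpr uint32_t MCK_HZ = 84000000u;        // Due master clock
constexpr uint32_t TC_CLOCK_HZ = MCK_HZ / 2;  // TIMER_CLOCK1 = MCK/2 = 42 MHz

// Hypothetical helper: TC_RC value for a requested output frequency.
// Integer division truncates, so a frequency that does not divide
// 42 MHz evenly ends up slightly higher than requested.
constexpr uint32_t tc_rc_for(uint32_t out_hz) {
  return TC_CLOCK_HZ / out_hz;
}

// Hypothetical helper: TC_RA value for a duty cycle given in percent.
// With ACPC_SET / ACPA_CLEAR the output is high from 0 up to RA.
constexpr uint32_t tc_ra_for(uint32_t rc, uint32_t duty_percent) {
  return (rc * duty_percent) / 100u;
}

// Frequency actually produced by a given TC_RC value.
constexpr uint32_t tc_actual_hz(uint32_t rc) {
  return TC_CLOCK_HZ / rc;
}
```

For 1 MHz this gives tc_rc_for(1000000) = 42 and tc_ra_for(42, 50) = 21, the values used in the sketch above.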

Or you can read a digital pin without interrupts, using blocking code (advantage: higher frequencies can be detected):

/*******************************************************************/
/*                       Test PIO Reading                          */
/*    Hook a jumper between pin 2 (PB25) and pin 24 (PA15)         */
/*******************************************************************/

#define INT_MASK   (PIO_PA15)

void setup() {
  pinMode(LED_BUILTIN, OUTPUT);
  pio_setup();
  tc_setup();
}
void loop() {
  static uint32_t Count;
  while (true)
  {
    while (!(PIOA->PIO_ISR & INT_MASK));
    if (Count++ == 3000000)
    {
      Count = 0;
      PIOB->PIO_ODSR ^= PIO_ODSR_P27;
    }
  }
}
/*****************************************************************/
void tc_setup() {

  PMC->PMC_PCER0 = PMC_PCER0_PID27;                       // Enable the TC0 clock (PID27 = Timer Counter 0, channel 0); PCER0 is write-only, so no |=

  PIOB->PIO_PDR = PIO_PDR_P25;                            // Set the GPIO to the peripheral
  PIOB->PIO_ABSR |= PIO_PB25B_TIOA0;

  TC0->TC_CHANNEL[0].TC_CMR = TC_CMR_TCCLKS_TIMER_CLOCK1  // MCK/2, clk on rising edge
                              | TC_CMR_WAVE               // Waveform mode
                              | TC_CMR_WAVSEL_UP_RC       // UP mode with automatic trigger on RC Compare
                              | TC_CMR_ACPA_CLEAR         // Clear TIOA0 on RA compare match
                              | TC_CMR_ACPC_SET;          // Set TIOA0 on RC compare match

  TC0->TC_CHANNEL[0].TC_RC = 14;  //<*********************  Frequency = (Mck/2)/TC_RC = 3 MHz 
  TC0->TC_CHANNEL[0].TC_RA = 7;  //<********************   Any Duty cycle in between 1 and TC_RC

  TC0->TC_CHANNEL[0].TC_CCR = TC_CCR_SWTRG | TC_CCR_CLKEN; // Software trigger TC0 counter and enable

}

/*******************************************************************/
void pio_setup(void)
{
  PMC->PMC_PCER0 = PMC_PCER0_PID11;   // PIOA power ON

  PIOA->PIO_PER = INT_MASK;         // PIO controller (not a peripheral) controls the pin
  PIOA->PIO_PUER = INT_MASK;        // enable the internal pull-up
  PIOA->PIO_IFER = INT_MASK;        // enable glitch filter (glitches shorter than 1/2 clock cycle are discarded)
}

void loop()
{
  REG_PIOC_ODSR = REG_PIOB_PDSR;
}

loop() has more overhead than just a jump.

void loop()
{
  while (1) {
    REG_PIOC_ODSR = REG_PIOB_PDSR;
  }
}

Would be much faster.
Not 84MHz-ish fast, though. Running at that speed involves wait states on the program memory and bus delays on the IO bus. Even the tight loop would probably take 10 to 20 cycles...
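
As a back-of-the-envelope check of that estimate, a small sketch in plain C++ (PC-side; the helper names are hypothetical, and the 10 and 20 cycle figures are just the guess above, not measurements):

```cpp
#include <cstdint>

// Hypothetical helper: loop iterations per second for a loop that
// costs a fixed number of CPU cycles per pass.
constexpr uint32_t loop_rate_hz(uint32_t cpu_hz, uint32_t cycles_per_iteration) {
  return cpu_hz / cycles_per_iteration;
}

// Hypothetical helper: duration of one pass in nanoseconds (truncated).
// This is also roughly the worst-case delay before an input edge is copied.
constexpr uint64_t iteration_time_ns(uint32_t cpu_hz, uint32_t cycles_per_iteration) {
  return (static_cast<uint64_t>(cycles_per_iteration) * 1000000000ull) / cpu_hz;
}

constexpr uint32_t CPU_CLOCK_HZ = 84000000u;  // Due core clock

// 10 cycles per pass: 8.4 million copies per second, about 119 ns each.
// 20 cycles per pass: 4.2 million copies per second, about 238 ns each.
static_assert(loop_rate_hz(CPU_CLOCK_HZ, 10u) == 8400000u, "10 cycles");
static_assert(loop_rate_hz(CPU_CLOCK_HZ, 20u) == 4200000u, "20 cycles");
```

Either way the pin would be sampled several times per period of a 1MHz signal.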

Oh, wow, thanks, I didn't think of that! With the while loop it's much quicker, seems to work with at least 1MHz. How come loop() is so slow?

main() (which is hidden) contains a loop like:

        for (;;) {
           loop();
           if (serialEventRun)
              serialEventRun();
        }

It may not look like much, but it means that each invocation of loop() is a function call plus some memory accesses, which in turn prevents some of the optimizations that the compiler can do with a simple loop.
For the while(1) loop, the compiler will probably stick the source and destination IO Port addresses in two registers, and have the expected "read, write, loop" three-instruction loop. When you just have the loop() function, it'll have to load each of those addresses into the registers, plus the call/return overhead, plus the "serialEvent" check...
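
To make the difference in per-pass work visible, here is a simplified PC-side model of the two structures. Everything in it is a stand-in: fake_pdsr and fake_odsr replace the real port registers, serial_event_run replaces the core's serialEventRun hook, and the loops are bounded so they terminate. It only shows what has to happen on each pass, not the actual code the compiler generates for the Due.

```cpp
#include <cstdint>

// Stand-ins for the memory-mapped registers (REG_PIOB_PDSR / REG_PIOC_ODSR
// on the Due). 'volatile' forces a real load and store on every access.
volatile uint32_t fake_pdsr = 0;
volatile uint32_t fake_odsr = 0;

// Counts how many times the serial-event check was executed.
uint32_t serial_checks = 0;

// Stand-in for the serialEventRun hook; null means "nothing to call".
void (*serial_event_run)() = nullptr;

// Structure 1: the sketch's loop(), which does one copy per call.
void sketch_loop() {
  fake_odsr = fake_pdsr;
}

// Model of the hidden main(): a function call plus a check on every pass.
void run_via_main(uint32_t passes) {
  for (uint32_t i = 0; i < passes; ++i) {
    sketch_loop();
    ++serial_checks;
    if (serial_event_run) serial_event_run();
  }
}

// Structure 2: the tight loop inside loop(): only load, store, branch.
void run_tight(uint32_t passes) {
  for (uint32_t i = 0; i < passes; ++i) {
    fake_odsr = fake_pdsr;
  }
}
```

Both versions copy the same value; the first simply does a call, a return and a pointer check around every copy.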