I thought the Due would be faster!

I've been experimenting with 8-bit 16MHz Arduinos for years. I hooked up a Micro and a Mega2560 (consecutively, not at the same time) and programmed them to do nothing but switch a pin on and off just as fast as they could. I hooked an oscilloscope to the pin to determine how fast that was.

It was just a hair under half a million cycles per second.

That's not fast enough for me, so I bought a Due, received it today, and pulled the same stunt with it. According to the oscilloscope, it can produce about 114 kilocycles per second.

Surprising: 500kHz at 16MHz, but only about 114kHz at 84MHz.
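One way to see how big the gap is: count CPU clock cycles per output period, which is the clock frequency divided by the toggle frequency. The helper below is plain host-side C++ arithmetic, not Arduino code, and `cyclesPerPeriod` is just an illustrative name.

```cpp
#include <cassert>

// CPU clock cycles that elapse during one full period of the toggled pin.
double cyclesPerPeriod(double cpuClockHz, double toggleFreqHz) {
    assert(cpuClockHz > 0 && toggleFreqHz > 0);
    return cpuClockHz / toggleFreqHz;
}
```

With the figures above, `cyclesPerPeriod(16e6, 500e3)` is 32 cycles, while `cyclesPerPeriod(84e6, 114e3)` is about 737, so each period on the Due costs roughly 23 times as many clock cycles.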

Here's the program I'm using.

void setup() {
  pinMode (2, OUTPUT);
}

bool f = false;

void loop() {
  f = !f;
  if (f) {
    digitalWrite (2, LOW);
  }
  else {
    digitalWrite (2, HIGH);
  }
}

Hopefully it's not that digitalWrite() that's slowing things down. I tried doing direct port access like this:

void loop() {
  f = !f;
  if (f) {
    PIOB->PIO_SODR = (1 << 25)
  }
  else {
    PIOB->PIO_CODR = (1 << 25)
  }
}

but I couldn't get it to work; I'm probably screwing it up since this is my first swing at a Due.

Anybody know? Is the Due really that slow, with its 84MHz processor, or am I doing something wrong?

Thanks,
Dan

Did you set the pin to be an output? (That may not be necessary.)

Is there a reason you failed to include setup for the Due version?

I use the Due for many applications. Based on the Arduino environment, my colleagues and I have also developed our own hardware for special measurement purposes.
And yes, digitalWrite() is very slow, because it executes many extra instructions to keep its behavior compatible with other CPUs.
With direct port access, the access time is very short.
But you still need to initialize the port, which can be done with pinMode() in setup().
Do not forget that changing the pin once per loop() cannot be faster than the loop() calling sequence. Add an inner loop that changes the pin 10 times inside loop(), and you will see both the loop() overhead and the speed of direct pin access.
(Or make a pulse by setting and resetting the pin within a single pass of loop().)

Also, if you need to toggle a pin faster than that, you may need to go beyond the Arduino macros and e.g., configure a pin to be controlled directly from a timer register. You can toggle the pin pretty close to the CPU clock speed by doing this.

However, the learning curve may be significant. Perhaps explain to us what you're trying to accomplish?

I'm not sure I believe that. I get about 123kHz on an Uno, and only 78kHz on a Mega ADK (which should be the same as a Mega.)

I get 113kHz on a Due, so I confirm the "bad" numbers there.
231kHz if I add semicolons to your direct port access version, and 2.6MHz if I put it inside its own while loop. Almost 17MHz for a stripped-down loop:

void loop() {
  while (1) {
      PIOB->PIO_SODR = (1 << 25);
      PIOB->PIO_CODR = (1 << 25);
  }
}

2.6MHz is about the same speed as a similarly optimized AVR loop, though (see the "Maximum pin toggle speed" thread), so in a way the Due is still a bit disappointing: 5x faster clock, 32-bit CPU... about the same speed. :(

Here are some of the reasons:

  1. ARM CPUs do not have specialized IO instructions like the AVR; all of the peripherals are manipulated as though they were memory. Since ARM only does load and store on memory, and pretty much only via addresses that are in registers, the minimum to store a bit is about three instructions: load the address into one register, load the value into another register, and then do the store. On the bright side, if you're in a tight loop, you don't have to do some of those every time. The 17MHz loop loads the address, loads the bit value, and then it can just do single stores for each change. Also on the bright side, this means that ARMs generally do not have the asymmetry of "this is much faster if your port and bit are constants" that is present in the AVR.
  2. While the "instructions" on the Due ARM are mostly single-cycle, that isn't true of the memory buses. Flash memory is relatively slow - running at 84MHz means it takes 5 "wait states" to access flash memory (but this is complicated by some "flash acceleration" that fetches more than one instruction at a time.) The GPIO peripheral is also on a bus where accesses take more than one clock cycle.
  3. digitalWrite() on most non-AVR platforms suffers because it has to duplicate behavior that is a side effect on AVRs, such as switching on the internal pull-up if you write HIGH while the pin is in input mode. It also turns off PWM (which also happens on AVRs, so some pins digitalWrite faster than others. Sigh.)
  4. It MIGHT be a feature that digitalWrite() is about the same speed on Due as on AVR. Weird things can happen when you try to drive pins too quickly, and ~100kHz is a pretty safe rate.
  5. No one seems to care much about speeding up digitalWrite() (on any of the platforms, actually). Issue #16 in arduino/ArduinoCore-sam on GitHub, "digitalWrite() is embarassingly slow on Due...", was submitted a long time ago, complaining that digitalWrite() on Due calls a libsam function that duplicates significant amounts of effort. Fixing this would roughly double the digitalWrite() speed, and everyone seemed to think it was a reasonable idea, but the patch has still not been incorporated or released.
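To illustrate why the direct-port examples use a pair of registers (`PIOB->PIO_SODR` to set, `PIOB->PIO_CODR` to clear), here is a host-side model assuming the documented SAM3X semantics: writing 1 bits to PIO_SODR sets those outputs, writing 1 bits to PIO_CODR clears them, and 0 bits leave other pins alone. `FakePio` and its members are hypothetical stand-ins; on the real chip these are memory-mapped hardware registers at fixed addresses, not struct fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side stand-in for one SAM3X PIO controller.
struct FakePio {
    uint32_t odsr = 0;  // models PIO_ODSR, the output data latch for each line

    // Models a store to PIO_SODR: 1 bits set outputs, 0 bits are ignored.
    void writeSodr(uint32_t mask) { odsr |= mask; }

    // Models a store to PIO_CODR: 1 bits clear outputs, 0 bits are ignored.
    void writeCodr(uint32_t mask) { odsr &= ~mask; }

    bool level(unsigned bit) const { return ((odsr >> bit) & 1u) != 0; }
};

// Each edge is one write of a constant mask; no read of the current state is
// needed, which is why a tight loop can keep the address and mask in
// registers and issue a single store per change.
void pulse(FakePio& pio, uint32_t mask) {
    pio.writeSodr(mask);
    pio.writeCodr(mask);
}
```

For example, after `writeSodr(1u << 25)` line 25 (Due pin 2 is PB25) reads high, and a later `writeCodr(1u << 25)` drops it again without disturbing any other line that was set.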

PS: a SAMD21 (Arduino Zero, et al.) does about 300kHz with digitalWrite(), and 12MHz with direct port writes. An Adafruit SAMD51 board ("Metro M4", a 120MHz Cortex-M4 chip) will do 30MHz (see the Adafruit forum thread "Increasing the speed of execution of the adafruit feather").

A Raspberry Pi Pico is probably the speed demon of the lot. It runs at 125MHz by default (not counting overclocking) and has dedicated programmable I/O (PIO) blocks that can manipulate pins on their own. Someone was bit-banging Ethernet with it!


Yes, I set it to be an output; I just forgot to copy/paste that part.

Still--good catch.

Yeah, maybe I'll try unrolling the loop a few times to see if there's significant latency hiding in the code that calls loop(). Thanks.

547kHz on a Micro or Mega2560, to be exact (to three significant figures), according to my oscilloscope. But of course I was using direct port access (PORTB |= (1 << 2)) rather than digitalWrite(). I would have done direct port access on the Due, except the instructions I found didn't seem to work for me: the oscilloscope trace stayed flat.

Thanks a lot for the list of reasons; I know a lot more about the ATMega architecture than I do about the ARM architecture.

I don't actually want to toggle a pin, of course; I just want to see how fast the processor can run code that interacts with the outside.

I want to make a medium-speed (1Mb/s) full-duplex communications link based on lasers. I need enough speed to be able to sample an incoming "square" wave from a laser at least four times per cycle (that is, at >=4MHz) and do a bit of computation as well. Obviously I'll need more CPU power to receive than to send; if I have to use a separate processor for each side, that's fine.

But a processor that can't toggle a pin faster than 114kHz, when that's all it's doing, isn't going to be able to sample a pin 35 times that fast, methinks.
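The budget behind that conclusion, as a small host-side C++ helper: the required sample rate is bit rate times samples per bit, and the CPU cycles available per sample is the core clock divided by that rate. Reading "four times per cycle" as four samples per bit time (which is how the 4MHz figure comes out) is an assumption, and `sampleBudget` is a hypothetical name.

```cpp
#include <cassert>

// Sampling budget for an oversampled serial link (illustrative helper).
struct SampleBudget {
    double sampleRateHz;     // how often the input pin must be read
    double cyclesPerSample;  // CPU clock cycles available between reads
};

SampleBudget sampleBudget(double bitRateHz, int samplesPerBit, double cpuClockHz) {
    assert(bitRateHz > 0 && samplesPerBit > 0 && cpuClockHz > 0);
    const double rate = bitRateHz * samplesPerBit;
    return SampleBudget{rate, cpuClockHz / rate};
}
```

For 1Mb/s with 4x oversampling on the Due's 84MHz clock, this gives a 4MHz sample rate and 21 CPU cycles per sample for everything: reading the pin, deciding on the bit, and storing it. 4MHz is also about 35 times the 114kHz measured with digitalWrite().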

Innnnteresting. Thanks for the pointer!

Ah. You didn't say in your initial message that the 500kHz number was for the direct port version. It's believable that it would be faster with its own dedicated while loop instead of letting the loop() function do your looping. There is additional overhead between loop() invocations.
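A rough estimate of that overhead, using the Due figures reported earlier in the thread (231kHz with the direct port writes inside loop(), 2.6MHz with the same body in its own while loop). This sketch assumes each pass writes the pin once, so one output period spans two passes, and that the whole difference between the two runs is the cost of returning from and re-entering loop(). The function names are illustrative.

```cpp
#include <cassert>

// Time for one pass through the toggle code, assuming each pass writes the
// pin once, so one full square-wave period spans two passes.
double secondsPerPass(double toggleFreqHz) {
    assert(toggleFreqHz > 0);
    return 1.0 / (2.0 * toggleFreqHz);
}

// Extra CPU cycles per pass when the same body runs via loop() instead of an
// inner while(1), assuming that call/return path is the only difference.
double loopOverheadCycles(double freqViaLoopHz, double freqViaWhileHz, double cpuClockHz) {
    return (secondsPerPass(freqViaLoopHz) - secondsPerPass(freqViaWhileHz)) * cpuClockHz;
}
```

With those numbers, `loopOverheadCycles(231e3, 2.6e6, 84e6)` comes out to roughly 166 cycles (about 2µs) per loop() call, which dwarfs the cost of the port write itself.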

In this case you also need a time base for samples distributed evenly across a full wave. Do you want to poll a clock, use a timer interrupt or how else do you want to accomplish a fixed sample rate? All that stuff adds to the time of a simple port/pin scan.

I can't trust two separate clocks to maintain synchronization for hours or days. I want to use the incoming wave to adjust the local clock on (roughly) each transition, the way they do in the CAN protocol. I'll need bit stuffing and destuffing, but I can do that somewhere other than on the chip that's sampling the laser.
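A minimal host-side sketch of both ideas, assuming the pin has already been sampled into a buffer of 0/1 values at a fixed oversampling factor. The decoder treats every transition as a bit boundary and reads each bit near its middle; the stuffing functions follow the CAN-style rule of inserting a complementary bit after five identical bits, so transitions stay frequent enough to re-align on. The names and the simple resync rule are illustrative simplifications, not a specific protocol implementation; a real receiver also needs framing and error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Decode a stream oversampled by n samples per bit. Each transition resets the
// local phase to 0 (a bit boundary); the bit is read at phase n/2, near mid-bit.
std::vector<int> decodeOversampled(const std::vector<int>& samples, int n) {
    assert(n >= 2);
    std::vector<int> bits;
    int phase = 0;
    int prev = samples.empty() ? 0 : samples[0];
    for (int s : samples) {
        if (s != prev) phase = 0;               // edge: re-align to incoming wave
        if (phase == n / 2) bits.push_back(s);  // mid-bit sample point
        prev = s;
        phase = (phase + 1) % n;                // free-running local "clock"
    }
    return bits;
}

// CAN-style bit stuffing: after maxRun identical bits, insert the complement.
// The stuff bit counts as the first bit of the next run.
std::vector<int> stuffBits(const std::vector<int>& in, int maxRun = 5) {
    std::vector<int> out;
    int last = -1, run = 0;
    for (int b : in) {
        out.push_back(b);
        run = (b == last) ? run + 1 : 1;
        last = b;
        if (run == maxRun) {
            last = 1 - b;
            out.push_back(last);
            run = 1;
        }
    }
    return out;
}

// Inverse of stuffBits. Returns false on a stuff error (maxRun + 1 identical bits).
bool destuffBits(const std::vector<int>& in, std::vector<int>& out, int maxRun = 5) {
    out.clear();
    int last = -1, run = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        int b = in[i];
        out.push_back(b);
        run = (b == last) ? run + 1 : 1;
        last = b;
        if (run == maxRun) {
            if (i + 1 == in.size()) break;   // stream ended right after the run
            int stuffed = in[++i];           // skip the stuff bit...
            if (stuffed == b) return false;  // ...which must be the complement
            last = stuffed;
            run = 1;
        }
    }
    return true;
}
```

With n = 4, the decoder still recovers 1, 0, 1, 0 when the sender's bits arrive as 5 or 3 samples each, because every edge re-centers the sample point. Fifteen identical samples with no edge decode as four bits instead of three under that exaggerated 25% mismatch, which shows why drift accumulates during long runs and why stuffing bounds the run length.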

The other option (using two differently-colored lasers, one for data and one for clock, and separating them with color filters) is too exotic for my blood.

Or, approach the concern from a different viewpoint:

https://www.sciencedirect.com/topics/engineering/clock-recovery-circuit
