Arduino Nano Every port manipulation (ATmega4809)

In my project I need to toggle some I/Os in a pretty short time, so I chose the new Arduino Nano Every with the ATmega4809 microcontroller which runs at 20MHz instead of the usual 16MHz.

With my Arduino Uno I was able to toggle an I/O with a frequency of 4MHz. At the Uno's 16MHz clock a 4MHz square wave has a period of four cycles, so each write to the PORTx register took the microcontroller only two clock cycles. (See the ArduinoUno.jpg attachment.)

The Arduino IDE has the option to emulate the registers of the ATmega328 when using the Arduino Nano Every, because the ATmega4809 uses different names for the port registers.

This is the code I used for testing, with "registers emulation" activated and the registers for the ATmega328 used:

void setup() {
  DDRB |= 0x01; // D8 is an output
}

void loop() {
  PORTB |= 0x01; // set D8
  PORTB &= 0xFE; // clear D8
}

With this code, I got a maximum frequency of about 2.7MHz, which I expected because the ATmega328 registers had to be translated to the ATmega4809 registers. (See ArduinoEvery.jpg, the lower trace.)

But when I use the ATmega4809 registers directly with "registers emulation" turned off, I get an even lower frequency of only 2MHz. (See ArduinoEvery.jpg, the upper trace.)

I got the register names from the megaAVR 0-series data sheet:

To use pin number n as an output only, write bit n of the PORTx.DIR register to '1'. This can be done by writing bit n in
the PORTx.DIRSET register to '1', which will avoid disturbing the configuration of other pins in that group. The nth bit
in the PORTx.OUT register must be written to the desired output value.
Similarly, writing a PORTx.OUTSET bit to '1' will set the corresponding bit in the PORTx.OUT register to '1'. Writing a
bit in PORTx.OUTCLR to '1' will clear that bit in PORTx.OUT to zero. Writing a bit in PORTx.OUTTGL or PORTx.IN to
'1' will toggle that bit in PORTx.OUT.
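The write semantics quoted above can be tried out on a PC. The following is only a minimal host-side sketch: `PortModel`, its method names and `demoPin3` are hypothetical names made up for illustration, not the real `PORT_t` from the device headers. It shows that writing a mask to OUTSET, OUTCLR or OUTTGL changes only the selected bit of OUT and leaves the other pins alone.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of one port; NOT the real PORT_t from the
// device headers. It only mimics the write semantics quoted from the data sheet.
struct PortModel {
    uint8_t dir = 0;  // models PORTx.DIR
    uint8_t out = 0;  // models PORTx.OUT

    void writeDirSet(uint8_t mask) { dir |= mask; }                         // PORTx.DIRSET
    void writeOutSet(uint8_t mask) { out |= mask; }                         // PORTx.OUTSET
    void writeOutClr(uint8_t mask) { out &= static_cast<uint8_t>(~mask); }  // PORTx.OUTCLR
    void writeOutTgl(uint8_t mask) { out ^= mask; }                         // PORTx.OUTTGL
};

// Walks pin 3 through set, clear and toggle and returns the final OUT value,
// checking along the way that pins 0 and 2 keep their values.
uint8_t demoPin3() {
    PortModel port;
    port.out = 0x05;          // pins 0 and 2 are already high
    port.writeDirSet(0x08);   // pin 3 becomes an output
    port.writeOutSet(0x08);   // pin 3 high
    assert(port.out == 0x0D);
    port.writeOutClr(0x08);   // pin 3 low
    assert(port.out == 0x05);
    port.writeOutTgl(0x08);   // pin 3 toggles back to high
    return port.out;
}
```

On the real chip each of these writes is a single store of the mask, so no read-modify-write of OUT is needed in the program.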

This is the code with "registers emulation" turned off, using the ATmega4809 registers directly:

void setup() {
  PORTE.DIRSET = 1 << 3; // PE3 is an output
}

void loop() {
  PORTE.OUTSET = 0x08; // set PE3
  PORTE.OUTCLR = 0x08; // clear PE3
}

I didn't use the toggle function, because I have to set the pin explicitly to either high or low.

I expected the frequency with the native registers to be higher than, or at least the same as, the frequency with registers emulation.

Can anyone explain why the frequency was even lower with the correct registers?

Admittedly, the table here is for using digitalWrite (apart from the STM32F401) instead of hitting the metal directly, but it shows that the Arduino cores do not perform particularly fast.
I'm not genned up on the ATmega4809, but I wonder if using the PWM hardware (assuming it has any) would be quicker. Another option might be to use an ESP32 or Teensy instead if you cannot get better speed from the ATmega4809.

Wrap the port writes in an infinite while loop; I think you're losing some performance to bookkeeping outside of loop() (the giveaway is that it stays high for less time than it stays low).

Also, try VPORTE.OUT instead: the VPORT registers are at lower addresses, so they can be accessed by certain instructions that are faster than those needed for registers at higher addresses. PORTE.OUTSET and PORTE.OUTCLR will also be faster than a read-modify-write on PORTE.OUT, as they don't require a read and computation step.

Also, while the 4809 can run at 20MHz, I think in the state Arduino sells them, they're configured to run at 16MHz...
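The clock question matters when reading the scope traces, because the same output frequency corresponds to a different number of CPU cycles per period. A minimal sketch of that arithmetic follows; the function name `cyclesPerPeriod` is made up for illustration, and the frequencies plugged in are the ones reported in this thread.

```cpp
#include <cassert>

// Number of CPU clock cycles in one full period of the output waveform.
double cyclesPerPeriod(double cpuHz, double outputHz) {
    assert(cpuHz > 0.0 && outputHz > 0.0);
    return cpuHz / outputHz;
}
```

For the 2MHz trace, `cyclesPerPeriod(16e6, 2e6)` gives 8 and `cyclesPerPeriod(20e6, 2e6)` gives 10. For the roughly 2.7MHz trace the results are about 5.9 at 16MHz and about 7.4 at 20MHz. A loop always takes a whole number of cycles, so the value close to 6 would be consistent with a 16MHz clock, although the measured 2.7MHz is only approximate and this is not proof.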

Can anyone explain why the frequency was even lower with the correct registers?

The port registers on mega4809 are not in the IO space of the AVR, so they end up using relatively slow memory store instructions:

 2ea:   c8 e0           ldi     r28, 0x08       ; 8
        for (;;) {
                loop();
                if (serialEventRun) serialEventRun();
 2ec:   00 e0           ldi     r16, 0x00       ; 0
 2ee:   10 e0           ldi     r17, 0x00       ; 0
 2f0:   c0 93 85 04     sts     0x0485, r28     ; PORTE.OUTSET
 2f4:   c0 93 86 04     sts     0x0486, r28     ; PORTE.OUTCLR
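To make the address argument concrete: the AVR bit instructions SBI/CBI only reach I/O addresses 0x00 to 0x1F, and IN/OUT only reach 0x00 to 0x3F; anything above that needs a load/store such as the STS in the listing. The sketch below classifies an address accordingly. It is a host-side illustration only: `Access`, `fastestAccess` and the constants are hypothetical names, it assumes that on the megaAVR 0-series I/O address n is the same as data address n, and the VPORTE.OUT address 0x0011 is my reading of the data sheet's address map and should be verified there. The address 0x0485 is taken from the listing.

```cpp
#include <cassert>
#include <cstdint>

enum class Access { BitInstruction, InOut, LoadStore };

// Assumes (as I understand the megaAVR 0-series) that I/O address n is data address n.
Access fastestAccess(uint16_t dataAddress) {
    if (dataAddress <= 0x1F) return Access::BitInstruction;  // reachable by SBI/CBI
    if (dataAddress <= 0x3F) return Access::InOut;           // reachable by IN/OUT
    return Access::LoadStore;                                // needs LDS/STS or LD/ST
}

const uint16_t kPortEOutSet = 0x0485;  // PORTE.OUTSET, from the listing
const uint16_t kVportEOut   = 0x0011;  // VPORTE.OUT, ASSUMED from the data sheet; verify

// Checks the two addresses discussed in the thread against the model.
void checkThreadAddresses() {
    assert(fastestAccess(kPortEOutSet) == Access::LoadStore);
    assert(fastestAccess(kVportEOut) == Access::BitInstruction);
}
```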

For maximum speed, use the VPORT registers:

void loop() {
  VPORTE.OUT |= 0x08;
  VPORTE.OUT &= ~0x08;
}

DrAzzy:
Wrap the port writes in an infinite while loop; I think you're losing some performance to bookkeeping outside of loop() (the giveaway is that it stays high for less time than it stays low).

For the Arduino Uno I indeed used a while loop, so only the positive pulse width is relevant, which is the same for the Uno and the Every without register emulation.

westfw:
For maximum speed, use the VPORT registers:

void loop() {
  VPORTE.OUT |= 0x08;
  VPORTE.OUT &= ~0x08;
}

Using the VPORT registers results in the same positive pulse width as using register emulation, so the translation must happen while compiling.

Thank you all for the help

Perhaps you should show us the code you used to get 4MHz out of your Uno. Your waveform picture looks much more symmetric than the 4MHz code from: Maximum pin toggle speed - Frequently-Asked Questions - Arduino Forum (which also showed 2.7MHz for the "obvious" code.)
Probably the fastest you can go is:

while (1) {
  VPORTE.OUT = 0xFF;
  VPORTE.OUT = 0;
}

It does seem a shame that it doesn't look like the compiler can be coerced into loading up an index register with the PORT_t value:

    ldi  zl, lo8(PORTE)
    ldi  zh, hi8(PORTE)
    ldi  r16, 0x08      ; bit mask for PE3
0:  std  z+5, r16       ; OUTSET
    std  z+6, r16       ; OUTCLR
    rjmp 0b

It might happen "normally" if you derive the port info from the pin table...
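The z+5 and z+6 displacements are just the offsets of OUTSET and OUTCLR inside the port's register block. A small host-side sketch of that base-plus-offset arithmetic is below. `PortLayout` is a hypothetical partial mirror of the register block in the order I believe the data sheet lists it (DIR, DIRSET, DIRCLR, DIRTGL, OUT, OUTSET, OUTCLR, OUTTGL); only the offsets 5 and 6 and the address 0x0485 actually appear in this thread, so treat the rest as an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical partial mirror of the PORT register block (lower-case names to
// make clear this is not the real PORT_t). Register order is an assumption.
struct PortLayout {
    uint8_t dir;
    uint8_t dirset;
    uint8_t dirclr;
    uint8_t dirtgl;
    uint8_t out;
    uint8_t outset;
    uint8_t outclr;
    uint8_t outtgl;
};

// PORTE.OUTSET is at 0x0485 in the disassembly; subtracting its offset gives the base.
const uint16_t kPortEBase = 0x0485 - 5;

uint16_t registerAddress(uint16_t base, std::size_t offset) {
    return static_cast<uint16_t>(base + offset);
}

// Confirms that the model reproduces the displacements used with the Z register.
void checkOffsets() {
    assert(offsetof(PortLayout, outset) == 5);  // matches "std z+5"
    assert(offsetof(PortLayout, outclr) == 6);  // matches "std z+6"
}
```

With the base held in Z, both stores can use a displacement instead of carrying a full 16-bit address in each instruction, which is the point of the hand-written loop.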

westfw:
Perhaps you should show us the code you used to get 4MHz out of your Uno.

To be honest, I cheated a little bit. The only way I could get a symmetrical 4MHz out of the Uno was to toggle the port without a loop, i.e. with the writes repeated line by line. I know this is very bad code, but it's the only way.

void setup() {
  DDRB |= 0x01; // D8 is an output
}

void loop() {
  // the only way to get a 4MHz symmetrical square wave
  PORTB |= 0x01; // set D8
  PORTB &= 0xFE; // clear D8
  PORTB |= 0x01; // set D8
  PORTB &= 0xFE; // clear D8
  PORTB |= 0x01; // set D8
  PORTB &= 0xFE; // clear D8
  PORTB |= 0x01; // set D8
  PORTB &= 0xFE; // clear D8
  PORTB |= 0x01; // set D8
  PORTB &= 0xFE; // clear D8
  PORTB |= 0x01; // set D8
  PORTB &= 0xFE; // clear D8
  PORTB |= 0x01; // set D8
  PORTB &= 0xFE; // clear D8
  PORTB |= 0x01; // set D8
  PORTB &= 0xFE; // clear D8
  PORTB |= 0x01; // set D8
  PORTB &= 0xFE; // clear D8
  ...
}

Please note that even if you enclose this code in a while(1) loop, you get a short delay between iterations, caused by the jump back to the start of the loop.
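The effect of that jump can be estimated with a simplified cycle model. The sketch below is an illustration under stated assumptions, not a measurement: it assumes the pin changes when each write instruction completes, that the two writes compile to SBI and CBI (2 cycles each on the ATmega328P) and that the loop closes with an RJMP (2 cycles). `LoopTiming`, `tightLoopTiming` and `frequencyHz` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct LoopTiming {
    uint32_t highCycles;  // cycles the pin spends high per iteration
    uint32_t lowCycles;   // cycles the pin spends low per iteration
};

// Simplified model of a loop containing one set write, one clear write and a jump.
LoopTiming tightLoopTiming(uint32_t setCycles, uint32_t clearCycles, uint32_t jumpCycles) {
    LoopTiming t;
    t.highCycles = clearCycles;             // high until the clear write completes
    t.lowCycles  = jumpCycles + setCycles;  // low through the jump and the next set write
    return t;
}

// Output frequency for a given CPU clock and per-iteration timing.
double frequencyHz(double cpuHz, const LoopTiming& t) {
    assert(t.highCycles + t.lowCycles > 0);
    return cpuHz / static_cast<double>(t.highCycles + t.lowCycles);
}
```

With `tightLoopTiming(2, 2, 2)` the model gives 2 cycles high and 4 cycles low, i.e. about 2.67MHz at 16MHz, which is in line with the 2.7MHz quoted for the "obvious" loop. With the jump removed, `tightLoopTiming(2, 2, 0)`, it gives a symmetric 4MHz, which is what the unrolled code produces until it reaches the end of loop().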

Also note that the Nano Every runs at 16MHz by default (despite what is printed on the box and in the specs). The clock speed can be changed to 20MHz by modifying the boards.txt file.