digitalWriteFast with UNO R4

Hi, I just got this new board and I want to run some speed tests.
In detail, I want to try fast bit/port manipulation, but I'm confused by the totally different port names on this module.
I am used to seeing PORTA..PORTD names, but here I have something like P0xx..P3xx.
[Image: UNO R4 P1xx pinout]

Neither P0 nor P1 is recognized as a port, nor is P105 (D2), for example.
How can I refer to those pins?
I'm going to try digitalWriteFast, but it needs the port name constant.

I don't think it's compatible yet.

Delve down into the Arduino core files if possible.

There is no Arduino Uno R4. It is either the "UNO R4 Minima" or the "UNO R4 WiFi" :nerd_face:

The digitalWrite() is here: https://github.com/arduino/ArduinoCore-renesas/blob/main/cores/arduino/digital.cpp#L21

void digitalWrite(pin_size_t pin, PinStatus val) {
  R_IOPORT_PinWrite(NULL, g_pin_cfg[pin].pin, 
  val == LOW ? BSP_IO_LEVEL_LOW : BSP_IO_LEVEL_HIGH);
}

It uses the R_IOPORT_PinWrite() function which is provided by the Renesas Flexible Software Package: https://renesas.github.io/fsp/group___i_o_p_o_r_t.html#ga1b17ca2e38acde207881e7e4ba2a7e28
The source code for that function should be somewhere, but I cannot find it.


Thanks for your answer, Koepel.
My board is the WiFi model; I thought it was not important to specify it, as I assume the pin definitions are the same for both models.
I had a look at the linked document, but sadly I think it is beyond my knowledge.

R_IOPORT_PinWrite(NULL, g_pin_cfg[pin].pin,...

I do not understand what the library expects from me. I mean:
What is g_pin_cfg[pin].pin?
If it is a pin number, how is it defined (i.e. P105 or D2)?

"That is what Renesas made for their processor." That part is actually by Arduino.

'g_pin_cfg' is an array of structs. You'll have to dig into variant.h and variant.cpp to see how it's laid out.

To do that, you will need to:

  1. Understand how the ARM architecture in general implements GPIO.
  2. Study the chip reference manual, to figure out the specifics of this particular Renesas chip, including the "register names" associated with GPIO.
  3. Study the existing core code, to figure out how it does things, and how it could be sped up.
  4. Study the Renesas "fsp" library, to see how IT works, since the Arduino core mostly calls it to do the work.
  5. Study the variants/pins_arduino.h to see how pin mapping works, and come up with an alternative for the "fast" version.

(There's a reason that digitalWrite() is such a brilliant abstraction/function, in spite of being generally slow!)

In general, an ARM chip doesn't allow for as much optimization as an AVR: there are no special instructions that can change a GPIO bit in one instruction (it usually takes four). And the penalties for the extra steps taken by Arduino (e.g. allowing a variable pin and value) are lower. This means that you can't expect as much of a speedup as the AVR "fast" code provides. You could probably get 3-4x faster, but not 10-20x...

OK, now everything is clear... Either I made the wrong purchase or, at least, I will stay with the 'slow' digitalWrite.
Thanks anyway to all.

I was wondering about some of this too, and whether any of it is available yet,
on two different fronts: how to speed up the I/O, and how compatible the R4 is with previous UNOs.

Note: I currently have the WiFi version and have a Minima on order...

My first experiment with this was to see if you could do something like:

  uint32_t *port = digitalPinToPort(LED_BUILTIN);
  uint32_t mask = digitalPinToBitMask(LED_BUILTIN);

  // turn on
  *port |= mask; 

  *port &= ~mask;

Which does not compile.

In file included from C:\Users\kurte\AppData\Local\Temp\arduino\sketches\DECE862A8C38F65BC2E4A75893A1637E\sketch\zzz.ino.cpp:1:0:
C:\Users\kurte\Documents\Arduino\zzz\zzz.ino: In function 'void setup()':
C:\Users\kurte\AppData\Local\Arduino15\packages\arduino\hardware\renesas_uno\1.0.1\cores\arduino/Arduino.h:74:59: error: invalid conversion from 'int' to 'uint32_t* {aka long unsigned int*}' [-fpermissive]
 #define digitalPinToPort(P)        (digitalPinToBspPin(P) >> 8)
                                    ~~~~~~~~~~~~~~~~~~~~~~~^~~~~
C:\Users\kurte\Documents\Arduino\zzz\zzz.ino:16:20: note: in expansion of macro 'digitalPinToPort'
   uint32_t *port = digitalPinToPort(LED_BUILTIN);
                    ^~~~~~~~~~~~~~~~

Which was not entirely unexpected, as port sizes are 32 bits on some boards and 16 or 8 on others...
but I'm not sure what the int means here.

So my assumption is that this won't work...
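Per the macro shown in the error message, `digitalPinToPort()` yields a port *number* (an `int`), not a pointer, which is why assigning it to a `uint32_t *` fails. A minimal sketch of turning that number into a register-block address, assuming the RA4M1 PORTn blocks start at 0x40040000 and are 0x20 bytes apart (an assumption to check against the FSP device header for the RA4M1). `portBaseAddress` is a hypothetical helper; it only does integer arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Assumption (verify against the FSP device header):
// PORT0 registers start at 0x40040000 and each PORTn block is 0x20 bytes long.
constexpr uint32_t kPort0Base  = 0x40040000u;
constexpr uint32_t kPortStride = 0x20u;

// digitalPinToPort() gives a port index (BSP pin >> 8), not an address.
// This hypothetical helper maps that index to the start of the port's register block.
constexpr uint32_t portBaseAddress(uint32_t portIndex) {
  return kPort0Base + portIndex * kPortStride;
}

// On the board one would still need an explicit cast to a pointer type;
// the R_PORTn pointers provided by the FSP headers avoid hard-coding addresses.
```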

I am also trying to understand their documentation. I am used to several other ARM-based boards, but it has been a long time since I have done anything with a Renesas board.

Is there a decent document or header file that has most/all of the registers defined?

This may or may not be true, depending on which ARM chips you are using. For example on most of the Teensy boards, if you call digitalWriteFast with constants for both which pin and either HIGH or LOW, this can be reduced down to one instruction.

On the T3.x, which are ARM Cortex-M4 boards, the code is set up to use bit-band operations, a feature of M3 and M4 processors.

A great source of information on this is the book The Definitive Guide to ARM® Cortex®-M3 and Cortex®-M4 Processors by Joseph Yiu (ISBN 9780124080829), section 6.7.

With bit-banding, suppose the register was at address 0x20000000 and you wanted to set bit 2 to 1: you would simply write a 1 to the alias address 0x22000008 (alias base 0x22000000 + byte offset × 32 + bit number × 4).
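A sketch of the Cortex-M3/M4 bit-band address calculation, for parts that implement bit-banding: each bit in the first 1 MB of SRAM (0x20000000) or of peripheral space (0x40000000) maps to its own 32-bit word in an alias region at 0x22000000 or 0x42000000. `bitbandAlias` is a hypothetical helper that only computes the address; it does not access memory.

```cpp
#include <cassert>
#include <cstdint>

// Bit-band mapping: alias = alias_base + byte_offset * 32 + bit * 4
//   SRAM region:       0x20000000..0x200FFFFF -> alias base 0x22000000
//   Peripheral region: 0x40000000..0x400FFFFF -> alias base 0x42000000
// `bit` may be 0..31 when `address` is a word address: bits 8..31 land in the
// following bytes, which the same formula handles.
constexpr uint32_t bitbandAlias(uint32_t address, uint32_t bit) {
  const uint32_t regionBase = address & 0xF0000000u;    // 0x20000000 or 0x40000000
  const uint32_t aliasBase  = regionBase + 0x02000000u; // 0x22000000 or 0x42000000
  const uint32_t byteOffset = address & 0x000FFFFFu;    // offset inside the 1 MB region
  return aliasBase + byteOffset * 32u + bit * 4u;
}
```

Writing 1 or 0 to the returned word sets or clears just that one bit, which is why a constant pin and value can compile down to a single store.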

As for the Teensy 4.x, which has an M7 processor: the M7 does not support bit-band operations.
But at least on the Teensy boards, each port has not only a register for the
port data but also a few other registers (portSet, portClear, portToggle), which update only those bits of the port data for which you passed a 1 in the mask. So again, it is done with one instruction.
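A host-side model of the set/clear/toggle registers described above, showing why no read-modify-write is needed: each write affects only the bits that are 1 in the mask. `SimPort` is a hypothetical stand-in; on real hardware each method would be a single store to a separate register, with the bit logic done by the peripheral.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a GPIO port with separate set/clear/toggle registers.
struct SimPort {
  uint32_t data = 0;                                  // the port data register
  void writeSet(uint32_t mask)    { data |= mask; }   // bits that are 1 in mask go high
  void writeClear(uint32_t mask)  { data &= ~mask; }  // bits that are 1 in mask go low
  void writeToggle(uint32_t mask) { data ^= mask; }   // bits that are 1 in mask flip
};
```

Bits that are 0 in the mask are left untouched, so other pins on the same port are never disturbed, even if an interrupt changes them between two writes.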

So I'm keeping my fingers crossed that there will be a decent solution.

Kurt

I did some work on this for my own library (beware: it's not stable or documented; I'm just using it with my own projects for now).

Get the RA4M1 hardware manual and take a look at chapter 19 on I/O ports. There are two different ways to access the pins: by port register, which lets you set the direction and read/write up to 16 pins at once, and by the pin function register which gives you more control (pull-ups, drive strength, CMOS/NMOS, analog, alternate functions) but only one pin at a time.

Beware that the pin mappings are different between the Minima and WiFi! (use #ifdef ARDUINO_MINIMA and #ifdef ARDUINO_UNOWIFIR4)
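A small sketch of selecting a per-board pin mapping at compile time, assuming the board macros are spelled `ARDUINO_MINIMA` and `ARDUINO_UNOWIFIR4`. The D13 values come from this thread: P111 on the Minima per the examples in this post, and P102 on the WiFi, matching the `R_PORT1` bit 2 used for `LED_BUILTIN` in the test sketch later on. `UnoR4`, `d13BspPin` and `kBoard` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: FSP-style port/pin value (port << 8 | bit) of D13 on each board.
enum class UnoR4 { Minima, WiFi };

constexpr uint16_t d13BspPin(UnoR4 board) {
  return board == UnoR4::Minima ? 0x0111   // P111
                                : 0x0102;  // P102
}

// The board macros defined by the core select the variant at compile time.
#if defined(ARDUINO_MINIMA)
constexpr UnoR4 kBoard = UnoR4::Minima;
#elif defined(ARDUINO_UNOWIFIR4)
constexpr UnoR4 kBoard = UnoR4::WiFi;
#else
constexpr UnoR4 kBoard = UnoR4::Minima;  // host build: neither macro is defined
#endif
```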

Here are some examples (for Minima):

  pinMode(13, OUTPUT); // D13 -> P111
  R_PORT1->PDR |= bit(11); // same, using port register
  R_PFS->PORT[1].PIN[11].PmnPFS_BY_b.PDR = 1; // same, using pin register
  digitalWrite(13, HIGH);
  R_PORT1->PODR |= bit(11);
  R_PORT1->POSR = bit(11); // faster alternative to set without disturbing other pins
  R_PFS->PORT[1].PIN[11].PmnPFS_BY_b.PODR = 1;
  digitalWrite(13, LOW);
  R_PORT1->PODR &= ~bit(11);
  R_PORT1->PORR = bit(11); // faster alternative to reset without disturbing other pins
  R_PFS->PORT[1].PIN[11].PmnPFS_BY_b.PODR = 0;

No. Or rather, not exactly. Ultimately, you can use bit-banding (which is only available on SOME implementations of M3/M4) or the more common set/clear GPIO registers to change a pin value with a single "store" instruction.

BUT, in isolation, you need to load both the address and the constant that you're going to store into registers, making the minimal sequence look like:

    ldr ADRREG,  =portAddress
    mov TEMPREG, #bitValue
    str TEMPREG, [ADRREG, #OFFSET]

(and THEN, the GPIO registers are frequently off on a relatively slow "peripheral bus", so the actual store instruction takes several clock cycles :frowning: )

"Is there a decent document or header file that has most/all of the registers defined?"

I am finding the Renesas manuals (and code) to be "relatively unpleasant." (Sigh.)
I think it's about here:

The ports are 16 bits wide but, apparently, are defined as being accessed through 32-bit registers.


I thought I would play with this some today...

Here is my current test sketch:

static inline void digitalWriteFast(uint8_t pin, uint8_t val) __attribute__((always_inline, unused));
//#include "digitalFast.h"
void setup() {
  Serial.begin(115200);
  Serial.println("\n\nTest");
  while (!Serial && millis() < 5000)
    ;

  pinMode(LED_BUILTIN, OUTPUT);
  uint32_t start_time = micros();
  for (int i = 0; i < 1000; i++) {
    digitalWrite(LED_BUILTIN, HIGH);
    digitalWrite(LED_BUILTIN, LOW);
  }
  uint32_t delta_time = micros() - start_time;
  Serial.print("digitalWrite: ");
  Serial.println(delta_time, DEC);

  start_time = micros();
  for (int i = 0; i < 1000; i++) {
    fasterDigitalWrite(LED_BUILTIN, HIGH);
    fasterDigitalWrite(LED_BUILTIN, LOW);
  }

  delta_time = micros() - start_time;
  Serial.print("Faster: ");
  Serial.println(delta_time, DEC);

  start_time = micros();
  for (int i = 0; i < 1000; i++) {
    digitalWriteFast(LED_BUILTIN, HIGH);
    digitalWriteFast(LED_BUILTIN, LOW);
  }

  delta_time = micros() - start_time;
  Serial.print("digitalWriteFast: ");
  Serial.println(delta_time, DEC);


  start_time = micros();
  for (int i = 0; i < 1000; i++) {
    R_PORT1->POSR = bit(2);
    R_PORT1->PORR = bit(2);
  }
  delta_time = micros() - start_time;
  Serial.print("POSR/PORR: ");
  Serial.println(delta_time, DEC);
}

void loop() {
}


R_PORT0_Type *port_table[] = { R_PORT0, R_PORT1, R_PORT2, R_PORT3, R_PORT4, R_PORT5, R_PORT6, R_PORT7 };
static const uint16_t mask_table[] = { 1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4, 1 << 5, 1 << 6, 1 << 7,
                                       1 << 8, 1 << 9, 1 << 10, 1 << 11, 1 << 12, 1 << 13, 1 << 14, 1 << 15 };
void fasterDigitalWrite(pin_size_t pin, PinStatus val) {
  uint16_t hardware_port_pin = g_pin_cfg[pin].pin;
  //uint16_t mask = 1 << (hardware_port_pin & 0xf);
  uint16_t mask = mask_table[hardware_port_pin & 0xf];
  R_PORT0_Type *port = port_table[hardware_port_pin >> 8];
  if (val) port->POSR = mask;
  else port->PORR = mask;
}

// Cause a digital pin to output either HIGH or LOW.  The pin must
// have been configured with pinMode().  This fast version of
// digitalWrite has minimal overhead when the pin number is a
// constant.  Successive digitalWriteFast without delays can be
// too quick in many applications!
static inline void digitalWriteFast(uint8_t pin, uint8_t val) {
  if (__builtin_constant_p(pin)) {
    if (val) {
      if (pin == 0) {
        port_table[g_pin_cfg[0].pin >> 8]->POSR = mask_table[g_pin_cfg[0].pin & 0xff];
      } else if (pin == 1) {
        port_table[g_pin_cfg[1].pin >> 8]->POSR = mask_table[g_pin_cfg[1].pin & 0xff];
      } else if (pin == 2) {
        port_table[g_pin_cfg[2].pin >> 8]->POSR = mask_table[g_pin_cfg[2].pin & 0xff];
      } else if (pin == 3) {
        port_table[g_pin_cfg[3].pin >> 8]->POSR = mask_table[g_pin_cfg[3].pin & 0xff];
      } else if (pin == 4) {
        port_table[g_pin_cfg[4].pin >> 8]->POSR = mask_table[g_pin_cfg[4].pin & 0xff];
      } else if (pin == 5) {
        port_table[g_pin_cfg[5].pin >> 8]->POSR = mask_table[g_pin_cfg[5].pin & 0xff];
      } else if (pin == 6) {
        port_table[g_pin_cfg[6].pin >> 8]->POSR = mask_table[g_pin_cfg[6].pin & 0xff];
      } else if (pin == 7) {
        port_table[g_pin_cfg[7].pin >> 8]->POSR = mask_table[g_pin_cfg[7].pin & 0xff];
      } else if (pin == 8) {
        port_table[g_pin_cfg[8].pin >> 8]->POSR = mask_table[g_pin_cfg[8].pin & 0xff];
      } else if (pin == 9) {
        port_table[g_pin_cfg[9].pin >> 8]->POSR = mask_table[g_pin_cfg[9].pin & 0xff];
      } else if (pin == 10) {
        port_table[g_pin_cfg[10].pin >> 8]->POSR = mask_table[g_pin_cfg[10].pin & 0xff];
      } else if (pin == 11) {
        port_table[g_pin_cfg[11].pin >> 8]->POSR = mask_table[g_pin_cfg[11].pin & 0xff];
      } else if (pin == 12) {
        port_table[g_pin_cfg[12].pin >> 8]->POSR = mask_table[g_pin_cfg[12].pin & 0xff];
      } else if (pin == 13) {
        port_table[g_pin_cfg[13].pin >> 8]->POSR = mask_table[g_pin_cfg[13].pin & 0xff];
      } else if (pin == 14) {
        port_table[g_pin_cfg[14].pin >> 8]->POSR = mask_table[g_pin_cfg[14].pin & 0xff];
      } else if (pin == 15) {
        port_table[g_pin_cfg[15].pin >> 8]->POSR = mask_table[g_pin_cfg[15].pin & 0xff];
      } else if (pin == 16) {
        port_table[g_pin_cfg[16].pin >> 8]->POSR = mask_table[g_pin_cfg[16].pin & 0xff];
      } else if (pin == 17) {
        port_table[g_pin_cfg[17].pin >> 8]->POSR = mask_table[g_pin_cfg[17].pin & 0xff];
      } else if (pin == 18) {
        port_table[g_pin_cfg[18].pin >> 8]->POSR = mask_table[g_pin_cfg[18].pin & 0xff];
      } else if (pin == 19) {
        port_table[g_pin_cfg[19].pin >> 8]->POSR = mask_table[g_pin_cfg[19].pin & 0xff];
      } else if (pin == 20) {
        port_table[g_pin_cfg[20].pin >> 8]->POSR = mask_table[g_pin_cfg[20].pin & 0xff];
      } else if (pin == 21) {
        port_table[g_pin_cfg[21].pin >> 8]->POSR = mask_table[g_pin_cfg[21].pin & 0xff];
      } else if (pin == 22) {
        port_table[g_pin_cfg[22].pin >> 8]->POSR = mask_table[g_pin_cfg[22].pin & 0xff];
      } else if (pin == 23) {
        port_table[g_pin_cfg[23].pin >> 8]->POSR = mask_table[g_pin_cfg[23].pin & 0xff];
      } else if (pin == 24) {
        port_table[g_pin_cfg[24].pin >> 8]->POSR = mask_table[g_pin_cfg[24].pin & 0xff];
      } else if (pin == 25) {
        port_table[g_pin_cfg[25].pin >> 8]->POSR = mask_table[g_pin_cfg[25].pin & 0xff];
      } else if (pin == 26) {
        port_table[g_pin_cfg[26].pin >> 8]->POSR = mask_table[g_pin_cfg[26].pin & 0xff];
      } else if (pin == 27) {
        port_table[g_pin_cfg[27].pin >> 8]->POSR = mask_table[g_pin_cfg[27].pin & 0xff];
      } else if (pin == 28) {
        port_table[g_pin_cfg[28].pin >> 8]->POSR = mask_table[g_pin_cfg[28].pin & 0xff];
      } else if (pin == 29) {
        port_table[g_pin_cfg[29].pin >> 8]->POSR = mask_table[g_pin_cfg[29].pin & 0xff];
      } else if (pin == 30) {
        port_table[g_pin_cfg[30].pin >> 8]->POSR = mask_table[g_pin_cfg[30].pin & 0xff];
      } else if (pin == 31) {
        port_table[g_pin_cfg[31].pin >> 8]->POSR = mask_table[g_pin_cfg[31].pin & 0xff];
      } else if (pin == 32) {
        port_table[g_pin_cfg[32].pin >> 8]->POSR = mask_table[g_pin_cfg[32].pin & 0xff];
      } else if (pin == 33) {
        port_table[g_pin_cfg[33].pin >> 8]->POSR = mask_table[g_pin_cfg[33].pin & 0xff];
      } else if (pin == 34) {
        port_table[g_pin_cfg[34].pin >> 8]->POSR = mask_table[g_pin_cfg[34].pin & 0xff];
      } else if (pin == 35) {
        port_table[g_pin_cfg[35].pin >> 8]->POSR = mask_table[g_pin_cfg[35].pin & 0xff];
      } else if (pin == 36) {
        port_table[g_pin_cfg[36].pin >> 8]->POSR = mask_table[g_pin_cfg[36].pin & 0xff];
      } else if (pin == 37) {
        port_table[g_pin_cfg[37].pin >> 8]->POSR = mask_table[g_pin_cfg[37].pin & 0xff];
      } else if (pin == 38) {
        port_table[g_pin_cfg[38].pin >> 8]->POSR = mask_table[g_pin_cfg[38].pin & 0xff];
      }
    } else {
      if (pin == 0) {
        port_table[g_pin_cfg[0].pin >>8]->PORR = mask_table[g_pin_cfg[0].pin & 0xff];
      } else if (pin == 1) {
        port_table[g_pin_cfg[1].pin >>8]->PORR = mask_table[g_pin_cfg[1].pin & 0xff];
      } else if (pin == 2) {
        port_table[g_pin_cfg[2].pin >>8]->PORR = mask_table[g_pin_cfg[2].pin & 0xff];
      } else if (pin == 3) {
        port_table[g_pin_cfg[3].pin >>8]->PORR = mask_table[g_pin_cfg[3].pin & 0xff];
      } else if (pin == 4) {
        port_table[g_pin_cfg[4].pin >>8]->PORR = mask_table[g_pin_cfg[4].pin & 0xff];
      } else if (pin == 5) {
        port_table[g_pin_cfg[5].pin >>8]->PORR = mask_table[g_pin_cfg[5].pin & 0xff];
      } else if (pin == 6) {
        port_table[g_pin_cfg[6].pin >>8]->PORR = mask_table[g_pin_cfg[6].pin & 0xff];
      } else if (pin == 7) {
        port_table[g_pin_cfg[7].pin >>8]->PORR = mask_table[g_pin_cfg[7].pin & 0xff];
      } else if (pin == 8) {
        port_table[g_pin_cfg[8].pin >>8]->PORR = mask_table[g_pin_cfg[8].pin & 0xff];
      } else if (pin == 9) {
        port_table[g_pin_cfg[9].pin >>8]->PORR = mask_table[g_pin_cfg[9].pin & 0xff];
      } else if (pin == 10) {
        port_table[g_pin_cfg[10].pin >>8]->PORR = mask_table[g_pin_cfg[10].pin & 0xff];
      } else if (pin == 11) {
        port_table[g_pin_cfg[11].pin >>8]->PORR = mask_table[g_pin_cfg[11].pin & 0xff];
      } else if (pin == 12) {
        port_table[g_pin_cfg[12].pin >>8]->PORR = mask_table[g_pin_cfg[12].pin & 0xff];
      } else if (pin == 13) {
        port_table[g_pin_cfg[13].pin >>8]->PORR = mask_table[g_pin_cfg[13].pin & 0xff];
      } else if (pin == 14) {
        port_table[g_pin_cfg[14].pin >>8]->PORR = mask_table[g_pin_cfg[14].pin & 0xff];
      } else if (pin == 15) {
        port_table[g_pin_cfg[15].pin >>8]->PORR = mask_table[g_pin_cfg[15].pin & 0xff];
      } else if (pin == 16) {
        port_table[g_pin_cfg[16].pin >>8]->PORR = mask_table[g_pin_cfg[16].pin & 0xff];
      } else if (pin == 17) {
        port_table[g_pin_cfg[17].pin >>8]->PORR = mask_table[g_pin_cfg[17].pin & 0xff];
      } else if (pin == 18) {
        port_table[g_pin_cfg[18].pin >>8]->PORR = mask_table[g_pin_cfg[18].pin & 0xff];
      } else if (pin == 19) {
        port_table[g_pin_cfg[19].pin >>8]->PORR = mask_table[g_pin_cfg[19].pin & 0xff];
      } else if (pin == 20) {
        port_table[g_pin_cfg[20].pin >>8]->PORR = mask_table[g_pin_cfg[20].pin & 0xff];
      } else if (pin == 21) {
        port_table[g_pin_cfg[21].pin >>8]->PORR = mask_table[g_pin_cfg[21].pin & 0xff];
      } else if (pin == 22) {
        port_table[g_pin_cfg[22].pin >>8]->PORR = mask_table[g_pin_cfg[22].pin & 0xff];
      } else if (pin == 23) {
        port_table[g_pin_cfg[23].pin >>8]->PORR = mask_table[g_pin_cfg[23].pin & 0xff];
      } else if (pin == 24) {
        port_table[g_pin_cfg[24].pin >>8]->PORR = mask_table[g_pin_cfg[24].pin & 0xff];
      } else if (pin == 25) {
        port_table[g_pin_cfg[25].pin >>8]->PORR = mask_table[g_pin_cfg[25].pin & 0xff];
      } else if (pin == 26) {
        port_table[g_pin_cfg[26].pin >>8]->PORR = mask_table[g_pin_cfg[26].pin & 0xff];
      } else if (pin == 27) {
        port_table[g_pin_cfg[27].pin >>8]->PORR = mask_table[g_pin_cfg[27].pin & 0xff];
      } else if (pin == 28) {
        port_table[g_pin_cfg[28].pin >>8]->PORR = mask_table[g_pin_cfg[28].pin & 0xff];
      } else if (pin == 29) {
        port_table[g_pin_cfg[29].pin >>8]->PORR = mask_table[g_pin_cfg[29].pin & 0xff];
      } else if (pin == 30) {
        port_table[g_pin_cfg[30].pin >>8]->PORR = mask_table[g_pin_cfg[30].pin & 0xff];
      } else if (pin == 31) {
        port_table[g_pin_cfg[31].pin >>8]->PORR = mask_table[g_pin_cfg[31].pin & 0xff];
      } else if (pin == 32) {
        port_table[g_pin_cfg[32].pin >>8]->PORR = mask_table[g_pin_cfg[32].pin & 0xff];
      } else if (pin == 33) {
        port_table[g_pin_cfg[33].pin >>8]->PORR = mask_table[g_pin_cfg[33].pin & 0xff];
      } else if (pin == 34) {
        port_table[g_pin_cfg[34].pin >>8]->PORR = mask_table[g_pin_cfg[34].pin & 0xff];
      } else if (pin == 35) {
        port_table[g_pin_cfg[35].pin >>8]->PORR = mask_table[g_pin_cfg[35].pin & 0xff];
      } else if (pin == 36) {
        port_table[g_pin_cfg[36].pin >>8]->PORR = mask_table[g_pin_cfg[36].pin & 0xff];
      } else if (pin == 37) {
        port_table[g_pin_cfg[37].pin >>8]->PORR = mask_table[g_pin_cfg[37].pin & 0xff];
      } else if (pin == 38) {
        port_table[g_pin_cfg[38].pin >>8]->PORR = mask_table[g_pin_cfg[38].pin & 0xff];
      }
    }
  } else {
    digitalWrite(pin, val);
  }
}

There could be typos, but it appears to work for pin 13.

Test output (microseconds for 1000 HIGH/LOW iterations):

Test
digitalWrite: 1515
Faster: 1181
digitalWriteFast: 179
POSR/PORR: 179

And confirmed on the Logic Analyzer:


179 µs for 1000 iterations × 2 writes (set and reset) is ~89 ns per write. That roughly matches what I measured when I played around with it. The fastest it can go seems to be 83 ns, or 4 cycles at 48 MHz. Interestingly, the CPU instruction that writes to the register is STRH, which should take two cycles... but the I/O port runs at 24 MHz, and it takes 2 I/O port cycles to write to the register. However, you can interleave other CPU instructions between the STRHs, such as the branch and increment in your for loop.

I read that you can set both the CPU and I/O port clocks to 36 MHz if you want to bit-bang a tad faster (56 ns per write instead of 89 ns), but then without being able to interleave instructions between the STRHs.
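The arithmetic behind these figures, as a quick check: total time divided by the number of writes, and cycle counts converted to nanoseconds at a given clock frequency. The function names are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Nanoseconds per pin write from a loop benchmark: total_us microseconds
// for `iterations` loops, each doing `writesPerIteration` writes.
constexpr double nsPerWrite(double total_us, int iterations, int writesPerIteration) {
  return total_us * 1000.0 / (iterations * writesPerIteration);
}

// Duration of `cycles` clock cycles at `mhz` MHz, in nanoseconds.
constexpr double cyclesToNs(double cycles, double mhz) {
  return cycles * 1000.0 / mhz;
}
```

For example, `nsPerWrite(179, 1000, 2)` is 89.5 ns; `cyclesToNs(4, 48)` and `cyclesToNs(2, 24)` are both about 83.3 ns; and `cyclesToNs(2, 36)` is about 55.6 ns.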

As far as bit banging goes, it's pretty much on par with what the ATmega can do.

Not needed. Make your "fasterDigitalWrite()" function "static inline", and it should produce nearly identical code. The compiler will happily optimize accesses to static arrays into simple loads without resorting to the ugliness you implemented.
See https://github.com/WestfW/Duino-hacks/blob/master/fastdigitalIO/fastdigitalIO.h (which utilizes this to make "clean" faster digitalWrite() functions for AVR, Mega-0 AVR, SAM, and SAMD...)
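A host-side sketch of the suggestion: the same table lookup as `fasterDigitalWrite()`, but declared `static inline`, so that on the board a constant pin number lets the compiler fold or hoist the lookups. `SimPort`, `simPorts`, `simPinCfg` and `fasterDigitalWriteSim` are hypothetical stand-ins for `R_PORTn`, `port_table` and `g_pin_cfg`; the pin table values are made up, and running this on a PC checks only the lookup logic, not the speed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an R_PORTn register block.
struct SimPort {
  uint16_t podr = 0;                                                  // output data
  void posr(uint16_t mask) { podr |= mask; }                          // models a POSR write
  void porr(uint16_t mask) { podr &= static_cast<uint16_t>(~mask); }  // models a PORR write
};

static SimPort simPorts[8];
// Made-up (port << 8 | bit) values, indexed by Arduino pin number.
static const uint16_t simPinCfg[] = { 0x0301, 0x0302, 0x0105, 0x0102 };

// Same shape as fasterDigitalWrite(), but static inline.
static inline void fasterDigitalWriteSim(unsigned pin, bool high) {
  const uint16_t hw = simPinCfg[pin];
  const uint16_t mask = static_cast<uint16_t>(1u << (hw & 0x0F));  // bit within the port
  SimPort &port = simPorts[hw >> 8];                               // port from the upper byte
  if (high) port.posr(mask);
  else      port.porr(mask);
}
```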

Edit: both "faster" and "fast" end up compiling the loop to code that looks like:

    41c8:	3b01      	subs	r3, #1
      } else if (pin == 13) {
        port_table[g_pin_cfg[13].pin >> 8]->POSR = mask_table[g_pin_cfg[13].pin & 0xff];
    41ca:	8151      	strh	r1, [r2, #10]
    41cc:	8111      	strh	r1, [r2, #8]
    41ce:	d1fb      	bne.n	41c8 <setup+0xc4>

i.e., all the port and bit calculation (and loading up the registers with the proper constants) is moved outside of the loop, leaving a VERY tight loop with single instructions that change the pin state...
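The transformation described above, written out by hand as a sketch: the port and mask are resolved once before the loop, and each iteration is just two register stores. `SimPort` and `toggleLoopHoisted` are hypothetical; the store counter exists only to make the loop body visible in a host-side test.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an R_PORTn register block, counting register writes.
struct SimPort {
  uint16_t podr = 0;
  unsigned stores = 0;
  void posr(uint16_t m) { podr |= m; ++stores; }
  void porr(uint16_t m) { podr &= static_cast<uint16_t>(~m); ++stores; }
};

void toggleLoopHoisted(SimPort *ports, uint16_t bspPin, int iterations) {
  SimPort &port = ports[bspPin >> 8];                                  // hoisted: port lookup
  const uint16_t mask = static_cast<uint16_t>(1u << (bspPin & 0x0F));  // hoisted: bit mask
  for (int i = 0; i < iterations; ++i) {
    port.posr(mask);  // plays the role of the first strh in the listing
    port.porr(mask);  // plays the role of the second strh
  }
}
```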


You are right; they both run at the same speed...

Test
digitalWrite: 1515
Faster: 178
digitalWriteFast: 178
POSR/PORR: 179

Not sure if there is anything else to experiment with? AFAIK Arduino has not added these functions to any of their cores.

Not sure if it should/would be added to:
ArminJo/digitalWriteFast: Arduino library for faster and smaller digitalWrite(), digitalRead() and pinMode() functions using direct port manipulation for constant pin numbers. (github.com)

But at least we know that it is possible to get faster IO.

Thanks again @westfw

At my signal, unleash hell!
I did not plan to start such a confrontation, but I am happy I did.
I understand maybe 10% of your fantastic comments (too clever for me), so I confirm that I will stay with the slow digitalWrite, but I hope that something good will happen in the future.
Thanks to all

This has the nice feature that it will do a good job of being faster, even for the cases where the arguments are NOT constants. While the ARM lacks those single-instruction pin set commands, it IS somewhat more likely to benefit from "inline" in general (thanks to more general purpose registers.)

Note that the example program, since it ends up putting a bunch of stuff outside of the actual loop, is not a particularly good benchmark for the piece of code that converts "board pin numbers" to the proper register and bit values. A more realistic benchmark might be desirable.

Bare-metal pin setting or clearing on the UNO R4 Minima takes c. 83 ns; see the datasheet:
19.2.5 Port mn Pin Function Select Register (PmnPFS/PmnPFS_HA/PmnPFS_BY) (m = 0 to 9; n = 00 to 15)
This works without any setup, since the R4's pins default to general-purpose I/O and this write also sets the pin direction.

#define PORTBASE 0x40040000 /* Port Base */
#define PFS_P107PFS_BY ((volatile unsigned char  *)(PORTBASE + 0x0843 + ( 7 * 4))) 

  *PFS_P107PFS_BY = 0x05;         // = digitalWrite(7, HIGH);
  *PFS_P107PFS_BY = 0x04;         // = digitalWrite(7, LOW);  
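A sketch of the address and value arithmetic behind `PFS_P107PFS_BY`, following the offsets in the `#define` above (PFS block at 0x40040800, 0x40 bytes per port, 4 bytes per pin, byte lane +3). The bit positions (PODR = bit 0, PDR = bit 2) are an assumption consistent with 0x05 meaning output HIGH and 0x04 meaning output LOW; check them against section 19.2.5. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// PmnPFS_BY byte address, using the same offsets as the #define in the post:
// 0x40040000 + 0x0800 (PFS block) + 0x40 * port + 4 * pin + 3 (byte lane).
constexpr uint32_t pfsByAddress(uint32_t port, uint32_t pin) {
  return 0x40040000u + 0x0800u + 0x40u * port + 4u * pin + 3u;
}

// Assumed bit positions in PmnPFS_BY.
constexpr uint8_t kPODR = 1u << 0;  // output data
constexpr uint8_t kPDR  = 1u << 2;  // direction: 1 = output

// Value that makes the pin an output and drives it HIGH or LOW in one byte write.
constexpr uint8_t pfsOutputValue(bool high) {
  return static_cast<uint8_t>(kPDR | (high ? kPODR : 0));
}
```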

Do you expect that to be faster than POSR/PORR (other than setting direction at the same time)?
"4 clock cycles": is that measured, or calculated? Because I still expect at least three instructions, and the str takes 3 clocks (Table 3.2). And then there are possible caching issues...

It's a direct byte write of an immediate value to a register location; having checked the ARM documentation, it's two instructions: a MOV of the constant into R1, then an STRB... see

ARM - Accessing memory-mapped peripherals

Measured on a 200 MHz-bandwidth Tek scope. There is a c. 3 ns asymmetry between rising and falling states.

A read takes longer...

  char_val = *PFS_P107PFS_BY;  // Port State Input read - takes about 165 ns
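Reading works on the same byte. A sketch of extracting the pin state, assuming bit 1 of PmnPFS_BY is PIDR (the input data bit); `pfsPinIsHigh` is a hypothetical helper that would be applied to the value read through `PFS_P107PFS_BY`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: bit 1 of PmnPFS_BY is PIDR, the pin's current input level.
constexpr uint8_t kPIDR = 1u << 1;

// True if the byte read from PmnPFS_BY reports the pin as HIGH.
constexpr bool pfsPinIsHigh(uint8_t pfsByValue) {
  return (pfsByValue & kPIDR) != 0;
}
```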