digitalWriteFast for GIGA?

As I wondered in the thread:
Adafruit_ILI9341 and SPI library issues on GIGA board - Hardware / GIGA R1 - Arduino Forum

Is there a faster version of digitalWrite for these boards?

Does anyone have a version that works well?

So I thought I would try a quick and dirty experiment, along the lines of what I did for the UNO R4.

First attempt: I put the function directly into the sketch for ease of trying.

#include "pinDefinitions.h"

#define PIN 2

static inline void digitalWriteFast(uint8_t pin, PinStatus val) __attribute__((always_inline, unused));
//#include "digitalFast.h"


void setup() {
  Serial.begin(115200);
  while (!Serial && millis() < 5000)
    ;
  Serial.println("\n\nTest");

  pinMode(PIN, OUTPUT);
}


void do_digitalWrite() {

  uint32_t start_time = micros();
  for (int i = 0; i < 1000; i++) {
    digitalWrite(PIN, HIGH);
    digitalWrite(PIN, LOW);
  }
  uint32_t delta_time = micros() - start_time;
  Serial.print("digitalWrite: ");
  Serial.println(delta_time, DEC);
}

void do_digitalWriteFast() {
  uint32_t start_time = micros();
  for (int i = 0; i < 1000; i++) {
    digitalWriteFast(PIN, HIGH);
    digitalWriteFast(PIN, LOW);
  }
  uint32_t delta_time = micros() - start_time;
  Serial.print("digitalWriteFast: ");
  Serial.println(delta_time, DEC);
}

void loop() {
  do_digitalWrite();
  do_digitalWriteFast();
  delay(1000);
}


static  GPIO_TypeDef * const port_table[] = { GPIOA, GPIOB, GPIOC, GPIOD, GPIOE, GPIOE, GPIOF, GPIOG, GPIOH, GPIOI, GPIOJ, GPIOK };
static const uint16_t mask_table[] = { 1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4, 1 << 5, 1 << 6, 1 << 7,
                                       1 << 8, 1 << 9, 1 << 10, 1 << 11, 1 << 12, 1 << 13, 1 << 14, 1 << 15 };
static inline void digitalWriteFast(pin_size_t pin, PinStatus val) {
  PinName hardware_port_pin = g_APinDescription[pin].name;
  //uint16_t mask = 1 << (hardware_port_pin & 0xf);
  uint16_t mask = mask_table[hardware_port_pin & 0xf];
  GPIO_TypeDef  * const port = port_table[hardware_port_pin >> 8];
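  // BSRR: the low 16 bits set pins, the high 16 bits reset them (one store, no read-modify-write).
  // Cast before shifting so bit 15 is not shifted into the sign bit of an int.
  if (val) port->BSRR = mask;
  else port->BSRR = (uint32_t)mask << 16;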
}
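Part of why this is cheap is the STM32 `BSRR` register: writing a 1 to one of its low 16 bits sets that pin, writing a 1 to one of its high 16 bits resets it, and zero bits leave the other pins alone, so a single store does the job without reading `ODR` first. The host-side model below is a simplified sketch of that behavior, not the real peripheral; `MockGpio`, `write_bsrr`, and `write_pin` are hypothetical names. It also models the reference-manual rule that if both the set and reset bit of a pin are written, the set bit wins.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of one STM32 GPIO port (not the real peripheral).
struct MockGpio {
  uint32_t ODR = 0;  // output data register: one bit per pin (0..15)

  // Models a write to BSRR: high half resets pins, low half sets them.
  // Resets are applied first so that "set" wins when both bits are written.
  void write_bsrr(uint32_t value) {
    uint32_t reset_bits = value >> 16;
    uint32_t set_bits = value & 0xFFFFu;
    ODR = (ODR & ~reset_bits) | set_bits;
  }
};

// Same decision as digitalWriteFast, applied to the model.
void write_pin(MockGpio &port, uint16_t mask, bool high) {
  if (high) port.write_bsrr(mask);
  else port.write_bsrr(static_cast<uint32_t>(mask) << 16);
}
```

Because each call is one store that only names its own pin, other pins on the same port are never disturbed, even if an interrupt writes to them in between.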

In a quick test, the function appears to work and is considerably faster (times in microseconds for 1000 HIGH/LOW pairs):

digitalWrite: 253
digitalWriteFast: 54
digitalWrite: 250
digitalWriteFast: 51
digitalWrite: 252
digitalWriteFast: 52
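Each loop iteration makes two calls, so every reading covers 2000 writes. The helpers below just turn the measured microseconds into a per-call cost. The cycle estimate assumes the M7 core runs at 480 MHz and ignores loop overhead and the resolution of `micros()`, so treat the results as rough; `ns_per_call` and `cycles_per_call` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Nanoseconds per call, given total microseconds measured for `calls` calls.
double ns_per_call(uint32_t total_us, uint32_t calls) {
  return total_us * 1000.0 / calls;
}

// Approximate CPU cycles per call, assuming a core clock of `clock_mhz` MHz.
double cycles_per_call(uint32_t total_us, uint32_t calls, double clock_mhz) {
  return total_us * clock_mhz / calls;
}
```

For the first pair of readings this works out to 126.5 ns per digitalWrite call versus 27 ns per digitalWriteFast call, roughly 4.7 times faster; at the assumed 480 MHz that is roughly 61 versus 13 cycles per call, loop overhead included.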


I am not great at assembly language on these boards, but here is the disassembly of do_digitalWriteFast:

08040390 <_Z19do_digitalWriteFastv>:
 8040390:	b510      	push	{r4, lr}
 8040392:	f001 fe61 	bl	8042058 <micros>
 8040396:	4b11      	ldr	r3, [pc, #68]	; (80403dc <_Z19do_digitalWriteFastv+0x4c>)
 8040398:	4a11      	ldr	r2, [pc, #68]	; (80403e0 <_Z19do_digitalWriteFastv+0x50>)
 804039a:	4604      	mov	r4, r0
 804039c:	f9b3 3020 	ldrsh.w	r3, [r3, #32]
 80403a0:	f003 010f 	and.w	r1, r3, #15
 80403a4:	121b      	asrs	r3, r3, #8
 80403a6:	f832 1011 	ldrh.w	r1, [r2, r1, lsl #1]
 80403aa:	4a0e      	ldr	r2, [pc, #56]	; (80403e4 <_Z19do_digitalWriteFastv+0x54>)
 80403ac:	0408      	lsls	r0, r1, #16
 80403ae:	f852 2023 	ldr.w	r2, [r2, r3, lsl #2]
 80403b2:	f44f 737a 	mov.w	r3, #1000	; 0x3e8
 80403b6:	3b01      	subs	r3, #1
 80403b8:	6191      	str	r1, [r2, #24]
 80403ba:	6190      	str	r0, [r2, #24]
 80403bc:	d1fb      	bne.n	80403b6 <_Z19do_digitalWriteFastv+0x26>
 80403be:	f001 fe4b 	bl	8042058 <micros>
 80403c2:	1b04      	subs	r4, r0, r4
 80403c4:	4908      	ldr	r1, [pc, #32]	; (80403e8 <_Z19do_digitalWriteFastv+0x58>)
 80403c6:	4809      	ldr	r0, [pc, #36]	; (80403ec <_Z19do_digitalWriteFastv+0x5c>)
 80403c8:	f001 fdb1 	bl	8041f2e <_ZN7arduino5Print5printEPKc>
 80403cc:	4621      	mov	r1, r4
 80403ce:	220a      	movs	r2, #10
 80403d0:	4806      	ldr	r0, [pc, #24]	; (80403ec <_Z19do_digitalWriteFastv+0x5c>)
 80403d2:	e8bd 4010 	ldmia.w	sp!, {r4, lr}
 80403d6:	f001 bded 	b.w	8041fb4 <_ZN7arduino5Print7printlnEmi>
 80403da:	bf00      	nop
 80403dc:	24000004 	strcs	r0, [r0], #-4
 80403e0:	080576c6 	stmdaeq	r5, {r1, r2, r6, r7, r9, sl, ip, sp, lr}
 80403e4:	080576e8 	stmdaeq	r5, {r3, r5, r6, r7, r9, sl, ip, sp, lr}
 80403e8:	080576b3 	stmdaeq	r5, {r0, r1, r4, r5, r7, r9, sl, ip, sp, lr}
 80403ec:	24001780 	strcs	r1, [r0], #-1920	; 0xfffff880
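Reading the listing, the `g_APinDescription` and table lookups all happen before the loop starts: the loop body (80403b6 to 80403bc) is just the counter decrement, two `str` instructions to offset 24 of the port (where `BSRR` sits in `GPIO_TypeDef`), and a branch. When the pin is not a compile-time constant, a similar effect could be had by doing the lookup once and caching the result. The `FastPin` struct below is a hypothetical helper sketching that idea against a mock register; it is not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Minimal host-side stand-in for a GPIO port (hypothetical).
struct MockPort {
  volatile uint32_t BSRR = 0;  // last value written
  uint32_t writes = 0;         // number of BSRR writes, for checking
  void store_bsrr(uint32_t v) { BSRR = v; ++writes; }
};

// Hypothetical helper: resolve the port and mask once; each write is then one store.
struct FastPin {
  MockPort *port;
  uint32_t set_value;    // BSRR value that drives the pin high
  uint32_t reset_value;  // BSRR value that drives the pin low

  FastPin(MockPort *p, uint8_t bit)
      : port(p), set_value(1u << bit), reset_value((1u << bit) << 16) {}

  void high() const { port->store_bsrr(set_value); }
  void low() const { port->store_bsrr(reset_value); }
};

// The loop from do_digitalWriteFast, with the lookup done up front.
void pulse(const FastPin &pin, int count) {
  for (int i = 0; i < count; i++) {
    pin.high();
    pin.low();
  }
}
```

On the board, `port` would be a `GPIO_TypeDef *` taken from the port table and the store a write to `port->BSRR`.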

Not sure if this will go anywhere, but it does show some promise.


I pushed this into a quick and dirty library as part of my UNOR4-stuff project:
UNOR4-stuff/libraries/GIGA_digitalWriteFast/GIGA_digitalWriteFast.h at main · KurtE/UNOR4-stuff (github.com)

Nothing special, but it is set up for use in my own tests.

Also in there are digitalToggleFast and digitalReadFast. I have not tried the read yet.

static inline void digitalToggleFast(uint8_t pin) __attribute__((always_inline, unused));
static inline void digitalToggleFast(pin_size_t pin) {
  PinName hardware_port_pin = g_APinDescription[pin].name;
  uint16_t pin_mask = mask_table[hardware_port_pin & 0xf];
  GPIO_TypeDef  * const portX = port_table[hardware_port_pin >> 8];

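  // Read the current output state, then reset the pin if it is high or set it if it is low.
  // Cast before shifting so bit 15 is not shifted into the sign bit of an int.
  if (portX->ODR & pin_mask) portX->BSRR = (uint32_t)pin_mask << 16;
  else portX->BSRR = pin_mask;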
}

So far, the toggle is noticeably slower than the write, but it comes in handy.

digitalWrite: 253
digitalWriteFast: 54
digitalToggleFast: 131
digitalWrite: 252
digitalWriteFast: 51
digitalToggleFast: 137


But it is still about twice as fast as digitalWrite.
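The toggle most likely costs more because it has to read `ODR` back from the GPIO peripheral and branch before it can write. One branch-free alternative builds a single `BSRR` value from the current `ODR`: reset bits for pins that are high, set bits for pins that are low. This is a sketch against a mock port (`MockGpioPort` and `toggle_pins` are hypothetical names); whether it is actually faster on the GIGA would need measuring. Like the original, it is still a read followed by a write, so it is not atomic with respect to an interrupt that changes the same pin in between.

```cpp
#include <cassert>
#include <cstdint>

// Host-side stand-in for a GPIO port (hypothetical): only ODR and BSRR matter here.
struct MockGpioPort {
  uint32_t ODR = 0;
  void write_bsrr(uint32_t value) {
    // Reset bits (high half) first, then set bits (low half), so set wins.
    ODR = (ODR & ~(value >> 16)) | (value & 0xFFFFu);
  }
};

// Branch-free toggle: pins in `mask` that are high get their reset bit,
// pins in `mask` that are low get their set bit, all in one BSRR write.
void toggle_pins(MockGpioPort &port, uint16_t mask) {
  uint32_t odr = port.ODR;
  uint32_t to_reset = odr & mask;  // currently high -> drive low
  uint32_t to_set = ~odr & mask;   // currently low  -> drive high
  port.write_bsrr((to_reset << 16) | to_set);
}
```

Passing several bits in `mask` flips all of them with the same single write.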


Nice one, thanks. I will grab this code.

I have found that some of the GIGA Arduino code is not that quick.

With the GIGA Display Shield, using DMA2D I managed to get a speedup of nearly 60 times for rectangle fills compared to Arduino_H7_Video/ArduinoGraphics, and 27 times compared to Arduino_GigaDisplay_GFX!

Warning: I found an issue where it was not working on the LED pin. It looks like a problem with the port table: one port (GPIOE) was listed twice. I will test it soon with that duplicate removed.

EDIT: It looks like I need to do some more debugging for different pins.
I already suspected that I might need to call digitalWrite at least once on the pin to initialize some other things.

There is a crap load of abstraction based around mbed::DigitalInOut

Feast your eyes on:

void digitalWrite(PinName pin, PinStatus val)
{
  pin_size_t idx = PinNameToIndex(pin);
  if (idx != NOT_A_PIN) {
    digitalWrite(idx, val);
  } else {
    mbed::DigitalOut(pin).write((int)val);
  }
}

void digitalWrite(pin_size_t pin, PinStatus val)
{
  if (pin >= PINS_COUNT) {
    return;
  }
  mbed::DigitalInOut* gpio = digitalPinToGpio(pin);
  if (gpio == NULL) {
    gpio = new mbed::DigitalInOut(digitalPinToPinName(pin), PIN_OUTPUT, PullNone, val);
    digitalPinToGpio(pin) = gpio;
  }
  gpio->write(val);
}

So if you come in with a PinName (for example, LED_BLUE, which is PE_3), it first converts it (slowly) to a pin_size_t (88) and then calls the other digitalWrite. If the GPIO has not been used before, that function converts the pin_size_t back to a PinName (!), creates a DigitalInOut object, and caches it.

Creating that DigitalInOut object is what sets up all the STM GPIO crap needed.
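The core's `digitalWrite(pin_size_t, PinStatus)` follows a lazy-initialization pattern: each pin's slot starts out null, and the first write allocates the `DigitalInOut` (which is where the GPIO gets configured) and stores it for later calls. The host-side model below reproduces only that pattern, with a hypothetical `FakeDigitalInOut` standing in for the mbed class and a hypothetical four-pin table. It shows why a fast path that bypasses the core relies on the pin having been configured beforehand by something else.

```cpp
#include <array>
#include <cassert>
#include <memory>

// Hypothetical stand-in for mbed::DigitalInOut; counts how often "setup" runs.
struct FakeDigitalInOut {
  static int constructed;
  int value = 0;
  FakeDigitalInOut() { ++constructed; }  // the real class configures the GPIO here
  void write(int v) { value = v; }
};
int FakeDigitalInOut::constructed = 0;

constexpr int kPinsCount = 4;  // hypothetical, tiny for the example
std::array<std::unique_ptr<FakeDigitalInOut>, kPinsCount> gpio_cache;

// Mirrors the shape of the core's digitalWrite(pin_size_t, PinStatus).
void model_digitalWrite(int pin, int val) {
  if (pin < 0 || pin >= kPinsCount) return;  // like the PINS_COUNT check
  auto &gpio = gpio_cache[pin];
  if (!gpio) gpio = std::make_unique<FakeDigitalInOut>();  // first use only
  gpio->write(val);
}
```

The real code uses `new` and a raw pointer table; `std::unique_ptr` is used here only to keep the model leak-free.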


Thanks,

I updated the quick and dirty digitalWriteFast code. I had another error in the mappings: I started from the UNOR4 code, where the port number is shifted up 8 bits, but on these boards it is only shifted up 4 bits.
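Working through the encoding shows why the LED pins failed. Using the values printed later in this thread (LED_BLUE = PE_3 = 67, LED_RED = PI_12 = 140, LED_GREEN = PJ_13 = 157), the port index is `pin_name >> 4` and the bit number is `pin_name & 0xF`. With `>> 8`, every one of these decodes to index 0 (GPIOA), so only port A pins would have worked. Once the shift is fixed, the duplicated `GPIOE` entry pushes every port from F onward down by one: index 8 (port I, the red LED) would land on GPIOH and index 9 (port J, green) on GPIOI. The sketch below decodes to a port letter instead of a real `GPIO_TypeDef *` so it can run on a host; `decode_pin_name`, `PortBit`, and `kPortLetters` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Port letters in table order, each port exactly once.
// (On the board this would be the GPIO_TypeDef * table: GPIOA, GPIOB, ..., GPIOK.)
constexpr char kPortLetters[] = {'A', 'B', 'C', 'D', 'E', 'F',
                                 'G', 'H', 'I', 'J', 'K'};

struct PortBit {
  char port;      // port letter
  uint16_t mask;  // single-bit mask for BSRR/ODR/IDR
};

// Encoding assumed from the values above: bits 4 and up hold the port index,
// bits 0-3 hold the pin number within the port.
PortBit decode_pin_name(int pin_name) {
  assert(pin_name >= 0);
  int port_index = pin_name >> 4;
  assert(port_index < static_cast<int>(sizeof(kPortLetters)));
  return {kPortLetters[port_index],
          static_cast<uint16_t>(1u << (pin_name & 0xF))};
}
```

In `digitalWriteFast` itself, that corresponds to indexing the port table with `hardware_port_pin >> 4` and keeping the table free of duplicates.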

In my test case I do call digitalWrite at least once first, but I am not sure yet whether that is needed, as the earlier call to pinMode may already do all of that setup junk.
I think it does, but...

I just pushed up an updated version that adds overloads of the digitalWriteFast, digitalToggleFast, and digitalReadFast functions, so each one can take either the normal Arduino pin number or a PinName.
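The overload pair works because `PinName` is a plain (unscoped) enum in mbed, while `pin_size_t` is an 8-bit unsigned integer in the Arduino core API. An argument of type `PinName` (such as `LED_RED` or `PI_12`) is an exact match for the `PinName` overload, while a plain integer such as `86` can only convert to `pin_size_t`, because C++ has no implicit conversion from `int` to an enum. The host-side sketch below uses a cut-down, hypothetical `PinName` containing just the three LED values from this thread, and a hypothetical `which_overload` that reports which overload was chosen.

```cpp
#include <cassert>
#include <cstdint>

typedef uint8_t pin_size_t;  // as in the Arduino core API

// Cut-down, hypothetical stand-in for mbed's PinName enum (values from this thread).
typedef enum {
  PE_3 = 0x43,   // 67
  PI_12 = 0x8C,  // 140
  PJ_13 = 0x9D,  // 157
} PinName;

enum class Chosen { Number, Name };

// Report which overload the compiler picked, instead of touching hardware.
Chosen which_overload(pin_size_t) { return Chosen::Number; }
Chosen which_overload(PinName) { return Chosen::Name; }
```

The flip side is that a `PinName` still converts to `pin_size_t` silently when only the number overload exists, which is why the separate `PinName` overloads matter.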

I also added a test sketch that exercises the 3 LED colors to see if they work. It is set up so it can run on either the M7 or the M4 core, and it accesses the LEDs by several different names/numbers to see what maps to what:

#include <RPC.h>
#include <GIGA_digitalWriteFast.h>

Stream *USERIAL = nullptr;

void setup() {
  // Use USB Serial on the M7 core and RPC on the M4 core for output.
  if (HAL_GetCurrentCPUID() == CM7_CPUID) {

    Serial.begin(115200);
    while (!Serial && millis() < 5000) {}
    Serial.println("\n*** Test Led Pins M7 version ***");
    USERIAL = &Serial;
  } else {
    RPC.begin();
    USERIAL = &RPC;
    USERIAL->println("\n*** Test Led Pins M4 version ***");
  }

  pinMode(LED_BUILTIN, OUTPUT);
  pinMode(LED_BLUE, OUTPUT);
  pinMode(LED_GREEN, OUTPUT);
  pinMode(LED_RED, OUTPUT);
  pinMode(86, OUTPUT);
  pinMode(87, OUTPUT);
  pinMode(88, OUTPUT);
}

void test_pin(const char *name, pin_size_t pin) {
  USERIAL->print("Test Pin by number: ");
  USERIAL->print(name);
  USERIAL->print("(");
  USERIAL->print(pin, DEC);
  USERIAL->println(")");
  for (uint8_t i = 0; i < 2; i++) {
    digitalWriteFast(pin, HIGH);
    delay(250);
    digitalWriteFast(pin, LOW);
    delay(250);
  }
  for (uint8_t i = 0; i < 4; i++) {
    digitalToggleFast(pin);
    delay(500);
  }
  digitalWriteFast(pin, HIGH);
}

void test_pin(const char *name, PinName pin) {
  USERIAL->print("Test Pin by name: ");
  USERIAL->print(name);
  USERIAL->print("(");
  USERIAL->print(pin, DEC);
  USERIAL->println(")");

  for (uint8_t i = 0; i < 2; i++) {
    digitalWriteFast(pin, HIGH);
    delay(250);
    digitalWriteFast(pin, LOW);
    delay(250);
  }
  for (uint8_t i = 0; i < 4; i++) {
    digitalToggleFast(pin);
    delay(500);
  }
  digitalWriteFast(pin, HIGH);
}


void loop() {
  // Exercise each LED by every name/number form.
  test_pin("LED_BUILTIN", LED_BUILTIN);
  test_pin("LED_RED", LED_RED);
  test_pin("LED_GREEN", LED_GREEN);
  test_pin("LED_BLUE", LED_BLUE);
  test_pin("86", 86);
  test_pin("87", 87);
  test_pin("88", 88);
  test_pin("D86", D86);
  test_pin("D87", D87);
  test_pin("D88", D88);

}

All of the name combinations appear to work fine on the M7 core, but the named
ones like LED_RED do not work on the M4 core, because the variant used for the M4 has the mappings for a different board, which do not match.

That is, the pins for the M7 are defined correctly:
PinName LED_RED: 140, LED_GREEN: 157, LED_BLUE: 67

But on the M4 they are currently defined as:
PinName LED_RED: 165, LED_GREEN: 166, LED_BLUE: 167

As you can see from the debug output:

*** Test Led Pins M4 version ***
Test Pin by number: LED_BUILTIN(87)
Test Pin by name: LED_RED(165)
Test Pin by name: LED_GREEN(166)
Test Pin by name: LED_BLUE(167)
Test Pin by number: 86(86)
Test Pin by number: 87(87)
Test Pin by number: 88(88)
Test Pin by number: D86(86)
Test Pin by number: D87(87)
Test Pin by number: D88(88)

It looks like the change suggested by @facchinm (Add GIGA_M4 variant by AndrewCapon · Pull Request #740 · arduino/ArduinoCore-mbed (github.com)) fixed this. Here are the results I got:

M7 Pin Test
------------------------------------
Test Pin by number: LED_BUILTIN(87)
Test Pin by name: LED_RED(140)
Test Pin by name: LED_GREEN(157)
Test Pin by name: LED_BLUE(67)
Test Pin by number: 86(86)
Test Pin by number: 87(87)
Test Pin by number: 88(88)
Test Pin by number: D86(86)
Test Pin by number: D87(87)
Test Pin by number: D88(88)

M4 Pin Test
------------------------------------
Test Pin by number: LED_BUILTIN(87)
Test Pin by name: LED_RED(140)
Test Pin by name: LED_GREEN(157)
Test Pin by name: LED_BLUE(67)
Test Pin by number: 86(86)
Test Pin by number: 87(87)
Test Pin by number: 88(88)
Test Pin by number: D86(86)
Test Pin by number: D87(87)
Test Pin by number: D88(88)

Guess there is more playing to do.


Yes, it appears to work. Note: I updated that test slightly to print a little more information when you pass in a pin name. I also tried a couple more pin names that map to the same ...

*** Test Led Pins M4 version ***
Test Pin by number: LED_BUILTIN(87)
Test Pin by name: LED_RED(140 PI_12)
Test Pin by name: LED_GREEN(157 PJ_13)
Test Pin by name: LED_BLUE(67 PE_3)
Test Pin by number: 86(86)
Test Pin by number: 87(87)
Test Pin by number: 88(88)
Test Pin by number: D86(86)
Test Pin by number: D87(87)
Test Pin by number: D88(88)
Test Pin by name: PI_12(140 PI_12)
Test Pin by name: PJ_13(157 PJ_13)
Test Pin by name: PE_3(67 PE_3)
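The extra information in the updated output (for example `140 PI_12`) can be derived from the numeric PinName with the same port/bit split used for the fast functions. This is a guess at how such a label could be produced, not the code in the test sketch; `pin_name_label` is a hypothetical name, and the encoding (port index in bits 4 and up, A = 0) is assumed from the values above. Applied to the values the M4 variant reported before the fix, 165, 166, and 167 decode to PK_5, PK_6, and PK_7.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Builds a label like "140 PI_12" from a numeric PinName, assuming the
// port index is in bits 4 and up (A = 0) and the pin number in bits 0-3.
std::string pin_name_label(int pin_name) {
  assert(pin_name >= 0 && (pin_name >> 4) < 26);
  char buf[16];
  std::snprintf(buf, sizeof buf, "%d P%c_%d", pin_name,
                static_cast<char>('A' + (pin_name >> 4)), pin_name & 0xF);
  return buf;
}
```

On the board, a plain `char` buffer passed straight to `USERIAL->print()` would avoid `std::string`.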