Pin mapping on Nano 33 IoT for fast digitalWrite

Hi everyone,
I am trying to maximise the speed of digitalWrite by stripping it down to nothing but register writes.
I have been following this tutorial, which writes to the Arduino Uno like this:

    PORTB = B00000001;
    PORTB = B00000000;

where digital pin 8 (portB0) is changed from high to low.

I am trying to do the same on the Nano 33 IoT on digital pin 2 (port B10),
but it's not as simple as PORTB = B00000000000000000000001000000000 unfortunately.

I have looked at the source code for digitalWrite, but it is lost on me.

Any ideas?
Thanks, Will

Hi, could you change the category to "Nano family", "Nano 33 IoT".

The Nano 33 IoT has a SAMD21 processor. That is not like the Arduino Uno, so the Uno's PORTB register does not carry over.

I usually start with the OneWire library to see how it handles other processors: https://github.com/PaulStoffregen/OneWire/blob/master/util/OneWire_direct_gpio.h

What timing issue do you have? Is it something specific with interrupts and microseconds?
On this forum, when someone wants direct register control to make things faster, there is often a lot of delay() in the sketch.

A few months ago, there was this topic : https://forum.arduino.cc/t/what-is-the-fastest-way-to-read-write-gpios-on-samd21-boards/907133

I am using a library for interrupts, that's all good. Just want to make a faster digitalWrite function to maximise frequency turning pins on or off. Will check out OneWire

Of course it's up to you, but it would be a good idea to answer the question, "What timing issue do you have? Is it something specific with interrupts and microseconds?
On this forum, when someone wants direct register control to make things faster, there is often a lot of delay() in the sketch."

Because in about 50% of the similar problems presented in this forum, the digital I/O is not the bottleneck.

The 16 MHz ATmega328P of an Arduino Uno has instructions dedicated to turning pins on and off quickly. It can output a 4 MHz signal using software instructions alone.
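That 4 MHz figure is consistent with a simple cycle count. The sketch below assumes the two-cycle SBI and CBI timings listed for the ATmega328P and a fully unrolled set/clear sequence with no loop overhead; the helper name toggleFrequencyHz is made up for illustration.

```cpp
#include <cstdint>

// One period of a software square wave is one "set" plus one "clear",
// so the output frequency is the CPU clock divided by the cycles per period.
// (Hypothetical helper; integer division, result in Hz.)
constexpr uint32_t toggleFrequencyHz(uint32_t cpuHz, uint32_t cyclesPerPeriod) {
    return cpuHz / cyclesPerPeriod;
}

// Assumption: SBI (2 cycles) + CBI (2 cycles) = 4 cycles per period at 16 MHz.
static_assert(toggleFrequencyHz(16000000u, 2u + 2u) == 4000000u,
              "16 MHz / 4 cycles should give 4 MHz");
```

Any jump back to the top of a loop adds cycles to the period, so a looped version would come out slower than this.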

Set/clear Port B10 on a SAMD:

(Edit: Change PORT to PORT_IOBASE for lower cycle count access to pins.)

  PORT_IOBASE->Group[1].OUTSET.reg = 1 << 10;    // set bit 10
  PORT_IOBASE->Group[1].OUTCLR.reg = 1 << 10;    // clear bit 10

There isn't really any clever way to set the bit to a variable value, other than wrapping the above in if/else statements.
(That's not as horrible as it looks. All the structure and union calculations are on constants and happen at compile time.)
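For what it's worth, here is a minimal sketch of that if/else wrapper. It is a host-side model, not board code: FakePortGroup and writePin are made-up names, and the two methods only mimic what writes to OUTSET and OUTCLR do to the OUT register. On the board, the two branches would be the two PORT_IOBASE lines above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for one SAMD21 PORT group, so this runs on a PC.
struct FakePortGroup {
    uint32_t out = 0;                              // models the OUT register
    void outset(uint32_t mask) { out |= mask; }    // models a write to OUTSET
    void outclr(uint32_t mask) { out &= ~mask; }   // models a write to OUTCLR
};

// Drive one pin high or low without disturbing the other 31 bits.
inline void writePin(FakePortGroup &port, unsigned bit, bool level) {
    assert(bit < 32);                      // a PORT group is 32 bits wide
    const uint32_t mask = uint32_t{1} << bit;
    if (level) {
        port.outset(mask);                 // board: ...Group[1].OUTSET.reg = mask;
    } else {
        port.outclr(mask);                 // board: ...Group[1].OUTCLR.reg = mask;
    }
}
```

Calling writePin(portB, 10, true) and then writePin(portB, 10, false) changes only bit 10. If the bit number is a compile-time constant, the mask is too, which is the point made above.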

Note that the AVR examples you provided set most of the bits in the port to zero, as well as setting B0 to 1... You could do that as well, by writing the whole OUT register instead of OUTSET/OUTCLR.

See also the Adafruit customer service forums topic "Increasing the speed of execution of the adafruit feather"
and fastdigitalIO_samd.h in the WestfW/Duino-hacks repository on GitHub (which is significantly faster than digitalWrite() while maintaining the pin-mapping features, but not as fast as it might be if link-time optimization were turned on, and if it worked).

As a more general-purpose RISC CPU, the ARM does not have special instructions to set and clear pins (the AVR does), so the basic "set a pin" or "clear a pin" sequence takes four instructions and 2 registers. That means that the "max speed" of the 48MHz SAMD is not as much faster than the max speed of a 16MHz AVR as the clock rates suggest. (It is subject to some optimization if you manipulate multiple bits "nearby" in the code. However, it's quite difficult to figure out exactly how the optimization will go, making the creation of fully deterministic code quite ... annoying.)

		PORT->Group[1].OUTSET.reg = 1<<10;
 1bc:	2280      	movs	r2, #128	; bit value.
 1be:	4b02      	ldr	r3, [pc, #8]	; load address of IO register.
 1c0:	00d2      	lsls	r2, r2, #3  ; finish adjusting bit value, cause it doesn't fit in MOVS
 1c2:	601a      	str	r2, [r3, #0]    ; store bit value to IO register.

Toss in the need for atomicity and things can start to look different.
That is, bit set / bit clear registers provide atomic updates even when the bits are variables rather than constants. The AVR instructions cannot do that, so in this scenario the AVR has to add many more instructions to ensure atomicity.

--- bill

Yes. I'm not sure about "many more", but it's about double (six instructions vs three), plus an extra register. (OTOH, the AVR has 32 registers, while the ARM has 8-ish.)

In general, the AVR goes "fast" because it has special instructions, and things fall apart when those instructions aren't applicable by themselves. ARM has no special instructions, so what would be a special case on the AVR doesn't make much difference on the ARM, but most ARM microcontrollers add some smarts to the peripherals (the set/clear registers and similar) to make the bit-twiddlers happier. (new AVRs, like the ATmega4809 in the Nano Every and Uno Wifi 2, have smarter peripherals AND special instructions, at least for some of the peripherals.)

I tend to look at how things work when using C.
So take something like setting or clearing a bit in an output port register.
On the AVR, if you want to do that from C, you use something like

    *portreg |= bitvalue;

You get a single atomic instruction when the pointer and the bit value are both constants and the port register address is within the address range that supports the special instructions. (Not all ports on the AVR support the bit set/clear instructions.)
If not, you have to mask interrupts during that operation to ensure atomicity.
That means you have to save the status register, mask interrupts, do the operation (which is now a read of the port into a register, an update of the register, and a write of the register back to the port), then restore the status register.
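Spelled out step by step, that sequence looks roughly like the sketch below. It is a host-side model so it can run on a PC: fakeSREG, fakePort, fakeCli and setBitsAtomic are made-up names standing in for SREG, a port register, cli() and whatever the real function would be called. Bit 7 of the status register is the global interrupt enable flag.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for AVR state; on hardware these are SREG and a PORTx register.
static uint8_t fakeSREG = 0x80;   // bit 7 set: interrupts enabled
static uint8_t fakePort = 0x00;

// Models cli(): clear the global interrupt enable bit.
inline void fakeCli() { fakeSREG &= static_cast<uint8_t>(~0x80); }

// Read-modify-write of a port with interrupts masked for the duration.
void setBitsAtomic(volatile uint8_t *portreg, uint8_t bitvalue) {
    assert(portreg != nullptr);
    const uint8_t saved = fakeSREG;  // 1. save the status register
    fakeCli();                       // 2. mask interrupts
    uint8_t v = *portreg;            // 3. read the port into a register
    v |= bitvalue;                   // 4. update the register
    *portreg = v;                    // 5. write it back to the port
    fakeSREG = saved;                // 6. restore the previous interrupt state
}
```

Restoring the saved status register, rather than unconditionally re-enabling interrupts, keeps the function safe to call from code that already had interrupts disabled.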

In many cases the extra processor registers are not necessarily a help/win since when using C functions, the function will have to save and restore the registers.
I think in many cases the 32 bit registers and the better instructions for handling pointers and data structures can more than make up for the AVR having more registers.

--- bill

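#include <avr/io.h>       // PORTB
#include <util/atomic.h>  // ATOMIC_BLOCK, ATOMIC_RESTORESTATE

// Plain read-modify-write: an interrupt can fire between the read and the write.
void setbit(uint8_t mask) {
  PORTB |= mask;
}

// Same operation with interrupts masked, and the previous state restored afterwards.
void setbit_r(uint8_t mask) {
  ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
    PORTB |= mask;
  }
}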

The first function is three instructions, not counting the "return" (read, OR, write). The second is six instructions (read status, CLI, read, OR, write, write status).

"In many cases the extra processor registers are not necessarily a help/win since when using C functions, the function will have to save and restore the registers."
I dunno. It gets complicated. An AVR-GCC function has 12 registers that it is free to muck about with without needing to save/restore them (though if the caller was using them, IT needs to save them). One of the reasons that optimized code (especially with LTO) is so hard to read is that the compiler will stick values into seldom-used registers really early in the code, and then if you're looking at the code segment that actually uses those values/registers, it's like "R3? What's in R3? How did it get there?"

"the 32 bit registers and the better instructions for handling pointers and data structures can more than make up for the AVR having more registers."

Oh, definitely. Also the ARM registers are more truly "general purpose" than the AVR registers (AVR only has two "actual" index registers, one of which is frequently used as a "frame pointer"; it's a bottleneck. Vs, um, 9 on a SAMD21.)