Hi everyone,
I am trying to maximise the speed of digitalWrite by stripping it down to nothing but the register writes.
I have been following this tutorial, where it writes to the arduino uno like this:
PORTB = B00000001;
PORTB = B00000000;
where digital pin 8 (portB0) is changed from high to low.
I am trying to do the same on the nano 33 iot on digital pin 2 (portB10),
but it's not as simple as PORTB = B00000000000000000000010000000000 unfortunately.
I have looked at the source code for digitalWrite but it is lost on me.
What timing issue do you have? Is it something specific with interrupts and microseconds?
On this forum, when someone wants direct register control to make it faster, there is often a lot of delay() in the sketch.
I am using a library for interrupts, that's all good. I just want to make a faster digitalWrite function to maximise the frequency at which I can turn pins on and off. Will check out OneWire.
Of course it's up to you, but it would be a good idea to answer the question, "What timing issue do you have? Is it something specific with interrupts and microseconds?
On this forum, when someone wants direct register control to make it faster, there is often a lot of delay() in the sketch."
Because in about 50% of the similar problems presented in this forum, the digital I/O is not the bottleneck.
The 16 MHz ATmega328P of an Arduino Uno has instructions dedicated to turning pins on and off quickly. It can output a 4 MHz signal using software instructions alone.
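As a rough check on that 4 MHz figure, here is the arithmetic as a small sketch. It assumes the pin is toggled by an unrolled run of SBI and CBI instructions and that each costs 2 clock cycles on the ATmega328P (my assumption from the AVR instruction timings, not something stated above). The helper and constant names are made up for illustration.

```cpp
#include <stdint.h>

// Hypothetical helper: frequency of a square wave when one full period
// (pin high, then pin low) costs `cyclesPerPeriod` CPU clock cycles.
constexpr uint32_t toggleFrequencyHz(uint32_t cpuHz, uint32_t cyclesPerPeriod) {
    return cpuHz / cyclesPerPeriod;
}

// Assumption: SBI and CBI take 2 cycles each on the ATmega328P.
constexpr uint32_t kSbiCycles = 2;
constexpr uint32_t kCbiCycles = 2;

// 16 MHz / (2 + 2) cycles per period, with no loop overhead (unrolled code).
constexpr uint32_t kUnoMaxHz =
    toggleFrequencyHz(16000000UL, kSbiCycles + kCbiCycles);

static_assert(kUnoMaxHz == 4000000UL, "SBI + CBI at 16 MHz gives 4 MHz");
```

Any loop around the two instructions adds cycles of its own, so a real sketch would land below this figure.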
(Edit: Change PORT to PORT_IOBASE for lower cycle count access to pins.)
PORT_IOBASE->Group[1].OUTSET.reg = 1 << 10; // set bit 10
PORT_IOBASE->Group[1].OUTCLR.reg = 1 << 10; // clear bit 10
There isn't really any clever way to set the bit to a variable value, other than wrapping the above in if/else statements.
(That's not as horrible as it looks. All the structure and union calculations are on constants and happen at compile time.)
Note that the AVR examples you provided set most of the bits in the port to zero, as well as setting B0 to 1... You could do that as well.
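A minimal sketch of the if/else wrapper, written against a host-side model so it can be compiled and tried off the board. PortGroupModel and fastWrite are hypothetical names, and the model only imitates what the peripheral does with writes to OUTSET, OUTCLR and OUT.

```cpp
#include <assert.h>
#include <stdint.h>

// Host-side stand-in for one PORT group. Only the output registers used
// here are modelled; on real hardware the peripheral itself applies
// OUTSET/OUTCLR writes to the output latch.
struct PortGroupModel {
    uint32_t out = 0;
    void writeOutSet(uint32_t mask) { out |= mask; }    // like OUTSET.reg = mask
    void writeOutClr(uint32_t mask) { out &= ~mask; }   // like OUTCLR.reg = mask
    void writeOut(uint32_t value)   { out = value; }    // like OUT.reg = value
};

// Hypothetical wrapper: drive one pin of a group to a run-time value.
inline void fastWrite(PortGroupModel& group, unsigned bit, bool high) {
    assert(bit < 32);                       // a group has 32 pins at most
    const uint32_t mask = uint32_t{1} << bit;
    if (high) {
        group.writeOutSet(mask);            // touches only this pin
    } else {
        group.writeOutClr(mask);            // touches only this pin
    }
}
```

On the board, the two branches would be the PORT_IOBASE->Group[1].OUTSET.reg and OUTCLR.reg lines above, and writeOut corresponds to assigning the whole OUT register, which is the closest match to the AVR "PORTB = ..." examples because it changes every pin in the group.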
As a more general-purpose RISC CPU, the ARM does not have special instructions to set and clear pins (the AVR does), so the basic "clear a pin" sequence takes four instructions and 2 registers. That means that the "max speed" of the 48MHz SAMD is not as much faster than the max speed of a 16MHz AVR as the clock rates suggest. (It is subject to some optimization if you manipulate multiple bits "nearby" in the code. However, it's quite difficult to figure out exactly how the optimization will go, making the creation of fully deterministic code quite ... annoying.)
PORT->Group[1].OUTSET.reg = 1<<10;
1bc: 2280 movs r2, #128 ; bit value.
1be: 4b02 ldr r3, [pc, #8] ; load address of IO register.
1c0: 00d2 lsls r2, r2, #3 ; finish adjusting bit value, cause it doesn't fit in MOVS
1c2: 601a str r2, [r3, #0] ; store bit value to IO register.
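To spell out the movs/lsls pair in that listing: the Thumb MOVS immediate form only carries an 8-bit value, so the compiler loads 128 and shifts it left by 3 to get the bit-10 mask. A small compile-time check of that arithmetic (the helper name is made up):

```cpp
#include <stdint.h>

// Thumb MOVS (immediate) on the Cortex-M0+ carries only an 8-bit value.
constexpr bool fitsInMovsImmediate(uint32_t value) {
    return value <= 0xFF;
}

constexpr uint32_t kPinMask = uint32_t{1} << 10;   // 0x400, bit 10

static_assert(!fitsInMovsImmediate(kPinMask), "1 << 10 is too wide for MOVS");
static_assert(fitsInMovsImmediate(128), "128 fits in the 8-bit immediate");
static_assert((uint32_t{128} << 3) == kPinMask,
              "movs #128 followed by lsls #3 rebuilds the mask");
```

By the same rule a mask for bits 0 to 7 would fit in the MOVS directly, which is one example of the bit-dependent code generation that makes exact timing hard to predict.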
Toss in the need for atomicity and things can start to look different.
i.e. having bit set / bit clear registers can provide an atomic update capability that the AVR instructions cannot provide when the bits are given as variables rather than constants. For that scenario the AVR has to add many more instructions to ensure atomicity.
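To make the atomicity point concrete, here is a host-side sketch. PortModel, fakeIsr and both functions are hypothetical; the "interrupt" is just a function call placed where a real one could land.

```cpp
#include <stdint.h>

// Host-side model of an output port register.
struct PortModel {
    uint32_t out = 0;
};

// Stand-in for an interrupt handler that sets bit 0 of the same port.
inline void fakeIsr(PortModel& port) {
    port.out |= uint32_t{1} << 0;
}

// Read-modify-write, with the "interrupt" firing between read and write.
inline void setBitReadModifyWrite(PortModel& port, uint32_t mask) {
    uint32_t copy = port.out;   // read
    fakeIsr(port);              // interrupt lands here and sets bit 0
    copy |= mask;               // modify the stale copy
    port.out = copy;            // write back: the ISR's bit 0 is lost
}

// Set-register style: the CPU issues one store of the mask and the
// peripheral does the OR, so there is no gap for an interrupt to fall into.
inline void setBitViaSetRegister(PortModel& port, uint32_t mask) {
    fakeIsr(port);              // an interrupt can only land before...
    port.out |= mask;           // models a single store to a set register
    fakeIsr(port);              // ...or after the store, never inside it
}
```

With a mask of 1 << 10 and the port starting at zero, the first function leaves only bit 10 set, while the second leaves both bit 10 and the handler's bit 0.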
Yes. I'm not sure about "many more", but it's about double (six instructions vs three), and an extra register. (OTOH, the AVR has 32 registers, while the ARM has 8 (-ish).)
In general, the AVR goes "fast" because it has special instructions, and things fall apart when those instructions aren't applicable by themselves. ARM has no special instructions, so what would be a special case on the AVR doesn't make much difference on the ARM, but most ARM microcontrollers add some smarts to the peripherals (the set/clear registers and similar) to make the bit-twiddlers happier. (new AVRs, like the ATmega4809 in the Nano Every and Uno Wifi 2, have smarter peripherals AND special instructions, at least for some of the peripherals.)
I tend to look at how things work when using C.
So looking at something like setting or clearing a bit in an output port register.
On the AVR, if you want to do that from C and use something like
*portreg |= bitvalue;
You get a single atomic instruction when the pointer and the bit value are both constants and the port register address is within the address range that supports the special instructions. (Not all ports on the AVR support bit set/clear instructions.)
If not, you have to mask interrupts during that operation to ensure atomicity.
Which means you have to save the status register, mask interrupts, do the operation (which is now a read of the port into a register, an update of the register, and a write of the register back to the port), then unmask interrupts by restoring the status register.
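A sketch of the two variants described above. The function names are made up, and the #else branch is only a host-side stand-in for SREG and cli() so the code can be compiled off-target; it assumes bit 7 of SREG is the global interrupt enable flag.

```cpp
#include <stdint.h>

#if defined(__AVR__)
#include <avr/io.h>         // SREG
#include <avr/interrupt.h>  // cli()
#else
// Host-side stand-ins (assumption: SREG bit 7 is the interrupt enable flag).
static uint8_t SREG = 0x80;
static inline void cli() { SREG = static_cast<uint8_t>(SREG & 0x7F); }
#endif

// Plain version: read, OR, write. Only atomic when the compiler can
// reduce it to a single bit-set instruction, which needs constants.
static inline void setBits(volatile uint8_t* portreg, uint8_t bitvalue) {
    *portreg |= bitvalue;
}

// Interrupt-safe version for variable pointers and bit values.
static inline void setBitsAtomic(volatile uint8_t* portreg, uint8_t bitvalue) {
    const uint8_t savedStatus = SREG;   // save the status register
    cli();                              // mask interrupts
    *portreg |= bitvalue;               // read, OR, write
    SREG = savedStatus;                 // restore the previous interrupt state
}
```

Restoring the saved status rather than calling sei() means interrupts are only re-enabled if they were enabled when the function was entered.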
In many cases the extra processor registers are not necessarily a help/win since when using C functions, the function will have to save and restore the registers.
I think in many cases the 32 bit registers and the better instructions for handling pointers and data structures can more than make up for the AVR having more registers.
The first function is three instructions, not counting the "return" (Read, Or, Write). The second is six instructions (Read Status, CLI, Read, Or, Write, Write Status).
In many cases the extra processor registers are not necessarily a help/win since when using C functions, the function will have to save and restore the registers.
I dunno. It gets complicated. An AVR-GCC function has 12 registers that it is free to muck about with without needing to save/restore them (though if the caller was using them, IT needs to save them). One of the reasons that optimized code (and especially with -LTO) is so hard to read is that the compiler will stick values into seldom-used registers really early in the code, and then if you're looking at the code segment that actually uses those values/registers, it's like "R3? What's in R3? How did it get there?"
the 32 bit registers and the better instructions for handling pointers and data structures can more than make up for the AVR having more registers.
Oh, definitely. Also the ARM registers are more truly "general purpose" than the AVR registers (AVR only has two "actual" index registers, one of which is frequently used as a "frame pointer"; it's a bottleneck. Vs, um, 9 on a SAMD21.)