Hi, I just got this new board and I'd like to run some speed tests.
Specifically, I want to try fast bit/port manipulation, but I'm confused by the totally different port names on this module.
I was used to seeing PORTA..PORTD names, but here I have something like P0xx..P3xx.
Neither P0 nor P1 is recognized as a port, nor is P105 (D2), for example.
How can I refer to those pins?
I'm going to try a digitalWriteFast, but it needs the port name constant.
Thanks for your answer Koepel.
My board is a WiFi model; I thought it was not important to specify that; I assumed that the pin definitions are the same for both models.
I had a look at the linked document, but sadly I believe it is beyond my knowledge.
R_IOPORT_PinWrite(NULL, g_pin_cfg[pin].pin,...
I do not understand what the library expects from me; I mean:
What is g_pin_cfg[pin].pin?
If it is a pin number, how is it defined (i.e. P105 or D2)?
Understand how the ARM architecture in general implements GPIO.
Study the chip reference manual, to figure out the specifics of this particular Renesas chip, including the "register names" associated with GPIO.
Study the existing core code, to figure out how it does things, and how it could be sped up.
Study the Renesas "fsp" library, to see how IT works, since the Arduino core mostly calls it to do the work.
Study the variants/pins_arduino.h to see how pin mapping works, and come up with an alternative for the "fast" version.
(There's a reason that digitalWrite() is such a brilliant abstraction/function, in spite of being generally slow!)
In general, an ARM chip doesn't allow for as much optimization as an AVR: there are no special instructions that can change a GPIO bit in one instruction (it usually takes 4). And the penalties for the extra steps taken by Arduino (e.g. allowing a variable pin and value) are lower. This means that you can't expect as much of a speedup as the AVR "fast" code provides. You could probably get 3-4x faster, but not 10-20x...
I was wondering about some of this myself, and whether any of it is available yet.
On two different fronts: how to speed up the IO, and how compatible is the R4 with previous UNOs?
Note: I currently have the WiFi version and have a Minima on order...
My first experiment with this, was to see if you could do something like:
In file included from C:\Users\kurte\AppData\Local\Temp\arduino\sketches\DECE862A8C38F65BC2E4A75893A1637E\sketch\zzz.ino.cpp:1:0:
C:\Users\kurte\Documents\Arduino\zzz\zzz.ino: In function 'void setup()':
C:\Users\kurte\AppData\Local\Arduino15\packages\arduino\hardware\renesas_uno\1.0.1\cores\arduino/Arduino.h:74:59: error: invalid conversion from 'int' to 'uint32_t* {aka long unsigned int*}' [-fpermissive]
#define digitalPinToPort(P) (digitalPinToBspPin(P) >> 8)
~~~~~~~~~~~~~~~~~~~~~~~^~~~~
C:\Users\kurte\Documents\Arduino\zzz\zzz.ino:16:20: note: in expansion of macro 'digitalPinToPort'
uint32_t *port = digitalPinToPort(LED_BUILTIN);
^~~~~~~~~~~~~~~~
Which was not overly unexpected, as on some boards port sizes are 32 bits, on others 16 or 8...
But I'm not sure what the int means here.
So my assumption is that this won't work...
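For what it's worth, the error suggests that digitalPinToPort() on this core gives back a plain port number (the BSP pin value shifted right by 8) rather than a pointer to a register. Here is a small host-side sketch of that split. It assumes the BSP pin value packs the port number in the high byte and the pin index in the low byte, which is what the `>> 8` in the macro implies; the helper names are made up for illustration and are not part of the core.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, NOT part of the Arduino Renesas core.
// Assumption: a BSP pin value is laid out as (port << 8) | pin,
// consistent with the core's digitalPinToPort(P) doing ">> 8".
constexpr uint16_t makeBspPin(uint8_t port, uint8_t pin) {
    return static_cast<uint16_t>((port << 8) | pin);
}

constexpr uint8_t bspPinToPort(uint16_t bspPin) {
    return static_cast<uint8_t>(bspPin >> 8);    // high byte: port number
}

constexpr uint8_t bspPinToIndex(uint16_t bspPin) {
    return static_cast<uint8_t>(bspPin & 0xFF);  // low byte: pin within the port
}

// Worked example: P105 (port 1, pin 5) under the assumed layout.
inline void workedExample() {
    constexpr uint16_t p105 = makeBspPin(1, 5);
    assert(p105 == 0x0105);
    assert(bspPinToPort(p105) == 1);
    assert(bspPinToIndex(p105) == 5);
}
```

So the macro result is an integer you could use to pick a port, not something you can dereference directly.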
I am also trying to understand their documentation as well. I am used to several other ARM based boards, but it has been a long time since I have done anything with a Renesas board.
Is there a decent document or header file that has most/all of the registers defined?
This may or may not be true, depending on which ARM chips you are using. For example on most of the Teensy boards, if you call digitalWriteFast with constants for both which pin and either HIGH or LOW, this can be reduced down to one instruction.
On T3.x, which are ARM M4 boards, the code is set up to use bit-band operations, which are a feature of M3 and M4 processors.
With bit-banding, suppose the register was at address 0x20000000 and you wanted to set bit 2 to a 1: you might simply write a 1 to the alias address 0x22000008.
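To make the address arithmetic concrete, here is a host-side sketch of the standard Cortex-M3/M4 bit-band mapping: each bit in the 1 MB bit-band region gets its own 32-bit word in the alias region, so the alias address is alias base + byte offset * 32 + bit number * 4. The function and constant names are my own. For bit 2 of the byte at 0x20000000 this gives 0x22000008.

```cpp
#include <cassert>
#include <cstdint>

// Standard Cortex-M3/M4 bit-band regions (where the chip implements them).
constexpr uint32_t kSramBase    = 0x20000000u;
constexpr uint32_t kSramAlias   = 0x22000000u;
constexpr uint32_t kPeriphBase  = 0x40000000u;
constexpr uint32_t kPeriphAlias = 0x42000000u;

// Each bit of the region maps to one 32-bit word in the alias region.
constexpr uint32_t bitBandAlias(uint32_t regionBase, uint32_t aliasBase,
                                uint32_t byteAddr, uint32_t bit) {
    return aliasBase + (byteAddr - regionBase) * 32u + bit * 4u;
}

constexpr uint32_t sramBitBandAlias(uint32_t byteAddr, uint32_t bit) {
    return bitBandAlias(kSramBase, kSramAlias, byteAddr, bit);
}

constexpr uint32_t periphBitBandAlias(uint32_t byteAddr, uint32_t bit) {
    return bitBandAlias(kPeriphBase, kPeriphAlias, byteAddr, bit);
}

// Worked example: bit 2 of the byte at 0x20000000.
inline void workedExample() {
    assert(sramBitBandAlias(0x20000000u, 2) == 0x22000008u);
    assert(periphBitBandAlias(0x40000000u, 7) == 0x4200001Cu);
}
```

On real hardware, storing 1 or 0 to that alias word sets or clears just the one bit, which is why it can be a single store.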
As for Teensy 4.x, which is an M7 processor, M7s do not support bit-band operations.
But at least with the Teensy boards, the ports not only have a register for the port data, they also have a few other registers (portSet, portClear, portToggle), which only update the bits of the port data for which you passed in a corresponding high bit in the mask. So again, it is done with one instruction.
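As a rough illustration of why those extra registers help, here is a host-side model of the semantics. It is not real register code: the `ModelPort` type and its method names are hypothetical, standing in for the hardware's data, set, clear and toggle registers. The point is that a write of a mask touches only the bits that are 1 in the mask, so no read-modify-write of the data register is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of a GPIO port with set/clear/toggle
// registers. On real hardware each write* call below is a single store.
struct ModelPort {
    uint32_t data = 0;  // models the port data register

    void writeSet(uint32_t mask)    { data |= mask; }   // 1 bits go high
    void writeClear(uint32_t mask)  { data &= ~mask; }  // 1 bits go low
    void writeToggle(uint32_t mask) { data ^= mask; }   // 1 bits flip
};

// Worked example: only the masked bit changes on each write.
inline void workedExample() {
    ModelPort port;
    port.data = 0x5;            // bits 0 and 2 high

    port.writeSet(1u << 1);     // bit 1 goes high, others untouched
    assert(port.data == 0x7);

    port.writeClear(1u << 0);   // bit 0 goes low, others untouched
    assert(port.data == 0x6);

    port.writeToggle(1u << 2);  // bit 2 flips, others untouched
    assert(port.data == 0x2);
}
```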
So keeping my fingers crossed, that there will be a decent solution.
I did some work on this for my own library (beware not stable or documented, just using it with my own projects currently).
Get the RA4M1 hardware manual and take a look at chapter 19 on I/O ports. There are two different ways to access the pins: by port register, which lets you set the direction and read/write up to 16 pins at once, and by the pin function register which gives you more control (pull-ups, drive strength, CMOS/NMOS, analog, alternate functions) but only one pin at a time.
Beware that the pin mappings are different between the Minima and WiFi! (use #ifdef ARDUINO_MINIMA and #ifdef ARDUINO_UNOWIFIR4)
Here are some examples (for Minima):
pinMode(13, OUTPUT); // D13 -> P111
R_PORT1->PDR |= bit(11); // same, using port register
R_PFS->PORT[1].PIN[11].PmnPFS_BY_b.PDR = 1; // same, using pin register
digitalWrite(13, HIGH);
R_PORT1->PODR |= bit(11);
R_PORT1->POSR = bit(11); // faster alternative to set without disturbing other pins
R_PFS->PORT[1].PIN[11].PmnPFS_BY_b.PODR = 1;
digitalWrite(13, LOW);
R_PORT1->PODR &= ~bit(11);
R_PORT1->PORR = bit(11); // faster alternative to reset without disturbing other pins
R_PFS->PORT[1].PIN[11].PmnPFS_BY_b.PODR = 0;
No. Or rather, not exactly. Eventually, you can use bit-banding (which is only available on SOME implementations of M3/M4) or the more common set/clear GPIO registers to change a pin value with a single "store" instruction.
BUT, in isolation, you need to load both the address and the constant that you're going to store into registers, making the minimal sequence look like:
179 micros for 1000 iterations x 2 for set and reset is ~89 ns per write. That about matches what I measured when I played around with it. The fastest it can go seems to be 83 ns, or 4 cycles at 48 MHz. Interestingly, the CPU instruction that writes to the register is STRH, which should take two cycles... but the I/O port runs at 24 MHz and it takes 2 I/O port cycles to write to the register. However, you can interleave other CPU instructions between STRH like the branch and increment as in your for loop.
I read that you can set both the CPU and I/O port clocks to 36 MHz if you want to bit bang just a tad faster (56 ns per write instead of 89 ns), but without being able to interleave instructions between STRH.
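For anyone who wants to check the numbers, here is the arithmetic as a small host-side sketch. The helper names are mine, and the 2 cycles per write at 36 MHz is my assumption, chosen because it reproduces the 56 ns figure.

```cpp
#include <cassert>
#include <cmath>

// Duration of a number of clock cycles, in nanoseconds.
constexpr double cyclesToNs(double cycles, double clockMHz) {
    return cycles * 1000.0 / clockMHz;
}

// Average time per write, given a total in microseconds.
constexpr double nsPerWrite(double totalMicros, double writes) {
    return totalMicros * 1000.0 / writes;
}

inline void workedExample() {
    // 179 us for 1000 iterations, each with one set and one reset.
    assert(std::fabs(nsPerWrite(179.0, 2000.0) - 89.5) < 1e-9);

    // 4 CPU cycles at 48 MHz: about 83.3 ns.
    assert(std::fabs(cyclesToNs(4.0, 48.0) - 83.33) < 0.01);

    // Assumed 2 cycles at 36 MHz: about 55.6 ns.
    assert(std::fabs(cyclesToNs(2.0, 36.0) - 55.56) < 0.01);
}
```

The measured 89.5 ns sits a little above the 83.3 ns floor, which fits with a bit of loop overhead per iteration.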
As far as bit banging goes, it's pretty much on par with what the ATmega can do.
Not needed. Make your "fasterDigitalWrite()" function "static inline", and it should produce nearly identical code. The compiler will happily optimize array accesses to static arrays to simple loads without going to the ugliness you implemented.
See https://github.com/WestfW/Duino-hacks/blob/master/fastdigitalIO/fastdigitalIO.h (which utilizes this to make "clean" faster digitalWrite() functions for AVR, Mega-0 AVR, SAM, and SAMD...)
Edit: both "faster" and "fast" end up compiling the loop to code that looks like:
i.e. all the port and bit calculation (and loading up the registers with the proper constants) is moved outside of the loop, leaving a VERY tight loop with single instructions that change the pin state...
at my signal unleash hell !
I did not plan to create such a confrontation, but I am happy I did.
I possibly understand 10% of your fantastic comments, too clever for me, so I do confirm that I will stay with the slow digitalWrite, but I hope that in a certain future something good will happen.
Thanks to all
This has the nice feature that it will do a good job of being faster, even for the cases where the arguments are NOT constants. While the ARM lacks those single-instruction pin set commands, it IS somewhat more likely to benefit from "inline" in general (thanks to more general purpose registers.)
Note that the example program, since it ends up putting a bunch of stuff outside of the actual loop, is not a particularly good benchmark for the piece of code that converts "board pin numbers" to the proper register and bit values. A more realistic benchmark might be desirable.
Bare-metal UNO R4 Minima pin setting or clearing takes c. 83 ns; see the datasheet, section 19.2.5, Port mn Pin Function Select Register (PmnPFS/PmnPFS_HA/PmnPFS_BY) (m = 0 to 9; n = 00 to 15)
This works without needing any setup, since pin I/O is the default on the R4.
Do you expect that to be faster than POSR/PORR (other than setting direction at the same time)?
4 clock cycles. Is that measured, or calculated? Because I still expect at least three instructions, and the str takes 3 clocks. (Table 3.2) And then there's possible caching issues...
It's a direct immediate value byte write to a register location; having checked the ARM documentation, it's two instructions. MOV constant to R1, then a STRB... see