In this post (Faster Pin I/O on Zero? - #39 by OMac), @OMac wrote two useful functions, digitalWrite_fast and digitalRead_fast, in response to a question from @wholder. I've measured them: they raise the pin toggle speed from 350 kHz to 1.4 MHz.
The thread is closed, so I can't reply there, but my solution might be interesting to others as well. Here is my version in C++, which makes the code much more readable and brings the toggle speed up to 3.4 MHz (see the last code block at the bottom):
Left as an exercise for the reader: integrate the pinMode call.
It is a uint32_t pointer because the reg member of the PORT_OUTSET_Type union is a uint32_t, see here. This union is then used as a member of other structs, and finally a pointer to such a struct (whose value is the hardware address of the microcontroller peripheral) is stored in the Arduino PORT array.
This leaves out some of the things that digitalWrite() would normally do (like "turn on pullups if the pin is in input mode")
It's a bit RAM-heavy: 16 bytes of RAM per pin you define.
Some of the speedup is just from being implemented "inline", which will increase flash usage "some", and also makes the toggle loop able to "cheat" a bit.
The original Arduino code is, in fact, a bit of a "meh"... (the SAMD code a bit less "meh" than many, but still...)
I don't particularly like using operator overloading to make pins look like normal variables (because part of "embedded" is about noticing how different they are), but frankbuss's code should work equally well with a "write" method instead of an operator overload. And it looks like a good example of how to trade RAM/flash consumption for performance.
(I think it will also work fine and pretty optimally for dynamic usage without changing APIs. That's pretty neat.)
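void softSpiSend(int clockpin, int datapin, int output) {
  FastPin clock(clockpin);  // look up each pin's port and mask once
  FastPin data(datapin);
  for (int i = 0; i < 8; i++) {
    clock = 0;
    data = output & 1;      // send the least significant bit first
    output >>= 1;
    clock = 1;              // raise the clock after the data bit is set
  }
}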
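The "write method instead of an operator overload" variant could look roughly like the sketch below. This is not the FastPin class from the thread: `CachedPin` and its constructor arguments are hypothetical names, and the registers are passed in as plain pointers so the idea can also be tried off-target. The point is only that the register addresses and the bit mask are looked up once and then reused for every write.

```cpp
#include <cstdint>

// Hypothetical stand-in for a FastPin-like class (not the original code).
// It caches the set/clear register addresses and the pin's bit mask, so
// every write is a single store with no table lookup.
class CachedPin {
public:
    CachedPin(volatile uint32_t* setReg, volatile uint32_t* clrReg, uint32_t mask)
        : setReg_(setReg), clrReg_(clrReg), mask_(mask) {}

    // Explicit method instead of operator=, so a pin write does not
    // look like an ordinary variable assignment.
    inline void write(bool high) const {
        if (high) {
            *setReg_ = mask_;   // on the SAMD21, writing 1 bits to OUTSET drives those pins high
        } else {
            *clrReg_ = mask_;   // writing 1 bits to OUTCLR drives those pins low
        }
    }

private:
    volatile uint32_t* setReg_;
    volatile uint32_t* clrReg_;
    uint32_t mask_;
};
```

On a Zero, the constructor arguments would presumably come from the same lookup digitalWrite() does, for example `&PORT->Group[g_APinDescription[pin].ulPort].OUTSET.reg`, the matching OUTCLR register, and `1ul << g_APinDescription[pin].ulPin`; a call then reads `clock.write(true);` instead of `clock = 1;`.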
@frankbuss
Thank you for your contribution.
Did you ever check the pointers?
I have used direct port access on the Arduino DUE for years, not only for speed but also to synchronize the bit output. So I address not just a single bit but a group of bits, using an appropriate mask, in one write access.
When I developed those routines for the SAM3X, I simply studied the processor's data sheet and made a simple address list with the values from it. That was easy.
When I did the same with the SAMD21, my Arduino Zero crashed. I did not understand why.
Then I extended the pinMode routine in wiring_digital.c to write the content of the register pointer into memory; after calling the routine, I fetched the memory content and printed it via serial output.
I was surprised that it was not the register address 0x41004400 defined in the data sheet, but the address 0x08000000.
Do you (or any other expert) know why it is this way?
In the next few days I will test my direct register access using the original "Arduino pointers", but I would feel better if I understood why the addresses differ.
I've done the same to access multiple pins at once, just writing directly to the registers. For example, to set PA4 through PA11 from one byte on my Arduino Zero compatible circuit, I use this function:
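A minimal sketch of such a function, not necessarily the poster's exact code: it clears all eight pins through OUTCLR and then sets the ones whose bit is 1 through OUTSET. The names `writeByteClrSet`, `kShift` and `kMask` are made up for this example, and the registers are passed in as references so the logic can be checked off-target.

```cpp
#include <cstdint>

constexpr uint32_t kShift = 4;                // PA4 is the lowest of the eight pins
constexpr uint32_t kMask  = 0xFFu << kShift;  // bits 4..11, i.e. PA4..PA11

// Put `value` on PA4..PA11 using two register writes and no read.
// All eight pins are driven low for a moment before the new value appears.
inline void writeByteClrSet(volatile uint32_t& outclr,
                            volatile uint32_t& outset,
                            uint8_t value) {
    outclr = kMask;                                   // clear all eight pins
    outset = static_cast<uint32_t>(value) << kShift;  // raise the pins whose bit is 1
}
```

On the SAMD21 the call would presumably be `writeByteClrSet(PORT->Group[0].OUTCLR.reg, PORT->Group[0].OUTSET.reg, b);`, with `Group[0]` being port A.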
Depending on your application, you might need something like this instead, which uses the OUT register rather than OUTCLR and OUTSET to avoid the spikes that occur when the pins are cleared first. Untested:
It doesn't matter in my case, because I use it to feed 8 SPI chips in parallel and the data is only latched on a clock edge, and 2 register writes are probably faster than a register read plus the additional logical operations.
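A sketch of that read-modify-write variant, under the same assumptions (PA4..PA11, made-up names `writeByteOut`, `kFirstPin` and `kPinMask`, register passed in as a reference). The eight pins change in a single write, so unchanged pins never glitch low:

```cpp
#include <cstdint>

constexpr uint32_t kFirstPin = 4;                   // PA4 is the lowest of the eight pins
constexpr uint32_t kPinMask  = 0xFFu << kFirstPin;  // bits 4..11, i.e. PA4..PA11

// Replace bits 4..11 of the OUT register with `value` in one write,
// leaving every other pin on the port as it was.
inline void writeByteOut(volatile uint32_t& out, uint8_t value) {
    const uint32_t current = out;  // one read of the register
    out = (current & ~kPinMask) | (static_cast<uint32_t>(value) << kFirstPin);
}
```

On hardware this would presumably be called as `writeByteOut(PORT->Group[0].OUT.reg, b);`. Because it is a read-modify-write, an interrupt that changes another pin on the same port between the read and the write would have its change overwritten.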
Returning the pointer to the data structure and printing the content showed the surprising result. The mask was okay, but the pointer has the unexpected content of 0x0800000.
To repeat the main problem: if I use the pointer value 0x41004400 from the SAMD21 data sheet, the Arduino Zero crashes (I have to do a double-click reset to make the board usable again).
Yep (fixed!)
It was only meant as an example. The point was that instantiating clock and data would essentially cache the g_APinDescription[] lookups, something that "fast" bit-bang functions frequently do manually and less clearly.