If anyone is particularly interested:
AVR128DB32 @ 32 MHz
DxCore 1.3.6
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.09 us.
digitalWriteFast compile-time-unknown value takes about 0.22 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 3.52 us.
digitalReadFast takes about 0.16 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.28 us.
analogRead by digital pin Takes About 22.03 us.
analogRead by analog channel takes about 21.82 us.
analogRead by channel with minimum sample time takes about 11.36 us, but will be inaccurate for high-impedance sources.
micros() takes about 3.51 us.
millis() takes about 0.75 us.
And the nonsense number we added up was 2231944390
AVR128DB32 @ 32 MHz
DxCore 1.3.6
Expected loop overhead is around 0.12us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.03 us.
digitalWriteFast compile-time-unknown value takes about 0.16 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 3.46 us.
digitalReadFast takes about 0.03 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.16 us.
analogRead by digital pin Takes About 21.91 us.
analogRead by analog channel takes about 21.69 us.
analogRead by channel with minimum sample time takes about 11.24 us, but will be inaccurate for high-impedance sources.
micros() takes about 3.26 us.
millis() takes about 0.50 us.
And the nonsense number we added up was 1302684504
AVR128DB32 @ 24 MHz
DxCore 1.3.6
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.13 us.
digitalWriteFast compile-time-unknown value takes about 0.29 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 4.70 us.
digitalReadFast takes about 0.21 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.71 us.
analogRead by digital pin Takes About 24.88 us.
analogRead by analog channel takes about 24.59 us.
analogRead by channel with minimum sample time takes about 12.88 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.47 us.
millis() takes about 1.00 us.
And the nonsense number we added up was 1338021952
AVR128DB32 @ 24 MHz
DxCore 1.3.6
Expected loop overhead is around 0.17us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.04 us.
digitalWriteFast compile-time-unknown value takes about 0.21 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 4.62 us.
digitalReadFast takes about 0.04 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.55 us.
analogRead by digital pin Takes About 24.71 us.
analogRead by analog channel takes about 24.42 us.
analogRead by channel with minimum sample time takes about 12.71 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.14 us.
millis() takes about 0.67 us.
And the nonsense number we added up was 3853237457
AVR128DB32 @ 16 MHz
DxCore 1.3.6
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.19 us.
digitalWriteFast compile-time-unknown value takes about 0.44 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 7.06 us.
digitalReadFast takes about 0.31 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.57 us.
analogRead by digital pin Takes About 23.44 us.
analogRead by analog channel takes about 23.01 us.
analogRead by channel with minimum sample time takes about 12.58 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.71 us.
millis() takes about 1.51 us.
And the nonsense number we added up was 10272621
AVR128DB32 @ 16 MHz
DxCore 1.3.6
Expected loop overhead is around 0.25us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.06 us.
digitalWriteFast compile-time-unknown value takes about 0.31 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 6.93 us.
digitalReadFast takes about 0.06 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.32 us.
analogRead by digital pin Takes About 23.19 us.
analogRead by analog channel takes about 22.76 us.
analogRead by channel with minimum sample time takes about 12.33 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.21 us.
millis() takes about 1.01 us.
And the nonsense number we added up was 2744492523
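To compare the runs at different clock speeds, it can help to convert the times to clock cycles. Here is a minimal sketch of that arithmetic; the helper name is made up for illustration and is not part of Benchmark_IO.ino, and since the printed times are rounded to 0.01 us the result is only approximate.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not from Benchmark_IO.ino): converts a measured time
// in microseconds to an approximate number of CPU clock cycles.
// One microsecond at F MHz is F clock cycles, so cycles = us * MHz.
long microsToCycles(double us, uint8_t clockMHz) {
  return std::lround(us * clockMHz);
}
```

With the overhead-corrected numbers above, the compile-time-known digitalWriteFast (0.03 us at 32 MHz, 0.04 us at 24 MHz, 0.06 us at 16 MHz) comes out to about 1 clock cycle in every case, and digitalWrite (3.46, 4.62 and 6.93 us) to about 111 cycles. So the cost in cycles looks the same at all three speeds, and only the length of a cycle changes.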
In a bunch of places I mention the problem of inlining and the subsequent optimization. Usually this is a good thing, but it's not so great when benchmarking:
AVR128DB32 @ 24 MHz
DxCore 1.3.6
digitalWrite on a constant pin known at compile-time, with all other calls to digitalWrite removed takes about 1.65 us instead of 4.70 us.
digitalWrite on a constant pin known at compile-time, with all other calls to digitalWrite removed takes about 1.59 us instead of 4.62 us. (corrected for overhead).
digitalRead of a constant pin known at compile-time, with all other calls to digitalRead removed takes about 0.42 us instead of 1.71 us.
digitalRead of a constant pin known at compile-time, with all other calls to digitalRead removed takes about 0.29 us instead of 1.55 us. (corrected for overhead).
(synthesized manually from several test runs, there's not a sketch to run here)
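For anyone writing their own benchmark, here is a rough sketch of one way to keep the optimizer from specializing or discarding the call under test. This is an assumption about how one could do it, not necessarily how Benchmark_IO.ino does it. readPinStub() and sumOfReads() are hypothetical names, the stub stands in for a real API call, and `__attribute__((noinline))` is a GCC/Clang extension (avr-gcc accepts it).

```cpp
#include <cstdint>

// Hypothetical stand-in for the API call being timed. Marking it noinline
// stops the compiler from folding the call into the loop and optimizing it
// for the particular arguments used there.
__attribute__((noinline)) uint8_t readPinStub(uint8_t pin) {
  return pin & 1;
}

// Calls the function 'iterations' times and adds up the results. Returning
// (and later printing) the sum means the calls cannot be thrown away as
// having no effect. The sum itself is a meaningless number.
uint32_t sumOfReads(uint16_t iterations) {
  uint32_t sum = 0;
  for (uint16_t i = 0; i < iterations; i++) {
    sum += readPinStub(static_cast<uint8_t>(i));
  }
  return sum;
}
```

On real hardware you would read micros() before and after the loop and divide the difference by the iteration count.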
The big takeaway is that you want to use fast digital I/O if the pin numbers are constant and you care about digital I/O speed (in the sense that fast is desirable, as opposed to your code depending on it being slow). If you DO depend on it being slow, try to move away from that. Really, you should never depend on assumptions about how long any API call other than delay() or delayMicroseconds() takes. The day may come when a core will automatically Fast-ify any call that has a constant pin. It is trivial to do! My biggest reservation is not breaking bad code that relies on it being slow, but the poor visibility into when and where the compiler will figure it out, such that what looks like a minor change could end up making a two-order-of-magnitude difference in write speed. Currently, such a change only makes a 3:1 difference in digitalWrite() or 4:1 in digitalRead() (depending on inlining, as noted above), which is nasty, but not 100:1.
Another non-negligible factor? The whole business of turning off PWM on the pin. Some pins have more timers than others, and on DxCore we do a bit more work to look those up, since there you can set the PORTMUX.TCAROUTEA register to control which pins get the PWM generated by TCA0 (and, on 48/64-pin parts, TCA1). Even holding everything else equal, the time varies by pin:
AVR128DB32 @ 24 MHz
DxCore 1.3.6
No attempt made to correct for loop overhead
digitalWriteFast with value known at compile-time takes about 0.13 us.
digitalWriteFast compile-time-unknown value takes about 0.29 us.
digitalWrite on PA2 with just type A timer takes about 3.57 us.
digitalWrite on PD7 with no timers to turn off PWM from takes about 4.20 us.
digitalWrite on PA5 with 2 timers of which it will turn off one and only one of takes about 3.32 us.
digitalWrite on PA6 with just type D timer takes about 3.95 us.
Expected loop overhead is around 0.17us - This is accounted for in these numbers
digitalWrite on PA2 with just type A timer takes about 3.49 us.
digitalWrite on PD7 with no timers to turn off PWM from takes about 4.12 us.
digitalWrite on PA5 with 2 timers of which it will turn off one and only one of takes about 3.24 us.
digitalWrite on PA6 with just type D timer takes about 3.86 us.
megaTinyCore is faster because it doesn't support the weird PWM stuff:
ATtiny3216 @ 16 MHz
megaTinyCore 2.3.2
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.19 us.
digitalWriteFast compile-time-unknown value takes about 0.44 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 3.62 us.
digitalReadFast takes about 0.31 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.71 us.
analogRead by digital pin Takes About 33.26 us.
analogRead by analog channel takes about 32.60 us.
analogRead by channel with minimum sample time takes about 16.54 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.80 us.
millis() takes about 1.51 us.
And the nonsense number we added up was 3184834827
ATtiny3216 @ 16 MHz
megaTinyCore 2.3.2
Expected loop overhead is around 0.25us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.06 us.
digitalWriteFast compile-time-unknown value takes about 0.32 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 3.49 us.
digitalReadFast takes about 0.06 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.46 us.
analogRead by digital pin Takes About 33.01 us.
analogRead by analog channel takes about 32.35 us.
analogRead by channel with minimum sample time takes about 16.29 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.30 us.
millis() takes about 1.01 us.
And the nonsense number we added up was 633926591
ATtiny3216 @ 20 MHz
megaTinyCore 2.3.2
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.15 us.
digitalWriteFast compile-time-unknown value takes about 0.35 us.
digitalWrite on PA2 with no PWM takes about 2.90 us.
digitalWrite on PA4 with TCA0 timer takes about 3.80 us.
digitalWrite on PC0 with TCD0 timer takes about 3.45 us.
digitalReadFast takes about 0.25 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.17 us.
analogRead by digital pin Takes About 26.61 us.
analogRead by analog channel takes about 26.07 us.
analogRead by channel with minimum sample time takes about 13.23 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.27 us.
millis() takes about 1.21 us.
And the nonsense number we added up was 829034102
ATtiny3216 @ 20 MHz
megaTinyCore 2.3.2
Expected loop overhead is around 0.20us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.05 us.
digitalWriteFast compile-time-unknown value takes about 0.25 us.
digitalWrite on PA2 with no PWM takes about 2.80 us.
digitalWrite on PA4 with TCA0 timer takes about 3.70 us.
digitalWrite on PC0 with TCD0 timer takes about 3.35 us.
digitalReadFast takes about 0.05 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.97 us.
analogRead by digital pin Takes About 26.41 us.
analogRead by analog channel takes about 25.87 us.
analogRead by channel with minimum sample time takes about 13.03 us, but will be inaccurate for high-impedance sources.
micros() takes about 5.87 us.
millis() takes about 0.81 us.
And the nonsense number we added up was 3092560352
ATtiny1624 @ 20 MHz
megaTinyCore 2.3.2
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.15 us.
digitalWriteFast compile-time-unknown value takes about 0.35 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 2.88 us.
digitalReadFast takes about 0.25 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.16 us.
analogRead by digital pin Takes About 15.32 us.
analogRead by analog channel takes about 15.32 us.
analogRead by channel with minimum sample time takes about 9.36 us, but will be inaccurate for high-impedance sources.
micros() takes about 7.47 us.
millis() takes about 1.20 us.
And the nonsense number we added up was 1201978900
ATtiny1624 @ 20 MHz
megaTinyCore 2.3.2
Expected loop overhead is around 0.20us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.05 us.
digitalWriteFast compile-time-unknown value takes about 0.25 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 2.78 us.
digitalReadFast takes about 0.05 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.96 us.
analogRead by digital pin Takes About 15.12 us.
analogRead by analog channel takes about 15.12 us.
analogRead by channel with minimum sample time takes about 9.16 us, but will be inaccurate for high-impedance sources.
micros() takes about 7.07 us.
millis() takes about 0.80 us.
And the nonsense number we added up was 1036795895
One other thing people might be wondering about - the time taken by micros varies significantly depending on which timer is used and the clock speed.
TCB on a power-of-two number of MHz ought to be fastest, because the main mathematical operation involved is just bitshifts. For the others, we wish we could do division, but that is far slower, so we must content ourselves with addition and subtraction of the starting value, increasingly shifted right.
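To give a feel for the difference, here is a minimal sketch of the two cases. It is not the actual micros() code from either core. The function names and the tick rates (8 or 10 timer ticks per microsecond) are assumptions picked for illustration, and this version only uses additions, where the real thing may mix additions and subtractions.

```cpp
#include <cstdint>

// Power-of-two case: assuming 8 timer ticks per microsecond,
// the conversion is a single right shift.
uint16_t ticksToMicrosDiv8(uint16_t ticks) {
  return ticks >> 3;
}

// Non-power-of-two case: assuming 10 timer ticks per microsecond.
// 1/10 is 0.000110011001... in binary, so ticks/10 is approximated by
// summing the starting value shifted right by 4, 5, 8, 9, 12 and 13.
// Every shift truncates, so the result never exceeds the exact quotient
// and falls short of it by a few counts at most.
uint16_t ticksToMicrosDiv10(uint16_t ticks) {
  return (ticks >> 4) + (ticks >> 5) + (ticks >> 8) +
         (ticks >> 9) + (ticks >> 12) + (ticks >> 13);
}
```

The shift-and-add version avoids the slow software division, at the cost of several more instructions than the single shift and a small error in the result.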
I expected TCD to be slower than it seems to be. I think other factors gum up the works enough for the others that it doesn't look as bad as I expected.
There is definitely room for someone with nothing better to do to implement pieces of micros in assembly. The compiler isn't allowed to make the kind of assumptions that we know are valid based on our secret knowledge of what time is and how it works.
Benchmark_IO.ino (21.3 KB)