Microcontroller I/O & ADC Benchmarks

Hi KatyaS.

Your confusion must be greater than you realise yourself.

STMxxxxxx are STMicro products.
TI has nothing to do with that.
Most Arduino products are based on Atmel AVR chips (some use ARM), but there are different tastes.
There are even Intel based products.

I can understand one would like a single IDE to be used for different products, but how smart would it be to do that?
You still need to take the chip's capabilities into consideration, and how well would an IDE create code if it has to work well with all types of chips?

I don't know whether there are extensions that make the Arduino IDE work with STM products, but extensions of that kind do exist for other chips (you already found out about ESP and such).

KatyaS:
You tested the Nucleo STM32F401 board, which I also intend to buy. But I am still confused about whether this board is officially Arduino supported (official meaning that the Arduino core is developed by Arduino or TI)?

As MAS3 says, TI does not make the STM32 series of MCUs; they are made by STMicroelectronics.
Alas, the STM32F401 series of MCU is not well supported on the Arduino with a custom core, so I wrote the benchmark above using its native mbed core. There are several other variants of the STM32 that are better supported on the Arduino and I suggest you have a look here for more information on what cores are best supported. The STM32 boards are usually incredibly cheap to buy and can offer a great performance boost compared to the AVR chips used in a lot of Arduinos.

The Arduino.org "Star Otto" board uses a STM32F469 - that should be similar to the STM32F401...
(exact status from an availability or "supported by working software" point of view is ... a bit murky.)

Put together admittedly terrible comparison graphs of the processor performance, definitely best viewed on a computer screen. Arduino Compatible Processor Comparison - Google Sheets

Managed to get my hands on a SAMD21 Zero clone and run the benchmark in #1 after converting references of Serial to SerialUSB and am surprised at how slow it is, especially the analogRead.
I have added results to the first post but for reference here, they are repeated.

Arduino Zero I/O Speed Tests Over 50000 Iterations. Compiled Using Arduino IDE v1.8.7
Digital Pin Write Takes About 1.6234 Microseconds.
Digital Pin Read  Takes About 1.0264 Microseconds.
Analogue Pin Read Takes About 423.2541 Microseconds.

it would be interesting to see some comparison between the new nanos and the ESP8266/ESP32...
is there something like this on the net, can't find it?

Have just done the test on an ESP32

ESP32 DoIt ESP32 Devkit V1 (80MHz) I/O Speed Tests Over 50000 Iterations. Compiled Using Arduino IDE v1.8.9 and 1.0.2 Core
Digital Pin Write Takes About 0.1199 Microseconds.
Digital Pin Read  Takes About 0.1642 Microseconds.
Analogue Pin Read Takes About 10.3027 Microseconds.
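
For a project that needs fast analogue reads, the per-call times above translate directly into an upper bound on sample rate. A minimal sketch of that arithmetic, assuming nothing else happens between reads (the function name is made up for illustration and is not part of the benchmark sketch):

```cpp
#include <cassert>

// Upper bound on back-to-back analogRead() calls per second, given the
// measured time per call in microseconds. Ignores all other loop work,
// so real sample rates will be lower.
double maxSamplesPerSecond(double microsPerRead) {
    assert(microsPerRead > 0.0);  // a zero or negative time is meaningless
    return 1e6 / microsPerRead;
}
```

With the figures posted here, the Zero clone (423.2541 us per read) works out to a little under 2,400 reads per second, while the ESP32 (10.3027 us per read) works out to roughly 97,000.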

Is there some benchmark with the new Arduinos?

sblantipodi:
Is there some benchmark with the new Arduinos?

I don't have any of the newer Arduinos. Maybe someone who has can run the benchmark sketch from #4 and post the results here.

Riva:
While trying to determine the most suitable MCU for a project that needs fast analogue read I decided to knock up a quick bench-test sketch and run it on some of the various MCUs I have kicking around.
Hope the info is helpful and maybe others can add new MCUs or tests.

Interesting. Do you have any idea why the analog read is so slow for the Zero?

Do you have any idea why the analog read is so slow for the Zero?

Apparently the SAMD ADC is configured with a very large "sample time" in addition to the conversion time.
Arduino Zero ADC Sample Time Too Long · Issue #327 · arduino/ArduinoCore-samd · GitHub (reported nearly 2 years ago. It has the look of one of those "oops, we made a mistake, but we're afraid to change it for fear of breaking something" bugs.)
Adafruit improved it for their boards: Speed up ADC (especially for SAMD51!) · Issue #51 · adafruit/ArduinoCore-samd · GitHub

Teensy 4.0 at several speed settings (needed to change to nanoseconds)

Teensy 4.0 (600MHz) I/O Speed Tests Over 50000 Iterations.
Digital Pin Write Takes About 35.50 nanoseconds.
Digital Pin Read  Takes About 33.34 nanoseconds.
Analogue Pin Read Takes About 18623.44 nanoseconds.

Teensy 4.0 (150MHz) I/O Speed Tests Over 50000 Iterations.
Digital Pin Write Takes About 142.00 nanoseconds.
Digital Pin Read  Takes About 132.32 nanoseconds.
Analogue Pin Read Takes About 19351.30 nanoseconds.

Teensy 4.0 (24MHz) I/O Speed Tests Over 50000 Iterations.
Digital Pin Write Takes About 896.50 nanoseconds.
Digital Pin Read  Takes About 834.44 nanoseconds.
Analogue Pin Read Takes About 22173.00 nanoseconds.

STM32 Blue Pill (STM32F103C8T6/72 MHz) using Arduino IDE 1.8.2 and STM32 Core by STMicroelectronics version 1.9.0:

STM32 Bluepill (STM32F103C8T6/72 MHz) I/O Speed Tests Over 50000 Iterations.
Digital Pin Write Takes About 0.50 Microseconds.
Digital Pin Read  Takes About 0.98 Microseconds.
Analogue Pin Read Takes About 63.30 Microseconds.

Using Arduino IDE 1.8.2 and STM32duino core:

STM32 "Blue Pill" (STM32F103C8T6 / 72 MHz) I/O Speed Tests Over 50000 Iterations.
Digital Pin Write Takes About 0.50 Microseconds.
Digital Pin Read  Takes About 0.78 Microseconds.
Analogue Pin Read Takes About 7.02 Microseconds.

STM8 Minimum Development Board (STM8S103F3P6/16 MHz) using Arduino IDE 1.8.2 and Sduino STM8 plain C core (non-C++) version 0.5.0

Generic STM8S103 breakout board (STM8S103F3P6/16 MHz) I/O Speed Tests Over 50000 Iterations.
Digital Pin Write Takes About 8.50 Microseconds.
Digital Pin Read  Takes About 8.52 Microseconds.
Analogue Pin Read Takes About 16.34 Microseconds.
STM32 Core by ST version 1.9.0: Analogue Pin Read Takes About 63.30 Microseconds.
STM32duino core: Analogue Pin Read Takes About 7.02 Microseconds.

Ouch!

@vishkas

Did you have something to add?

if anyone is particularly interested:

AVR128DB32 @ 32 MHz
DxCore 1.3.6
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.09 us.
digitalWriteFast compile-time-unknown value takes about 0.22 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 3.52 us.
digitalReadFast takes about 0.16 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.28 us.
analogRead by digital pin Takes About 22.03 us.
analogRead by analog channel takes about 21.82 us.
analogRead by channel with minimum sample time takes about 11.36 us, but will be inaccurate for high-impedance sources.
micros() takes about 3.51 us.
millis() takes about 0.75 us.
And the nonsense number we added up was 2231944390
AVR128DB32 @ 32 MHz
DxCore 1.3.6
Expected loop overhead is around 0.12us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.03 us.
digitalWriteFast compile-time-unknown value takes about 0.16 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 3.46 us.
digitalReadFast takes about 0.03 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.16 us.
analogRead by digital pin Takes About 21.91 us.
analogRead by analog channel takes about 21.69 us.
analogRead by channel with minimum sample time takes about 11.24 us, but will be inaccurate for high-impedance sources.
micros() takes about 3.26 us.
millis() takes about 0.50 us.
And the nonsense number we added up was 1302684504
AVR128DB32 @ 24 MHz
DxCore 1.3.6
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.13 us.
digitalWriteFast compile-time-unknown value takes about 0.29 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 4.70 us.
digitalReadFast takes about 0.21 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.71 us.
analogRead by digital pin Takes About 24.88 us.
analogRead by analog channel takes about 24.59 us.
analogRead by channel with minimum sample time takes about 12.88 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.47 us.
millis() takes about 1.00 us.
And the nonsense number we added up was 1338021952
AVR128DB32 @ 24 MHz
DxCore 1.3.6
Expected loop overhead is around 0.17us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.04 us.
digitalWriteFast compile-time-unknown value takes about 0.21 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 4.62 us.
digitalReadFast takes about 0.04 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.55 us.
analogRead by digital pin Takes About 24.71 us.
analogRead by analog channel takes about 24.42 us.
analogRead by channel with minimum sample time takes about 12.71 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.14 us.
millis() takes about 0.67 us.
And the nonsense number we added up was 3853237457
AVR128DB32 @ 16 MHz
DxCore 1.3.6
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.19 us.
digitalWriteFast compile-time-unknown value takes about 0.44 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 7.06 us.
digitalReadFast takes about 0.31 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.57 us.
analogRead by digital pin Takes About 23.44 us.
analogRead by analog channel takes about 23.01 us.
analogRead by channel with minimum sample time takes about 12.58 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.71 us.
millis() takes about 1.51 us.
And the nonsense number we added up was 10272621
AVR128DB32 @ 16 MHz
DxCore 1.3.6
Expected loop overhead is around 0.25us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.06 us.
digitalWriteFast compile-time-unknown value takes about 0.31 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 6.93 us.
digitalReadFast takes about 0.06 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.32 us.
analogRead by digital pin Takes About 23.19 us.
analogRead by analog channel takes about 22.76 us.
analogRead by channel with minimum sample time takes about 12.33 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.21 us.
millis() takes about 1.01 us.
And the nonsense number we added up was 2744492523
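
The corrected write and read figures above are consistent with subtracting one loop pass of overhead from each timed loop pass. A minimal sketch of that correction follows. It is not taken from the attached sketch: the function name is hypothetical, and the idea that the write loop performs two writes per pass is only inferred from the posted numbers.

```cpp
#include <cassert>

// Per-operation time after removing loop overhead.
//   rawPerOpMicros     : uncorrected time reported per operation
//   opsPerIteration    : operations inside one loop pass (ASSUMED, inferred
//                        from the posted numbers, not stated in the thread)
//   loopOverheadMicros : cost of one empty loop pass
double correctedPerOpMicros(double rawPerOpMicros,
                            int opsPerIteration,
                            double loopOverheadMicros) {
    assert(opsPerIteration > 0);
    // Time for one whole loop pass, then remove the overhead once per pass.
    double perIteration = rawPerOpMicros * opsPerIteration;
    return (perIteration - loopOverheadMicros) / opsPerIteration;
}
```

At 32 MHz with 0.12 us of overhead, digitalRead at 1.28 us with one read per pass gives 1.16 us, and digitalWrite at 3.52 us with two writes per pass gives 3.46 us, both matching the corrected run. The micros() and millis() figures do not fit this simple model, so their loops presumably differ.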

In a bunch of places I mention the problem of inlining and the subsequent optimization. Usually this is a good thing, but not so great in benchmarking:

AVR128DB32 @ 24 MHz
DxCore 1.3.6
digitalWrite on a constant pin known at compile-time, with all other calls to digitalWrite removed takes about 1.65 us instead of 4.70 us.
digitalWrite on a constant pin known at compile-time, with all other calls to digitalWrite removed takes about 1.59 us instead of 4.62 us. (corrected for overhead).
digitalRead of constant pin known at compile-time with all other calls to digitalRead removed takes about 0.42 us instead of 1.71 us.
digitalRead of constant pin known at compile-time with all other calls to digitalRead removed takes about 0.29 us instead of 1.55 us. (corrected for overhead).
(synthesized manually from several test runs, there's not a sketch to run here)

The big takeaway is that you want to use fast digital I/O if the pin numbers are constant and you care about digital I/O speed (in the sense that fast is desirable, as opposed to your code depending on it being slow). If you DO depend on it being slow, try to move away from that. Really, you should never depend on assumptions about how long any API call other than delay() or delayMicroseconds() takes. The day may come when a core will automatically Fast-ify any call that has a constant pin. It is trivial to do! My biggest reservation is not breaking bad code that relies on it being slow, but the poor visibility on when and where it will figure it out, such that what looks like a minor change could end up making a two-order-of-magnitude difference in write speed. Currently, that only makes a 3:1 difference in digitalWrite() or 4:1 in digitalRead() (depending on inlining, as noted above), which is nasty, but not 100:1.
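
To illustrate why a compile-time constant pin matters, here is a host-side sketch under stated assumptions. It does not reproduce DxCore or megaTinyCore code: `fakePortOut`, `kPinTable`, `slowWrite`, and `fastWrite` are all hypothetical names, and the "registers" are a plain array so the example runs on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for memory-mapped output registers; hypothetical, host-side only.
static uint8_t fakePortOut[2] = {0, 0};

// Hypothetical pin map: pin number -> (port index, bit mask).
struct PinInfo { uint8_t port; uint8_t mask; };
static const PinInfo kPinTable[4] = {{0, 0x01}, {0, 0x02}, {1, 0x01}, {1, 0x02}};

// "Slow" path: the pin is only known at run time, so every call pays for a
// bounds check and a table lookup before touching the register.
void slowWrite(uint8_t pin, bool high) {
    if (pin >= 4) return;  // ignore invalid pins
    const PinInfo info = kPinTable[pin];
    if (high) fakePortOut[info.port] |= info.mask;
    else      fakePortOut[info.port] &= static_cast<uint8_t>(~info.mask);
}

// "Fast" path: the pin is a template argument, so port and mask are
// compile-time constants and only the final read-modify-write remains.
template <uint8_t Pin>
void fastWrite(bool high) {
    static_assert(Pin < 4, "pin must be a valid compile-time constant");
    constexpr uint8_t port = (Pin < 2) ? 0 : 1;
    constexpr uint8_t mask = (Pin % 2 == 0) ? 0x01 : 0x02;
    if (high) fakePortOut[port] |= mask;
    else      fakePortOut[port] &= static_cast<uint8_t>(~mask);
}
```

Both paths end up changing the same bit; the difference is how much work happens before the register write. A real digitalWrite() also has to deal with turning off PWM, which this sketch leaves out entirely.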

Another non-negligible factor? The whole business of turning off PWM on pins. Some pins have more timers than others, and on DxCore we do a bit more to look those up, since there you can set the PORTMUX.TCAROUTEA register to control which pins get the PWM generated with TCA0 (and, for 48/64-pin parts, TCA1). Even holding everything else equal:

AVR128DB32 @ 24 MHz
DxCore 1.3.6
No attempt made to correct for loop overhead
digitalWriteFast with value known at compile-time takes about 0.13 us.
digitalWriteFast compile-time-unknown value takes about 0.29 us.
digitalWrite on PA2 with just type A timer takes about  3.57 us.
digitalWrite on PD7 with no timers to turn off PWM from takes about 4.20 us.
digitalWrite on PA5 with 2 timers of which it will turn off one and only one of takes about 3.32 us.
digitalWrite on PA6 with just type D timer takes about 3.95 us.

Expected loop overhead is around 0.17us - This is accounted for in these numbers
digitalWrite on PA2 with just type A timer takes about 3.49 us.
digitalWrite on PD7 with no timers to turn off PWM from takes about 4.12 us.
digitalWrite on PA5 with 2 timers of which it will turn off one and only one of takes about 3.24 us.
digitalWrite on PA6 with just type D timer takes about 3.86 us.

megaTinyCore is faster because it doesn't support the weird PWM stuff
ATtiny3216 @ 16 MHz
megaTinyCore 2.3.2
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.19 us.
digitalWriteFast compile-time-unknown value takes about 0.44 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 3.62 us.
digitalReadFast takes about 0.31 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.71 us.
analogRead by digital pin Takes About 33.26 us.
analogRead by analog channel takes about 32.60 us.
analogRead by channel with minimum sample time takes about 16.54 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.80 us.
millis() takes about 1.51 us.
And the nonsense number we added up was 3184834827
ATtiny3216 @ 16 MHz
megaTinyCore 2.3.2
Expected loop overhead is around 0.25us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.06 us.
digitalWriteFast compile-time-unknown value takes about 0.32 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 3.49 us.
digitalReadFast takes about 0.06 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.46 us.
analogRead by digital pin Takes About 33.01 us.
analogRead by analog channel takes about 32.35 us.
analogRead by channel with minimum sample time takes about 16.29 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.30 us.
millis() takes about 1.01 us.
And the nonsense number we added up was 633926591
ATtiny3216 @ 20 MHz
megaTinyCore 2.3.2
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.15 us.
digitalWriteFast compile-time-unknown value takes about 0.35 us.
digitalWrite on PA2 with no PWM takes about 2.90 us.
digitalWrite on PA4 with TCA0 timer takes about 3.80 us.
digitalWrite on PC0 with TCD0 timer takes about 3.45 us.
digitalReadFast takes about 0.25 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.17 us.
analogRead by digital pin Takes About 26.61 us.
analogRead by analog channel takes about 26.07 us.
analogRead by channel with minimum sample time takes about 13.23 us, but will be inaccurate for high-impedance sources.
micros() takes about 6.27 us.
millis() takes about 1.21 us.
And the nonsense number we added up was 829034102
ATtiny3216 @ 20 MHz
megaTinyCore 2.3.2
Expected loop overhead is around 0.20us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.05 us.
digitalWriteFast compile-time-unknown value takes about 0.25 us.
digitalWrite on PA2 with no PWM takes about 2.80 us.
digitalWrite on PA4 with TCA0 timer takes about 3.70 us.
digitalWrite on PC0 with TCD0 timer takes about 3.35 us.
digitalReadFast takes about 0.05 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.97 us.
analogRead by digital pin Takes About 26.41 us.
analogRead by analog channel takes about 25.87 us.
analogRead by channel with minimum sample time takes about 13.03 us, but will be inaccurate for high-impedance sources.
micros() takes about 5.87 us.
millis() takes about 0.81 us.
And the nonsense number we added up was 3092560352
ATtiny1624 @ 20 MHz
megaTinyCore 2.3.2
No attempt made to correct for loop overhead
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.15 us.
digitalWriteFast compile-time-unknown value takes about 0.35 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 2.88 us.
digitalReadFast takes about 0.25 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 2.16 us.
analogRead by digital pin Takes About 15.32 us.
analogRead by analog channel takes about 15.32 us.
analogRead by channel with minimum sample time takes about 9.36 us, but will be inaccurate for high-impedance sources.
micros() takes about 7.47 us.
millis() takes about 1.20 us.
And the nonsense number we added up was 1201978900
ATtiny1624 @ 20 MHz
megaTinyCore 2.3.2
Expected loop overhead is around 0.20us
This is accounted for in these numbers
I/O Speed Tests Over 50000 Iterations.
digitalWriteFast with value known at compile-time takes about 0.05 us.
digitalWriteFast compile-time-unknown value takes about 0.25 us.
digitalWrite (assuming it is called multiple places with multiple pins) takes about 2.78 us.
digitalReadFast takes about 0.05 us.
digitalRead (assuming it is called somewhere else to prevent inlining) takes about 1.96 us.
analogRead by digital pin Takes About 15.12 us.
analogRead by analog channel takes about 15.12 us.
analogRead by channel with minimum sample time takes about 9.16 us, but will be inaccurate for high-impedance sources.
micros() takes about 7.07 us.
millis() takes about 0.80 us.
And the nonsense number we added up was 1036795895

One other thing people might be wondering about: the time taken by micros() varies significantly depending on which timer is used and the clock speed.
TCB on a power-of-two number of MHz ought to be fastest, because the main mathematical operation involved is just bitshifts. For the others, we wish we could do division, but that is far slower, so we must content ourselves with addition and subtraction of the starting value, increasingly shifted right.
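
The shift-and-add idea can be sketched as follows. This is an illustration of the general technique, not the code either core actually uses: the function names are hypothetical, the tick rates are assumed to equal the clock in MHz, and this version uses additions only.

```cpp
#include <cassert>
#include <cstdint>

// Power-of-two clock: 16 ticks per microsecond, so the conversion is one shift.
uint32_t ticksToMicros16MHz(uint32_t ticks) {
    return ticks >> 4;  // exactly ticks / 16, rounded down
}

// 24 ticks per microsecond. There is no single shift for 1/24, but
//   1/24 = 1/32 + 1/128 + 1/512 + 1/2048 + ...
// Summing the first four terms gives 85/2048, about 0.4% below 1/24,
// and each shift also rounds down, so the result never overshoots.
uint32_t ticksToMicros24MHz(uint32_t ticks) {
    return (ticks >> 5) + (ticks >> 7) + (ticks >> 9) + (ticks >> 11);
}
```

For 24000 ticks the approximation returns 994 where exact division gives 1000. Adding more terms, or mixing in subtractions as the post describes, tightens the result at the cost of more instructions.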
I expected TCD to be slower than it seems to be. I think other factors gum up the works enough for the others that it doesn't look as bad as I expected.
There is definitely room for someone with nothing better to do to implement pieces of micros() in assembly. The compiler isn't allowed to make the kind of assumptions that we know are valid based on our secret knowledge of what time is and how it works.
Benchmark_IO.ino (21.3 KB)

That's "overclocking", right? Microchip says 24MHz top speed...
Does the internal oscillator run faster than 24MHz using the obvious "reserved" values? Or are you using an external clock?

Yah, that's overclocked. The internal runs at 28 or 32 MHz if you just increment the register. After that it repeats the 4 highest ones.
DA and DB both seem to handle it no problem. DB handles 32 MHz crystals as well, and even a 40 MHz clock (at least I didn't see any anomalies). I also tried 48, and that started to misbehave: it was getting wrong results from mathematical operations, as overclocked AVRs tend to. I haven't tried a 40 MHz crystal (generally an external clock works better for overclocking) or any other combinations not listed here; I only recently got my DB-series breakout board with pads for crystal/clock in.

Update: With extended temperature range parts, 48 MHz external clock often works.....

I wish I knew what the "most sensitive" instructions were w/regard to overclocking so I could be more confident that it was working when I didn't see it doing obviously wrong things with randomly flipping bits in registers that happen to get printed, or the whole thing crashing.

Especially with the tuning options for the tinyAVR parts being basically done. The oscillators there have HUGE range: 2-series at the 20 MHz OSCCFG setting gets up to like 36 before it stops working, and if the pattern was continued it would have been at like 38.x. (This was implemented partly as practice for the rest of the tuning setup, since I do need to actually provide this on classic parts, where they actually do need it, while being generally easier because the parts are dead simple to work with.) It looks like 24/25 is a shoo-in on all tinyAVR... 30 MHz maybe on 0/1 but asking a lot of the silicon, and 32 MHz will be universally achievable on 2-series (stability? Who knows on that, but the clock should get up there no problem), which isn't the case for 0/1.

On 0/1-series the oscillators have lower center frequencies, so you can tune even the 20 down below 12, but only up to 30-32 (and they don't run at that, though I have one that does seem to work at 32 with an external clock). Tuning the internal oscillator on Dx isn't worth it: only 6 bits of calibration, covering only small departures from target frequency... and even for that the granularity is disappointing. Autotune would have been a lot nicer if it was choosing from a larger number of values, so that you didn't have to use freeze spray or a heat gun to change its temperature enough to even determine whether autotune is actually doing anything. (The internal oscillator doesn't seem to care about the voltage: if you look at the power distribution section of the datasheet, they show it being powered by an internal regulator.)
