
Topic: in search of the perfect digitalWrite()

ralphd

I don't use digitalWrite in my code because of how bloated and inefficient it is.  But I like the idea behind the abstraction of physical pins to port registers, so improving the wiring code has been in the back of my mind for a while.

I was able to make a small improvement by defining separate LOW and HIGH types, then overloading digitalWrite:
digitalWrite (byte pin, LOW)
digitalWrite (byte pin, HIGH)
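
A minimal sketch of that overloading idea, written so it builds on a PC. Everything named here is hypothetical: fakePortD stands in for a real port register, and LowTag/HighTag/kLow/kHigh/writePin are illustration names. The stock core defines HIGH and LOW as plain integer macros, so this is not a drop-in change.

```cpp
#include <stdint.h>

// Hypothetical stand-in for an AVR port register so the sketch builds on a PC.
static volatile uint8_t fakePortD = 0;

// Distinct empty types let overload resolution pick the function at compile time.
struct LowTag {};
struct HighTag {};
constexpr LowTag  kLow{};
constexpr HighTag kHigh{};

// Each overload does exactly one thing, so no run-time test of the value is needed.
inline void writePin(uint8_t bit, HighTag) {
  fakePortD = (uint8_t)(fakePortD | (1u << bit));
}
inline void writePin(uint8_t bit, LowTag) {
  fakePortD = (uint8_t)(fakePortD & ~(1u << bit));
}
```

A call then reads writePin(3, kHigh); the pin is still a run-time value here, only the HIGH/LOW branch has been removed.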

The hard part is moving the pin mapping to compile time instead of runtime.  I believe it is possible with C++, and after seeing some of the cool optimizations done in the Arduino tiny cores using templates, I *think* it may require the use of templates.
What I want to write would be something like this:
digitalWrite (0, HIGH) { PORTD |= (1<<PD0);}
digitalWrite (1, HIGH) { PORTD |= (1<<PD1);}
...
digitalWrite (8, HIGH) { PORTB |= (1<<PB0);}

On an AVR PORTD |= (1<<PD0) compiles to a single instruction: sbi PORTD, 0
So getting this idea to work would mean the speed and code size benefits of direct port manipulation with portability and abstraction benefits of digitalWrite.  Any C++ wizards out there know how to do this?
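
One way the compile-time mapping could look with templates, as a rough sketch only. It assumes an Uno-style layout (pins 0-7 on port D, 8-13 on port B) and uses plain variables (fakePortB, fakePortD) in place of the real registers so it runs on a PC. PinMap, writeHigh and writeLow are hypothetical names, and the call syntax differs from digitalWrite(pin, HIGH), so existing code would have to change.

```cpp
#include <stdint.h>

// Hypothetical stand-ins for PORTB and PORTD so the sketch builds on a PC.
static volatile uint8_t fakePortB = 0;
static volatile uint8_t fakePortD = 0;

// Pin-to-port mapping resolved entirely from the template argument.
template <uint8_t Pin>
struct PinMap {
  static_assert(Pin < 14, "pin not mapped in this sketch");
  static volatile uint8_t& port() { return Pin < 8 ? fakePortD : fakePortB; }
  static constexpr uint8_t mask() {
    return (uint8_t)(1u << (Pin < 8 ? Pin : Pin - 8));
  }
};

// With Pin fixed at compile time, port and mask are constants for the optimizer.
template <uint8_t Pin>
inline void writeHigh() {
  volatile uint8_t& port = PinMap<Pin>::port();
  port = (uint8_t)(port | PinMap<Pin>::mask());
}

template <uint8_t Pin>
inline void writeLow() {
  volatile uint8_t& port = PinMap<Pin>::port();
  port = (uint8_t)(port & (uint8_t)~PinMap<Pin>::mask());
}
```

Usage would be writeHigh<8>(); an unmapped pin such as writeHigh<20>() fails at compile time instead of misbehaving at run time.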

MarkT

[ I will NOT respond to personal messages, I WILL delete them, use the forum please ]

ralphd


Seen this thread? http://forum.arduino.cc/index.php/topic,46896.0.html


Yes, and I've looked at the arduino-lite core:
https://code.google.com/p/arduino-lite/

The problem is they both require changing existing code.  I think it is possible to optimize digitalWrite() so that existing libraries don't have to be changed.  I think it may even be possible to optimize some digitalWrite calls where the pin is a variable.
For example
class FooWriter
{
  byte outputPin;
public:
  // the pin is fixed when the object is created
  FooWriter(byte pin) : outputPin(pin) {}
  void transmit(byte data) {
    digitalWrite(outputPin, HIGH);
    ...
  }
};
...
  FooWriter fw(3); // output on pin 3
  fw.transmit(42);

In the above code, it is known at compile time that digitalWrite will be called with 3 as the first parameter.  So I think it may be possible for the compiler to optimize the digitalWrite to the single sbi instruction.
But I don't understand C++ well enough to know if it can or not.  Recognizing that outputPin doesn't change at runtime and therefore optimizing it out would mean sizeof(FooWriter) is 0 instead of 1, so maybe the standard doesn't allow that kind of compile-time optimization.
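
For comparison, here is a sketch of what the class looks like if the pin moves into the type. This does require changing the library code, so it is not the transparent optimization asked about above. FooWriterT and fakePortD are hypothetical names, and the port is a plain variable so the example runs on a PC.

```cpp
#include <stdint.h>

// Hypothetical stand-in for PORTD so the sketch builds on a PC.
static volatile uint8_t fakePortD = 0;

// The pin is part of the type, so the object stores no pin member at all.
template <uint8_t Pin>
class FooWriterT {
  static_assert(Pin < 8, "this sketch only maps pins 0-7 (port D)");
public:
  void transmit(uint8_t data) {
    (void)data;  // payload handling elided, as in the post above
    fakePortD = (uint8_t)(fakePortD | (1u << Pin));  // constant mask
  }
};

// Even with no data members, C++ gives every object a non-zero size.
static_assert(sizeof(FooWriterT<3>) == 1, "an empty class has sizeof 1");
```

Usage would be FooWriterT<3> fw; fw.transmit(42);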

pito

SdFat lib includes DigitalIO.h, which compiles to a single instruction:
#include <DigitalIO.h>
DigitalPin<11> NRD(OUTPUT);    // a pin named NRD
void setup()
{
NRD = 1;
..



ralphd


SdFat lib includes DigitalIO.h, which compiles to a single instruction:
#include <DigitalIO.h>
DigitalPin<11> NRD(OUTPUT);    // a pin named NRD
void setup()
{
NRD = 1;
..



I'll check out the code to see how it is done. It looks like DigitalPin is a template class with an assignment operator defined, which I don't think is applicable to what I want to do with digitalWrite.


ralphd

I found someone that compared the Arduino version of digitalWrite to the Wiring version:
http://www.codeproject.com/Articles/589299/Why-is-the-digital-I-O-in-Arduino-slow-and-what-ca

bperrybap

One issue with depending on the avr-gcc sbi/cbi optimization
that some folks may not be aware of is that
*reg |= mask;
*reg &= ~mask;
is not guaranteed to generate an sbi/cbi instruction.
Here is a link that talks about the issue in detail.
http://forum.arduino.cc/index.php?topic=211415.msg1553690#msg1553690
The summary is that on some processors the avr-gcc optimization hack
that converts |= and &= to sbi/cbi instructions fails because the register's
address is too large. In those cases the resulting code does not
update the register atomically and can cause register corruption
if the same port register is used in foreground and ISR routines.
The net result is that the optimization silently fails for some of the AVR registers,
so if you are depending on it, you have to be very careful.
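
To make the address limit concrete: sbi/cbi encode a 5-bit I/O address, so they only reach I/O addresses 0x00-0x1F, which is data-space 0x20-0x3F. A small sketch of that check follows; sbiCanReach is a hypothetical helper name, and the two example addresses are the data-space addresses of PORTD on the ATmega328P and PORTH on the ATmega2560.

```cpp
#include <stdint.h>

// True if a register at this data-space address is in range for sbi/cbi.
// sbi/cbi take a 5-bit I/O address: I/O 0x00-0x1F == data space 0x20-0x3F.
constexpr bool sbiCanReach(uint16_t dataAddr) {
  return dataAddr >= 0x20 && dataAddr <= 0x3F;
}

// PORTD on the ATmega328P sits at data address 0x2B: in range.
static_assert(sbiCanReach(0x2B), "PORTD (ATmega328P) is reachable");
// PORTH on the ATmega2560 sits at data address 0x102: out of range,
// so |= and &= there become a load, modify, store sequence.
static_assert(!sbiCanReach(0x102), "PORTH (ATmega2560) is not reachable");
```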

In terms of "arduino-like" "fast" AVR bit i/o APIs,
so far from what I've seen, none of these "fast" options solve the next level problem
and that is multi pin i/o for things like byte operations.

I did an implementation that also provides multi bit i/o that I use in my openGLCD library.
It is licensed as GPL v3 code and can be found here in my mcu-io project:
http://code.google.com/p/mcu-io
See the avrio code and download.

It provides an arduino like interface that will crush down
even multiple bit i/o when possible.
It also allows you to specify pins using the AVR PORT and bit number
rather than arduino pin numbers.
(Arduino raw pin numbers can be used but require creating a pin mapping macro)

Right now it works for single pin and 8 pin i/o.
If there is an interest I could put in 4 pin support.

Another option, while not quite as fast, is to avoid using the digitalWrite()/digitalRead() interface
altogether and use indirect port i/o. This is portable across all processors and board types used on Arduino
and allows using raw Arduino pin numbers.
What this does is shift the run time penalty to only once during initialization rather than
on each and every single i/o.
To use this, the code fetches and saves the register pointers and bit masks up front using:
address:
reg = portOutputRegister(digitalPinToPort(pin));
mask:
mask = digitalPinToBitMask(pin);
You save them away and then later can do:
*reg |= mask;
*reg &= ~mask;

The restriction is that you must mask interrupts to ensure atomicity.
While not as fast as direct raw port i/o it is much faster than the Arduino core code routines.
This method is quite effective for libraries and several out there are doing this.
They can get a substantial bump in performance and yet remain portable across boards & processors.
(Well it is currently broken on DUE, but that is an Arduino team issue in the DUE code,
I entered a bug report for it)
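
The indirect port i/o steps above can be sketched as a small class that does the lookup once and then only touches the cached pointer and mask. This version runs on a PC: lookupReg and lookupMask are hypothetical stand-ins for portOutputRegister(digitalPinToPort(pin)) and digitalPinToBitMask(pin), assuming an Uno-style layout, and CachedPin is an illustration name. The interrupt masking needed on real hardware is only marked in comments.

```cpp
#include <stdint.h>

// Hypothetical stand-ins for PORTB and PORTD so the sketch builds on a PC.
static volatile uint8_t fakePortB = 0;
static volatile uint8_t fakePortD = 0;

// Stand-in for portOutputRegister(digitalPinToPort(pin)).
static volatile uint8_t* lookupReg(uint8_t pin) {
  return pin < 8 ? &fakePortD : &fakePortB;
}
// Stand-in for digitalPinToBitMask(pin).
static uint8_t lookupMask(uint8_t pin) {
  return (uint8_t)(1u << (pin < 8 ? pin : pin - 8));
}

class CachedPin {
  volatile uint8_t* reg_;  // fetched once, at construction
  uint8_t mask_;
public:
  explicit CachedPin(uint8_t pin)
      : reg_(lookupReg(pin)), mask_(lookupMask(pin)) {}

  void set() {
    // On an AVR: save SREG and disable interrupts here...
    *reg_ = (uint8_t)(*reg_ | mask_);
    // ...and restore SREG here, so the read-modify-write is atomic.
  }
  void clear() {
    *reg_ = (uint8_t)(*reg_ & (uint8_t)~mask_);
  }
};
```

A library would typically build the CachedPin in its begin() call from the pin number the sketch passes in, then use set() and clear() in its inner loops.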


Another simpler alternative for single pin i/o is to just switch to using Paul's Teensy boards.
The teensy core code used when using one of his boards
will optimize automagically to use port i/o when possible
without having to do anything special to your code.
When using a Teensy board, the digitalWrite()/digitalRead() code is just magically much faster
if you use constants as the parameters.
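
A rough sketch of how a core can speed up constant arguments without changing the call signature, using the GCC/Clang builtin __builtin_constant_p. This is an illustration of the general technique, not the actual Teensy source; writeAuto, writeLookup and the fake port variables are hypothetical names, and an Uno-style pin layout is assumed.

```cpp
#include <stdint.h>

// Hypothetical stand-ins for PORTB and PORTD so the sketch builds on a PC.
static volatile uint8_t fakePortB = 0;
static volatile uint8_t fakePortD = 0;

// Run-time path: works for any pin value (stands in for the table lookup).
static void writeLookup(uint8_t pin, bool high) {
  volatile uint8_t& port = (pin < 8) ? fakePortD : fakePortB;
  const uint8_t mask = (uint8_t)(1u << (pin < 8 ? pin : pin - 8));
  port = high ? (uint8_t)(port | mask) : (uint8_t)(port & (uint8_t)~mask);
}

// Same signature for every caller; the fast path is chosen by the compiler.
static inline void writeAuto(uint8_t pin, bool high) {
  if (__builtin_constant_p(pin) && pin < 8) {
    // Pin known at compile time: the mask folds to a constant.
    const uint8_t mask = (uint8_t)(1u << pin);
    fakePortD = high ? (uint8_t)(fakePortD | mask)
                     : (uint8_t)(fakePortD & (uint8_t)~mask);
  } else {
    writeLookup(pin, high);
  }
}
```

Both paths produce the same result; whether the fast branch is taken depends on the compiler and optimization level, so the only observable difference is speed and code size.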

There simply is no good excuse as to why the Arduino team hasn't updated
the standard Arduino AVR core code to provide faster i/o when there are alternatives
that are much faster and yet preserve 100% of the existing API.

What is really needed is to abandon the digitalWrite()/digitalRead() API and
define a new one.
One that uses a SET and CLR semantic rather than a set to a value semantic.
This would allow using the better hardware capabilities available in other non AVR
processors like the pic32.
As-is, the better hardware is eternally limited and dramatically slowed down
by having to maintain the existing Arduino API.

--- bill

CrossRoads

I regularly just use Direct Port Manipulation to set outputs High & Low, especially when sending things via SPI, or when reading/writing multiple pins.
Some will say "oooh, not portable".
What do I care? I am writing for a specific processor for a specific application, and if I want fast performance I write for fast.
If just lighting up some LEDs for a clock or an indicator or something, digitalWrite is fine.

I haven't had time to create a cheat sheet of all the other methods out there, many which seem perfectly legit. Nor do I do so much coding that I remember all the stuff.
Your mileage may vary of course.
Designing & building electrical circuits for over 25 years.  Screw Shield for Mega/Due/Uno,  Bobuino with ATMega1284P, & other '328P & '1284P creations & offerings at  my website.

ralphd


What is really needed is to abandon the digitalWrite()/digitalRead() API and
define a new one.

That would be nice, but lots of the existing libraries will be left using digitalWrite/digitalRead.  One example is the mirf libraries.  When I was working to get nrf modules working using only 3 pins on an ATtiny85, I noticed some of the library code used digitalWrite() (one place was for controlling CSN).  Removing it saved several µs for each communication with the nrf, and reduced the code size by over 100 bytes.

ralphd


I regularly just use Direct Port Manipulation to set outputs High & Low, especially when sending things via SPI, or when reading/writing multiple pins.

Your sentiments could explain why digitalWrite hasn't been improved - anyone who cares about code size and performance just uses direct port manipulation.  And for things where performance counts (like SPI or USART), there are no pins to map anyway: on the ATmega328P, MOSI is always PB3 (digital pin 11) and TXD is always PD1 (digital pin 1).

bperrybap



What is really needed is to abandon the digitalWrite()/digitalRead() API and
define a new one.

That would be nice, but lots of the existing libraries will be left using digitalWrite/digitalRead. 

I think you missed my overall point.
I'm not suggesting doing the draconian things that the Arduino team has done of
changing APIs and breaking the world.
What I'm suggesting is a new additional API based on set & clear.
The old existing api would be layered on top of that as a
fully functional but deprecated compatibility API.

The key thing to remember is that the existing digitalRead()/digitalWrite() API and
its semantics can easily be layered on top of other much more efficient APIs.
The reverse is not true.

You can layer a new very slim fully compatible digitalWrite()/digitalRead() API layer
on top of set/clear API but you can't get a set/clear interface to the hardware to work on top of the
existing digitalWrite()/digitalRead() API functions.

i.e. you want the most efficient interface at the bottom closest to the h/w
and then build more and more convenience layers on top of that.
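
A small sketch of that layering, with the set/clear primitives at the bottom and a value-style wrapper on top. The names pinSet, pinClear and compatWrite are hypothetical, and fakePort is a plain variable standing in for a port register so the example runs on a PC.

```cpp
#include <stdint.h>

// Hypothetical stand-in for one port register so the sketch builds on a PC.
static volatile uint8_t fakePort = 0;

// Bottom layer: one function per hardware action, no value to test.
inline void pinSet(uint8_t bit) {
  fakePort = (uint8_t)(fakePort | (1u << bit));
}
inline void pinClear(uint8_t bit) {
  fakePort = (uint8_t)(fakePort & ~(1u << bit));
}

// Compatibility layer: "set to a value" semantics built on set/clear.
inline void compatWrite(uint8_t bit, bool value) {
  if (value) {
    pinSet(bit);
  } else {
    pinClear(bit);
  }
}
```

Going the other way would force pinSet() to call compatWrite(bit, true) and pay for the value test on every call, which is the asymmetry described above.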

--- bill

ralphd




What is really needed is to abandon the digitalWrite()/digitalRead() API and
define a new one.

That would be nice, but lots of the existing libraries will be left using digitalWrite/digitalRead. 

I think you missed my overall point.

I think I follow you now.  You're suggesting there is no perfect digitalWrite/digitalRead.  So instead make the perfect digitalSet/digitalClear, and write an improved digitalWrite/digitalRead that use the new API.

bperrybap



I regularly just use Direct Port Manipulation to set outputs High & Low, especially when sending things via SPI, or when reading/writing multiple pins.

Your sentiments could explain why digitalWrite hasn't been improved - anyone who cares about code size and performance just uses direct port manipulation.

I would disagree with this.
A big issue with AVR direct port i/o is that when combined with the existing Arduino APIs,
it becomes very difficult to allow a sketch to configure the desired pins.
This is a big issue for Arduino library writers that cannot assume a hard coded/closed environment.
When performance is wanted/needed,
Arduino Library writers tend to take 3 approaches:
- Just hard-code it for direct port i/o and force the user to deal with it.
- Provide a header file where the user can hard-code the pins for the library.
- Use indirect port i/o, and allow the user to configure pins in his sketch.

The next best thing to direct port i/o is indirect port i/o.
While not as fast as direct port i/o it is much faster than the digitalWrite()/digitalRead() API functions.
There are many examples of libraries out there that are currently using indirect port i/o
to pick up most of the speed of direct port i/o while still allowing the user the
ability to configure the pins in his sketch on a sketch by sketch basis.
It is about the best that can be done that allows the user to set the pins in the sketch.
It could be better on certain hardware if there was a set/clear interface.

My avrio routines do all the direct port i/o magic including multi bit i/o.
This allows optimal performance when possible and then reductions
in performance if a different pin configuration is needed/desired.
Multi bit i/o is a big deal when dealing with an 8 bit parallel interface
like a glcd.

So "better", "best", "perfect" solutions come down to your
frame of reference and your restrictions.
If you are a lone wolf just writing for your self, there are many solutions
that are possible that may not be appropriate to the library author
that is trying to create a solution that works on a variety of platforms.

--- bill

bperrybap





What is really needed is to abandon the digitalWrite()/digitalRead() API and
define a new one.

That would be nice, but lots of the existing libraries will be left using digitalWrite/digitalRead.

I think you missed my overall point.

I think I follow you now.  You're suggesting there is no perfect digitalWrite/digitalRead.  So instead make the perfect digitalSet/digitalClear, and write an improved digitalWrite/digitalRead that use the new API.


Exactly.
Sorry, I guess I was a bit vague initially.

Other processors like the pic32 have set and clear registers rather than depending on
set and clear bit instructions.
The big advantage is that you can use indirect port i/o and it is atomic.
Also, you can set/clear multiple bits at once.
With the current digitalRead()/digitalWrite() interface it is impossible to take advantage
of the better hardware, and the better hardware is reduced to using the non atomic
port registers, which requires masking/unmasking interrupts to ensure
atomicity just like on the AVR.

I also think that the API could use a multi-pin interface.
This would allow users to set multiple pins at once.
So it becomes a single API call to set 8 pins for a byte
or 4 pins for a nibble.
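
One possible shape for such a multi-pin call, limited to pins that share a port: the caller passes a mask of the pins to touch and the value for them. writeMasked is a hypothetical name, and the port is a plain variable so the example runs on a PC.

```cpp
#include <stdint.h>

// Update only the bits selected by mask; leave the other pins as they were.
// mask = 0xFF writes a whole byte, mask = 0x0F or 0xF0 writes a nibble.
inline void writeMasked(volatile uint8_t& port, uint8_t mask, uint8_t value) {
  const uint8_t kept = (uint8_t)(port & (uint8_t)~mask);  // untouched pins
  port = (uint8_t)(kept | (value & mask));                // new pin states
}
```

On an AVR this is still a read-modify-write, so the same interrupt masking caveat applies as for single-pin indirect i/o.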



--- bill


dolinay

I found this discussion only now, but maybe it's still not too late to add something...

I think I have a solution which is about as fast as you can get without changing the API.
For pin numbers known at compile time it results in a single instruction (or a few instructions if interrupts need to be disabled).
For pins stored in a variable it takes about half of the time of standard digitalWrite.
And the implementation is easily portable.

I wrote an article on Codeproject about it: http://www.codeproject.com/Articles/732646/Fast-digital-I-O-for-Arduino

The trick is in providing the I/O register address and bit mask to digitalRead and digitalWrite as an input parameter rather than computing them at run time from the pin number.
It would also allow easy manipulation of multiple pins (of the same port) at once - the pin mask can contain more than one bit.
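
A simplified sketch of the idea of passing the port and mask in one value. This is an illustration under assumed names and an assumed encoding, not the code from the article: PinCode, makePinCode and writeByCode are hypothetical, the port is identified by an index into a small array of fake registers, and the mask lives in the low byte.

```cpp
#include <stdint.h>

// Hypothetical stand-ins for two port registers: [0] = port B, [1] = port D.
static volatile uint8_t fakePorts[2] = {0, 0};

// Assumed encoding: high byte = port index, low byte = bit mask.
typedef uint16_t PinCode;

constexpr PinCode makePinCode(uint8_t portIndex, uint8_t mask) {
  return (PinCode)((portIndex << 8) | mask);
}

// No pin-number lookup: the code already says which port and which bits.
inline void writeByCode(PinCode code, bool high) {
  volatile uint8_t& port = fakePorts[code >> 8];
  const uint8_t mask = (uint8_t)(code & 0xFF);
  port = high ? (uint8_t)(port | mask) : (uint8_t)(port & (uint8_t)~mask);
}
```

Because the mask is a full byte, a code such as makePinCode(1, 0x0F) drives four pins of the same port in one call.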

About the C++ templates, as far as I understand C++, this solves only the case when pin number is known at compile time. There are some situations when the pin needs to be stored in a variable and here the templates have no effect. This approach will probably end up with 2 different classes - one for constant pins and one for variables - like in this implementation http://forum.arduino.cc/index.php/topic,86931.0.html. So it may help if you want to treat pins in a C++ way, like objects, but it will not do any magic trick about the speed.



