digitalWriteFast, digitalReadFast, pinModeFast etc

Paul's summary is right on and I apologize for putting off so many of the decisions.

If we decide to remove the PWM checks from digitalWrite(), I would suggest adding a noAnalogWrite(pin) function which stops the PWM generation on the pin.

The thought was that if this change were to happen, it would be best to do in Arduino 1.0. In which case, it seems annoying to change the performance of digitalWrite() twice in quick succession: once for 0019 (optimizing it as Paul has implemented but keeping the PWM check) and once for 1.0 (removing the PWM check). But maybe the performance benefits outweigh the pain of having to tweak your code twice?

I would suggest adding a noAnalogWrite(pin) function which stops the PWM generation on the pin

"noAnalogWrite" implies that analog writes will never be performed on that pin or will be permanently turned off. I suggest "stopAnalogWrite".

These really are difficult decisions.

While I've put a LOT of work into improving performance, I really do believe the value of this API is simplicity for novice users. Obviously Arduino has been very successful at presenting an I/O model novices can understand.

I don't do a lot of teaching, but I do get called upon when users have difficult problems, especially strange ones nobody else has managed to solve (I have about 20 years of experience designing and debugging embedded designs in assembly language). I've seen firsthand how users are empowered by the simplicity these functions offer, but then get terribly frustrated when the functions fail to deliver that simplicity because of the hardware quirks I mentioned above. I really think solving those problems, even if it means sacrificing performance or breaking backwards compatibility, will benefit everyone in the long run.

Of course, I love my code to run fast, and when I truly care about performance, I just write in assembly. But I don't advocate teaching novices assembly, or even quirks of the hardware's register-level access. Hopefully those quirks that are currently exposed by the API can be addressed.

Adding all these parallel API functions seems a bit silly to me.

As has been mentioned before in these types of discussions, why can't the API be layered, rather than going off and inventing an entire new slew of functions?

In other words, for something like digitalWrite():
digitalWrite() is the top layer and works just like it does today, with all the handholding, which does cost performance.
It does the timer/PWM handling and then calls _digitalWrite()

_digitalWrite() drops to the next level: it eliminates the checks, so it is a bit faster in general and really fast (a single instruction) when the arguments are constants. If the arguments are not constants it calls __digitalWrite()

__digitalWrite() is the bottom layer which gets called when the arguments are not constants.

That way existing code works just as it does today, except much faster when constants are used. Users that are more knowledgeable and don't need any handholding can call _digitalWrite() to pick up additional performance, especially with constant arguments, which yields single-instruction bit sets/clears.

With a combination of macros and/or inline functions you can get everything with no additional overhead. And it is very consistent, backward compatible and easy to document since each layer takes the same arguments.
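
For illustration, here is a minimal standalone sketch of that layering (the two-entry pin table and the placeholder PWM comment are assumptions; a real version would cover every pin and do the actual PWM handling in the top layer). It reuses the digitalWrite name, so it is meant as a core replacement rather than something to compile alongside the stock core:

#include <stdint.h>
#include <avr/io.h>

// Tiny illustrative pin table (Arduino pins 0 and 1 on a 168/328 only).
struct pin_entry_t { volatile uint8_t* port; uint8_t bit; };
static const pin_entry_t pinTable[] = {
  {&PORTD, 0},  // D0, Arduino pin 0
  {&PORTD, 1},  // D1, Arduino pin 1
};

// Bottom layer: plain runtime path, no checks, no handholding.
static void __digitalWrite(uint8_t pin, uint8_t value) {
  if (value) *pinTable[pin].port |=  (1 << pinTable[pin].bit);
  else       *pinTable[pin].port &= ~(1 << pinTable[pin].bit);
}

// Middle layer: collapses to a single sbi/cbi when both arguments are
// compile-time constants, otherwise falls through to the bottom layer.
static inline __attribute__((always_inline))
void _digitalWrite(uint8_t pin, uint8_t value) {
  if (__builtin_constant_p(pin) && __builtin_constant_p(value)) {
    if (value) *pinTable[pin].port |=  (1 << pinTable[pin].bit);
    else       *pinTable[pin].port &= ~(1 << pinTable[pin].bit);
  } else {
    __digitalWrite(pin, value);
  }
}

// Top layer: same name and behaviour as today; the PWM check would go here
// before dropping down a level.
static inline void digitalWrite(uint8_t pin, uint8_t value) {
  // turn off PWM on this pin here, as the current core does
  _digitalWrite(pin, value);
}

With this arrangement, digitalWrite(1, HIGH) in existing sketches still gets the handholding but compiles the actual write down to one instruction, while _digitalWrite(1, HIGH) skips the PWM handling entirely.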

So my question is: why go off and create all these parallel functions, when a simpler, more traditional layered API gets you there as well?

--- bill

I think that's more or less what is planned in Arduino 1.0. I agree that is a better approach. At the time I turned Paul's ideas into a header file, it was not clear when, if, or how it would be implemented that way, and this seemed a way to get all of that functionality and similar syntax right away.

I have written several libraries with optimized I/O for software SPI and I2C. Here is what I would like in fast digital read/write for this type of application.

The behavior of digitalRead/digitalWrite should not be modified to provide fast read/write. Fast read/write should be a separate facility.

Fast digital read/write should not clear PWM, since this increases the execution time by five cycles on a PWM pin. Fast write then executes in two cycles (more on some Mega pins) when both arguments are constant. A stopAnalogWrite() function should be added.

Fast digital read/write should fail at compile time if the pin number is not a constant or is too large. This will prevent unexpected behavior at execute time.

Here is an example include file for the 168/328 that shows how this can be implemented with static inline functions and a static const array. The array is always eliminated by compiler optimization.

It is easy to extend this idea to the Mega and other Arduino-like boards.

#ifndef DigitalFast_h
#define DigitalFast_h
#include <avr/io.h>

struct pin_map_t {
  volatile uint8_t* ddr;
  volatile uint8_t* pin;
  volatile uint8_t* port;
  uint8_t bit;
};

static const pin_map_t pinMap[] = {
  {&DDRD, &PIND, &PORTD, 0},  // D0  0
  {&DDRD, &PIND, &PORTD, 1},  // D1  1
  {&DDRD, &PIND, &PORTD, 2},  // D2  2
  {&DDRD, &PIND, &PORTD, 3},  // D3  3
  {&DDRD, &PIND, &PORTD, 4},  // D4  4
  {&DDRD, &PIND, &PORTD, 5},  // D5  5
  {&DDRD, &PIND, &PORTD, 6},  // D6  6
  {&DDRD, &PIND, &PORTD, 7},  // D7  7
  {&DDRB, &PINB, &PORTB, 0},  // B0  8
  {&DDRB, &PINB, &PORTB, 1},  // B1  9
  {&DDRB, &PINB, &PORTB, 2},  // B2 10
  {&DDRB, &PINB, &PORTB, 3},  // B3 11
  {&DDRB, &PINB, &PORTB, 4},  // B4 12
  {&DDRB, &PINB, &PORTB, 5},  // B5 13
  {&DDRC, &PINC, &PORTC, 0},  // C0 14
  {&DDRC, &PINC, &PORTC, 1},  // C1 15
  {&DDRC, &PINC, &PORTC, 2},  // C2 16
  {&DDRC, &PINC, &PORTC, 3},  // C3 17
  {&DDRC, &PINC, &PORTC, 4},  // C4 18
  {&DDRC, &PINC, &PORTC, 5}   // C5 19
};
static const uint8_t pinCount = sizeof(pinMap)/sizeof(pin_map_t);

static inline uint8_t badPinNumber(void)
 __attribute__((error("Pin number is too large or not a constant")));

static inline __attribute__((always_inline))
  uint8_t digitalReadFast(uint8_t pin) {
  if (__builtin_constant_p(pin) && pin < pinCount) {
    return (*pinMap[pin].pin >> pinMap[pin].bit) & 1;
  } else {
    return badPinNumber();
  }
}

static inline __attribute__((always_inline))
  void digitalWriteFast(uint8_t pin, uint8_t value) {
  if (__builtin_constant_p(pin) && pin < pinCount) {
    if (value) {
      *pinMap[pin].port |= 1 << pinMap[pin].bit;
    } else {
      *pinMap[pin].port &= ~(1 << pinMap[pin].bit);
    }
  } else {
    badPinNumber();
  }
}
#endif  // DigitalFast_h

Fast read/write should be a separate facility.

Why? If done "correctly", it should "just work" without the user having to do anything special.

There should be no reason that the user should have to specify a separate "fast" API for use with constants to get faster/better code.
Redoing the existing API code implementation rather than creating a new API also allows all the existing code that uses constants to enjoy better performance by simply being recompiled with the new library code.

A separate "fast" API is merely pushing work onto the programmer that can easily be done by the compiler at compile time and creates future maintenance headaches.

It is not that difficult to wrap smart macros and inline functions around the existing API (not necessarily the existing code) so that in the end the user automagically gets better/faster code when constants are used, with a fallback to a slower implementation for non-constants.
In fact there are ways to handle non-constants that are faster and generate less code than the current code in the standard Arduino library, while remaining fully backward compatible with the current API.
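
As one illustration of keeping the existing name (the helper digitalWriteFastPath and its pin-13-only body are assumptions, not a proposed implementation), a function-like macro works because a macro is not re-expanded inside its own expansion, so the non-constant case falls through to the real core function:

#include <avr/io.h>
#include "WProgram.h"  // declares the core digitalWrite(); define the macro after this include

// Hypothetical fast path for constant arguments; a real version would index a
// static const pin table covering every pin instead of handling only pin 13.
static inline __attribute__((always_inline))
void digitalWriteFastPath(uint8_t pin, uint8_t value) {
  if (pin == 13) {
    if (value) PORTB |= _BV(PB5);
    else       PORTB &= ~_BV(PB5);
  }
}

// Same name and arguments as today; constants automagically get the fast path,
// everything else calls the existing core digitalWrite().
#define digitalWrite(pin, value)                              \
  (__builtin_constant_p(pin) && __builtin_constant_p(value)   \
      ? digitalWriteFastPath((pin), (value))                  \
      : digitalWrite((pin), (value)))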


On a side note, be very careful about ensuring atomic register access, because even when using constants you don't always get CBI/SBI instructions.
For example:

*regptr |= bitval;
*regptr &= ~bitval;

This does not always generate single SBI/CBI instructions, which is very important for ensuring atomic access.
Some AVRs have registers that are beyond the SBI/CBI range.
For those it takes several instructions to set or clear a bit, and nothing can be done to reduce that.
But this too can be dealt with, to ensure atomicity, in smart macros and inline code: for registers beyond the CBI/SBI range, additional code is inserted so that interrupts are masked and restored around the bit set/clear operation.
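
A minimal sketch of that kind of wrapper (names assumed for illustration): registers within the SBI/CBI range compile to a single, inherently atomic instruction, while anything else gets the read-modify-write wrapped in an interrupt-safe block:

#include <avr/io.h>
#include <util/atomic.h>

// PORTB on a 328 is within the sbi/cbi range, so the compiler emits one
// instruction and no interrupt masking is needed:
static inline void setPortB5(void) { PORTB |= _BV(PB5); }

// A register beyond the sbi/cbi range (e.g. PORTH on a Mega) takes several
// instructions, so interrupts are masked and restored around the operation:
static inline void setBitMasked(volatile uint8_t* reg, uint8_t mask) {
  ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
    *reg |= mask;
  }
}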

--- bill

There should be no reason that the user should have to specify a separate "fast" API for use with constants to get faster/better code.

There are cases where changing the I/O performance or behavior (for example, not turning off PWM) could break legacy sketches.

I have seen sketches and libraries that rely on the relatively slow performance of the current digitalWrite implementation for providing timing delays in interface code. It's not good practice, but there is no point in penalizing users of that code.

Having a separate call for the faster functions would ensure that legacy code behavior would be unchanged. Users that needed faster performance with existing code could use the IDE to find and replace digitalWrite with the faster version.

There are cases where changing the I/O performance or behavior (for example, not turning off PWM) could break legacy sketches.

I have seen sketches and libraries that rely on the relatively slow performance of the current digitalWrite implementation for providing timing delays in interface code. It's not good practice, but there is no point in penalizing users of that code.

Yes, I've also seen fragile/broken code that sometimes unknowingly takes advantage of certain code timings.
I'm all about attempting to preserve backward compatibility. But developers that expect or need this are eventually going to get burned, as this simply can't always be done across all releases.

Trying to preserve that level of stuff is very difficult and essentially brings a toolset to a halt, since any change will alter behaviors, especially timing.

But even if this type of behaviour support is a requirement, I think there are better ways to handle it than to pollute the API space with a flood of ever-increasing new function calls.

A better way to handle this is to require those users that need the old behaviour preserved to do something to request it, rather than requiring everyone else to do something to request the better/new behaviour. Otherwise, as time goes by, those that write proper code are constantly punished by having to update their code to use an ever-changing set of API names.

So my strong suggestion would be not to expand the API with new names for the same functions, but simply to write the new code so that users who need to preserve an old behaviour set a #define at the top of their sketch, or include some sort of deprecated or backward-compatibility header file, to get the previously existing behaviour that allows their broken code to continue to function in their specific environment.
That way, the existing API is preserved for everyone and only the users that have issues with the new improved version of the code have to do anything. All those that write proper code get the new enhancements "for free".
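
A minimal sketch of that opt-in mechanism (the DIGITAL_IO_COMPAT_0018 name is an assumption, and how the IDE orders its automatic includes would still need sorting out):

// In the sketch of a user who needs the old behaviour:
#define DIGITAL_IO_COMPAT_0018   // request the legacy digitalWrite timing/behaviour
#include <WProgram.h>            // pre-1.0 core header

// In the core's digital I/O header:
#ifdef DIGITAL_IO_COMPAT_0018
// compile the existing (slower) implementation, PWM check and all
#else
// compile the new fast inline implementation
#endif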

But also, as in my previous posts, I distinguish between timing and behaviour. So for leaned-out functionality I'd rather see a layered set of API calls, so that those who want the lean and mean versions with no handholding (which may be faster but is different from the current behaviour) could call the _ (underbar) versions of the same functions.

--- bill

But developers that expect or need this?

If Arduino was targeted to developers then I would agree with your points. However, most Arduino users are not software engineers.

Code that expects PWM to be turned off when digitalWrite is executed is not broken if that's the way it was intended to work.

I wouldn't have designed it that way, but that's the way it is.
IMO, changing the behavior, even if it's documented, is likely to create problems that can be avoided by requiring the user to do something explicit to get the new improved behavior.

Changing digitalWrite will break a lot of code. The fastest form of this

  digitalWrite(13, HIGH);
  digitalWrite(13, LOW);

results in a 125 ns pulse on a 328.

The pulse is about 30 times longer with digitalWrite in Arduino 0018.

The value of Arduino hardware/software is ease of use for the novice. Much of that is due to existing examples, articles, and books. It is not worth breaking these for a few developers.

I suspect many developers wouldn't use the fast version.

For example I wouldn't use it in my library for playing audio on the Adafruit Wave Shield. The hardware SPI is used to read from an SD card so it uses bit-bang SPI in a timer ISR to drive the DAC. It must send 88.4 KB/sec to the DAC for 44.2 ksps Wave files. I just want total control over optimizing this small but critical part of the library so I would not use a new fast digitalWrite.

I think you guys are missing my points.
And by the way, when I say "developer" I mean anybody writing
Arduino code, not necessarily what someone might call a true/real programmer.

The main point I'm making is: don't pollute the API name space with an entire set of new API functions.
Simply layer the API in three functional layers (which is what my original suggestion was) and tack on underbars ("_") to get the names for the additional layers, rather than dreaming up cutesy new names for functions.

Functionality of a given API call is independent of its timing. Often additional functionality impacts timing but this is not always the case.

Those that want to delve into higher-performance alternatives to the existing functions, without all the handholding, should have a way to do that.
Those that want the additional handholding functionality of the currently defined API can still have it.
And those that want to ride any performance increases from new and better library code, without having to modify any of their code, should also be able to do that.

Again, functionality of a library API call is independent of timing.
(well except for time delay API functions)
In some cases it is possible to preserve all the functionality yet make it much faster but in some cases it is not.

Consider the very recent Arduino library fix for atomic bit access. This is now in the mainline code and will be in the next release. This will slow down all digital pin operations.
What about that type of change? Is it also unacceptable to slow down an API call?

API functions should only offer to maintain the functionality defined, and if timing is not part of the API specification, then timing should not be a limiting factor in the implementation.

I also think that since it is possible to offer a higher performance interface with the very same API interface and functionality by simply re-writing the underlying digital pin code, the API library code should be allowed to be updated to take advantage of this rather than being bloated up with new API functions that are exactly the same, only with different timing.

And that users of Arduino need to understand that they cannot depend on certain timing behaviours being maintained across all releases. Those that need such exact timing behaviours of what they have in a particular release can simply stay with that given release and not upgrade.

This is no different than what currently happens in the commercial world on real products. Once you have a working product in the field you sometimes have to lock down toolsets. It is simply not always possible to upgrade to the latest greatest tools release even if you want to.

I suspect many developers wouldn't use the fast version.

For example I wouldn't use it in my library for playing audio on the Adafruit Wave Shield. The hardware SPI is used to read from an SD card so it uses bit-bang SPI in a timer ISR to drive the DAC. It must send 88.4 KB/sec to the DAC for 44.2 ksps Wave files. I just want total control over optimizing this small but critical part of the library so I would not use a new fast digitalWrite.

Now this depends on how it is presented.
If it were faster by default (preserving the same behaviour), then most people would use the "faster" I/O (maybe not the fastest, since that might require using an _ (underbar) function with less functionality), and those that couldn't handle "faster" would go in and update their code (which might be as simple as a define or include) to revert back to the existing bloated code and slower behaviour, assuming there is a way to re-enable that code.

The point being nobody really knows what percentage of users actually need or depend on the current slow Arduino digital pin code implementation. My belief is that this is a very small minority: the vast majority of the users would actually see little or no difference, and a small fraction would be happy with the increased speed.

It seems so odd to be defending a poor code implementation that is slow and bloated.
Normally people are happy when new versions of code get faster or smaller.

--- bill

It seems silly to claim that an _ (underbar) prefix makes it OK to triple the number of names when they imply different functionality/performance.

Does the under bar rule allow changes in functionality? If not, you are stuck with the PWM problem and a lot of new under bar names.

The point being nobody really knows what percentage of users actually need or depend on the current slow Arduino digital pin code implementation.

I don't know this exact percentage, but I can tell you Teensy has used these optimizations for several months, with not a single case reported where faster digitalWrite() broke anything. (Virtually all reported problems are hard-coded assumptions about the timers, usually which pins correspond to them.) Of course, there are far fewer people using Teensy than Arduino. But many of the major Arduino libraries and lots of random code have been tested on Teensy over the last year. Teensy's far faster I/O simply has not been a compatibility issue.

The optimizations used in Teensy preserve the PWM disable functionality and aim to be exactly the same functionality as regular Arduino, only faster. No new naming was added.

Regarding API naming conventions, I want to remain neutral. My only comment here is that many people have now used these optimizations on Teensy, with no reported ill effects. Indeed some people have noticed and appreciated the faster speed, even though it's not really advertised or documented. Maybe I should do that??

I suspect Paul is right, few people will have problems with faster performance.

I did have a problem with a TC74 sensor when I optimized I2C bit-bang. This sensor is limited to 100 kHz I2C.

Still it is hard to believe that a factor of thirty in execution speed of basic I/O functions will not cause problems. That is why I designed my optimized functions to get a compile time error rather than revert to the slow functions.

I guess my experience in embedded control systems where predictable api execution time is important doesn't apply to the Arduino.

If Paul is correct then there should be only one digitalWrite and not several under bar variations.

Thinking about the ongoing discussion overnight:

The problem of testing against all the LilyPad and Arduino variants seems large. Testing against all the libraries seems huge. Testing against even a fraction of sketches seems insurmountable.

In terms of testing and reliability, it is one thing to write a library with a different name but quite another to rewrite wiring_digital and reuse the digitalWrite name.

I think that it is certainly not too soon for someone (bperrybap, fat16lib, your name in lights here?) to write and post a variant of wiring_digital so that people can drop it in and we can get some community testing started. That is certainly a way to advance implementation of your ideas.

The plans for big changes in Arduino 1.0 have me a little concerned. It seems like various (worthwhile) API changes that will inevitably break existing sketches are all planned for that release. It would be good to have a long lead time on something as fundamental as a candidate for digitalWrite so that we know the new code's issues.

It might be good to get more specific direction from David Mellis before going too far, but I think it is not too soon to get a candidate posted.

I agree with jrraines: guidance from David Mellis is key. A vision/philosophy for the future of Arduino is needed.

There needs to be a way to add new/improved functionality but keep the core simple and stable.

For example, UNIX added asynchronous read with a new API, aio_read(), and didn't change read(). Some bad ideas have been removed from UNIX/Linux and some bad ideas live on but a better version of the API has been added.

I suspect the following would not be acceptable to the core Arduino developers but it illustrates why guidance is needed.

If I were allowed to add a new API, I would use a class with inline functions high() and low() that run 5-6 times faster than digitalWrite(). Here is a prototype sketch to illustrate the idea.

#include "pins_arduino.h"

class PinWriter {
  volatile uint8_t* _reg;
  uint8_t _bit;
 public:
  void high(void) {*_reg |= _bit;}
  void low(void) {*_reg &= ~_bit;} 
  void init(uint8_t pin) {
    // clear PWM timer bit here in real version
    pinMode(pin, OUTPUT);
    _bit = digitalPinToBitMask(pin);
    uint8_t port = digitalPinToPort(pin);
    _reg = portOutputRegister(port);
  }
};

PinWriter pin;

uint8_t pinNumber = 13;

void setup(void) {
  pin.init(pinNumber);
}

void loop(void) {
  // make four pulses for scope
  pin.high();
  pin.low();
  pin.high();
  pin.low();
  pin.high();
  pin.low();  
  pin.high();
  pin.low();
  delay(1);
}

This class has a fairly constant execution time for examples I have tried. About 12 cycles (750 ns) for high() and 13 cycles (810 ns) for low(). It does use a bit of RAM.

An optimized digitalWrite() facility for constant pin number would be available for advanced users the way aio_read() is available in UNIX.

To repeat, there are many possible choices for digitalWrite(), so I think the principal Arduino developers need to provide guidance.

Please make sure any writing to pins is atomic. Interrupts must be disabled during read-modify-write operations. Please don't resurrect issue #146.

http://code.google.com/p/arduino/issues/detail?id=146

Paul I agree, the actual implementation should be atomic. I just lifted code from 0018 to illustrate that there is a wide range of choices for improving/replacing digitalWrite().

Even with the extra instructions, an atomic version of this class is over four times faster than an atomic version of digitalWrite().
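
For reference, a minimal sketch of what an atomic version of the class might look like (an illustration, not the exact code I measured): only high() and low() change, saving SREG, disabling interrupts around the read-modify-write, and restoring the previous interrupt state.

#include <avr/interrupt.h>
#include "pins_arduino.h"

class PinWriter {
  volatile uint8_t* _reg;
  uint8_t _bit;
 public:
  void high(void) {
    uint8_t s = SREG;   // remember interrupt state
    cli();
    *_reg |= _bit;
    SREG = s;           // restore it (re-enables interrupts only if they were on)
  }
  void low(void) {
    uint8_t s = SREG;
    cli();
    *_reg &= ~_bit;
    SREG = s;
  }
  void init(uint8_t pin) {
    // clear PWM timer bit here in real version
    pinMode(pin, OUTPUT);
    _bit = digitalPinToBitMask(pin);
    _reg = portOutputRegister(digitalPinToPort(pin));
  }
};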

I believe there needs to be a clear statement of what the goal/requirement is for an improved digitalWrite()/digitalRead(). This would provide some constraints on proposed implementations.

I believe you have a very good point.

However, as a practical matter, every API discussion I've been involved in or watched as it developed ultimately came down to an arbitrary decision by David Mellis, or the discussion faded away and nothing happened.

I really do not want to get involved in more API discussions.

David certainly wants to optimize digitalWrite, since he asked me to write this, which I did (Ben Combee deserves credit for originally suggesting using __builtin_constant_p), and David flagged the code in issue #140 as a milestone for Arduino 1.0.

I have opinions about the API, but ultimately they don't matter.