digitalWriteFast, digitalReadFast, pinModeFast etc

Thanks for your reply, jrraines!

Also, only now, after looking in more detail at DigitalWriteFast (I also hadn't seen the excellent and concise description at Google Code Archive - Long-term storage for Google Code Project Hosting. yet), I understand how foolish my question was. DigitalWriteFast requires pin numbers to be known at compile time. Period.
i.e. digitalWrite(9, HIGH) can be sped up, digitalWrite(i, HIGH) can't.

Likely this will be obvious right away to most readers of this topic, but to me, as a novice, it wasn't.

Thanks for your time! :slight_smile:

Westfw somewhere remarked that consistent speed of digitalWrite may be desirable. Another consideration.

why it is that revised functionality with identical behavior as before (thus no API change) yet improved performance are not getting committed?

Part of the problem is in EXPLAINING the new functions. None of the Arduino functions currently include information about execution speed, and for most applications it isn't important. Far slower systems have existed and solved problems. Now suddenly we want to add "fast" versions, and the possibility that they will introduce confusion is rather high. "when do I need the fast function? Do I need fastSerial.Print too? fastAnalogRead? I changed blink to use the fast functions and it's still blinking once per second?"

DigitalWriteFast requires pin numbers to be known at compile time. Period.
i.e. digitalWrite(9, HIGH) can be sped up, digitalWrite(i, HIGH) can't.

Those are two different statements. DigitalWriteFast is quite careful to "work" with pin numbers that are variables. It just won't be any faster. (adds more confusion, you know.
"Isn't the pin number a constant in the blink example? How come sometimes it runs fast and sometimes it runs slowly?")

Fully agreed.

Your explanations are far more exact and unambiguous. That's why I'm a novice and you're an expert :slight_smile:

Have you tested digitalWriteFast(i++); (assuming i is a defined variable) ?

I don't think I'd used that specific syntax. I just ran a simple example and it seems to give the correct result. Have you had a problem?

uint8_t i=18;
digitalWriteFast(i++,HIGH);
lcd(0,0)<<"19 = "<<(long)i;

Part of the problem is in EXPLAINING the new functions.

By that logic, no improvements would ever be added.

David Mellis specifically requested this code, and when I wrote it for normal Arduino boards, he specifically requested it implemented for the Arduino Mega, which I also did within a matter of a couple days. Difficulty of documentation was never a concern when he (and others on the developer list) wanted this, back in November 2009.

Since then, it's sat in the issue tracker for about half a year. However, he did recently flag this for the 1.0 milestone, so maybe it'll actually make it into the official Arduino core within the next 6 months?

Still not known is if this code will simply be used as digitalWrite(), or if a new name like digitalWriteFast() will be used, or if David will end up implementing it some other way. However it David ends up using this, assuming he ever does, I'm sure once it's actually committed to svn and due to be released, somehow explaining/documenting it really won't be a big deal, and if it doesn't introduce a new name, perhaps no documentation changes will be needed at all?

I've heard this "but we'd have to document it and support it" line many, many times before, usually in the corporate world by mid level managers who just don't want to do anything innovative, unless the directive comes from those above them. There is some point to it for dramatically new products, but really, in cases like this where the feature is just a performance improvement that carries virtually no risk, virtually no backwards compatibility, and is pretty much just invisible, I just don't buy that line about how difficult documentation is. It'd probably take less time than we've sent writing all these message in this thread!

I will take some of the responsibility for all this 'difficulty explaining' discussion; I think my writeup was not well done. That was partly because I'd understood the remarks from others who wondered 'why bother when using PORT etc directly could give better efficiency at run time'.

Trying to acknowledge the truth of that but point out the simplicity and ease of use of using this led to something even more circuitous than this post.

To slightly change the subject, the other thing I'd have thought would warranted prompt adoption into the core is Streaming.h--I'd have thought it should just have been added to print.h. But both David Mellis and Mikal Hart seemed in agreement that it made sense to leave it out. And I will say that leaving it out led to further improvements--Mikal Hart changing the crux of it from 7 lines to one amazingly functional line of code and Michael Margolis and others adding support for HEX etc. That development might have been impeded if it were in the core.

So there are concerns about who maintains, extends and can commit code. With digitalWriteFast there is the concern about who will revise the macros when a board more complex than the Mega comes out. When that happens I will certainly work on it; I would expect it would stretch my ability to write macros quite considerably. As I have acknowledged before I would not have gotten my small contribution to this done without building on Paul's work and without Westfw's patient guidance.

Speaking of my limited abilities with macros, Paul pointed outthe issue with pin numbers that are too high a while back. There is a #error "Your error message here." macro. I can make it work with #ifdef, but I spent an hour or two playing with how to make it work for a numeric issue and had no success. Any tips would be appreciated.

The reason I've been holding off on including this is the need for deciding whether it's an optimization to the existing digitalWrite() or an additional function. This seemed like something that made sense to target for the 1.0 release, but probably could have been (could be) resolved sooner.

In particular, I think we should simply remove the checking for and disabling of PWM output from the digitalWrite() function and use this optimized version. Then we will have, if I understand it, a one-instruction digitalWrite() for cases where the pin number and value are a compile-time constant. Anything slower than that seems like it might suggest an alternative (faster) option, which I think would be an unnecessary complication.

Mellis, I don't think removing the PWM check from the current digitalWrite will provide anything close to the performance of Paul's one-instruction write macro.

Removing the code to check and disable PWM would speed things up by around 40% on a PWM pin and 20% on a pin that does not use PWM.

Here are timings for a pulse (digitalWrite HIGH followed by digitalWrite LOW) measured using a logic analyzer on a 16MHz Arduino:

digitalWrite on pin 3 (PWM pin) – 4.8 us
digitalWrite on pin 4 (not PWM) – 3.5 us
modified digitalWrite with no PWM check on pin 3 (PWM pin) is 2.9 us

But using Pauls code (without a PWM check) would run around 40 times faster – 125 nanoseconds assuming that the pin was a constant.

I my opinion, most applications are fast enough with the existing digitalWrite, but applications that do need to minimise delays are best off with Paul's on-instruction macros. My vote is to give the fast write macros a different name so legacy code that relied on the performance of the current digitalWrite would not change significantly. And the go-faster macros could be chosen for those applications that really needed the high performance.

I do hope Paul's macros make it into 1.0 as this will be a significant benefit to users that need high performance I/O.

When this was discussed on the developer list, as I recall, the decision was to decouple the API change (whether to disable PWM or not) from the optimization. I was specifically asked to incorporate the PWM disable into the macro, which I did within a day.

The plan was to quickly include the optimization with PWM disable, because it would have little impact other than simply making sketches run faster. The API change, which turned out to be controversial with Tom and others, was to be considered separately. If PWM disable was to be removed at some future date, it would be easy to simply delete it from both the optimized macro and original compiled code.

Disabling PWM takes 3 more instructions with the macro, which execute in 312 ns on a 16 MHz chip. Even with the PWM disable, the const case optimization executes in 437 ns, which is still about 10X faster.

Also, I'd like to point out this macro approach is not my preferred coding style for this optimization. I did it this way because David said he didn't like the verbose inline function approach I originally posted. Indeed that way is much larger, but it has the advantage of not touching any pin when an illegal pin number is given as the input. This macro, as currently implemented, will always write to some pin even if the pin number is too large. Then again, that's a lot better than the terrible behavior of the compiled code with a too-large pin number!

edit: on Arduino Mega, this macro will suffer the issue #146 bug on the pins which aren't memory mapped in the lower 16 I/O space. Perhaps the bitWrite macro could be modified to disable interrupts in those cases, but really this sort of thing is much easier to deal with in an expanded static inline function.

I'm not sure i belong in this part of the discussion at all, since this is an area I feel very much a novice in, but maybe that perspective is valuable.

  1. Sometimes figuring out how you would explain something lets you choose between interface alternatives. Particularly with different boards having different pin-port correspondence this is not going to be trivial to explain.

  2. I realize that analogWrite and digitalWrite can both be used to control an LED, but I have no idea with what interfaced device one would really CARE about turning off the timer; that is: What would you plug into a pin and sometimes need the analogWrite fcn and later NEED the timer completely off? Would a novice ever stumble into using a device in that class? After all, the bootloader leaves the timers off, and (i think) the reset/power up sequence leaves the timers off.

  3. I supplied an alternative syntax that turned the timer off with pinModeFast2. One other simple syntax to consider would be to turn the timer off with analogWrite(0) and analogWrite(255) (or whatever the max is).

  4. Users who are sophisticated enough to control the timer using the port commands don't really need separate support here, but changing the interface might break their code, and prevent novices from building on posted code. As a not-quite example, frequency2Timer (I may not have the name precisely right) doesn't work with the Mega; sketches that built on that require more learning/research to implement than a novice may want/be able to do.

This talk about "PWM disable" in digitalWrite isn't about actually disabling the timers. It is merely about clearing a bit which causes the timer's PWM generation to essentially "take over" a pin, if it's configured as an output.

Obviously analogWrite() sets that bit, and it also make the pin an output. Until that bit is cleared, it doesn't matter if the pin is configured as high or low the normal way (writing to the PORTx register), because it's being controlled by the PWM waveform generator in the timer.

Currently, digitalWrite() clears that bit. This is nice, because if a pin were previously "written" using analogWrite(), without clearing this bit, digitalWrite() would have no effect. That might be pretty confusing. Of course, the clearing of that bit is pretty expensive in terms of speed, as it takes extra time, and it even burdens the non-PWM-capable pins with a check to know if the pin is PWM capable.

If digitalWrite() didn't clear that bit, then some other way would be needed, and every discussion I've heard on this issue seems to suggest using pinMode(). That ends up raising a lot more questions, like if a separate mode for PWM would be defined, or if the user would just configure it to OUTPUT mode again, even though they already had previously configured it to OUTPUT mode?

Ultimately, there are at least 2 important issues:

1: Is the digitalWrite, analogWrite, pinMode API a "thin" layer that is designed for performance, yet exposes hardware details? Or is it more "opaque", providing novices with conceptual simplicity and fully insulating them from painful hardware quirks? It's a classic performance vs usability trade-off.

2: Either way, what should the API really be? If that's anything other than what it currently is, would short term pain of breaking compatibility be worthwhile for long term performance or usability gains?

Today, regarding #1, I believe the API somewhere in between, achieving neither great performance nor excellent usability. We've all heard about the performance issues, and removing the PWM disable is intended to improve speed.

Usability-wise, even though digitalWrite handles this PWM disable, it fails to insulate users from other hardware quirks, mainly the pullup resistors (which I believe is pretty horrible for novices, since they get tricked into thinking they have a "hardware problem" when their code seems to be working, but the pin isn't able to do much more than drive a multimeter, where correct voltage checks further reinforce the notion their hardware is damaged, all because they forgot pinMode OUTPUT). digitalWrite() also doesn't do a bounds check on the pin number, so novices aren't protected from mistakes like a loop clearing all pins being "off by one" or "way off". You might also wonder why analogWrite switches the pin to OUTPUT mode automatically, but digitalWrite doesn't? There's also bug #146, which is rare, but extremely difficult when it does strike. digitalWrite() seems simple and easy to use, but it leaves plenty of "gotchas" for unsuspecting users.

People who teach classes and workshops have a great interest in the simplicity of the API, and having a coherent model free of low-level hardware quirks. People who built robots and other more complex projects care deeply about performance.

So this all gets into a lot of talk about the conceptual model this API is supposed to provide for users. Unfortunately, it's probably far too late to really throughly consider that model, because many thousands of people have written mountains of code that depends on the current behavior.

But anyway, that's what's meant by this "PWM disable" stuff. It's not about actually reconfiguring the timer (and each timer generates PWM for 2 or 3 pins, so you wouldn't do that anyway for a single pin). These issues about the API, conceptual/teaching models, performance and backwards compatibility are quite difficult. So far, the only easy answer has been to put off making any decisions.

So far, the only easy answer has been to put off making any decisions.

That sound just like the U.S. Congress model. :wink:

Lefty

Paul's summary is right on and I apologize for putting off so many of the decisions.

If we decide to remove the PWM checks from digitalWrite(), I would suggest adding a noAnalogWrite(pin) function which stop the PWM generation on the pin.

The thought was that if this change were to happen, it would be best to do in Arduino 1.0. In which case, it seems annoying to change the performance of digitalWrite() twice in quick succession: once for 0019 (optimizing it as Paul has implemented but keeping the PWM check) and once for 1.0 (removing the PWM check). But maybe the performance benefits outweigh the pain of having to tweak your code twice?

I would suggest adding a noAnalogWrite(pin) function which stop the PWM generation on the pin

"noAnalogWrite" implies that analog writes will never be performed on that pin or will be permanently turned off. I suggest "stopAnalogWrite".

These really are difficult decisions.

While I've put a LOT of work into improving performance, really I do believe the value of this API is simplicity for novice users. Obviously Arduino has been very successful at presenting an I/O model novices can understand.

I don't do a lot of teaching, but I do get called upon when users have difficult problems, especially strange ones nobody else has managed to solve (I do have about 20 years experience designing and debugging embedded designs in assembly language). I've seen first hand where users are empowered by the simplicity these functions offer, but then get terribly frustrated when they fail to deliver that simplicity with those hardware quirks I mentioned above. I really think solving those problems, even if it means sacrificing performance or breaking backwards compatibility, will in the long run really benefit everyone.

Of course, I love my code to run fast, and when I truly care about performance, I just write in assembly. But I don't advocate teaching novices assembly, or even quirks of the hardware's register-level access. Hopefully those quirks that are currently exposed by the API can be addressed.

Adding all these parallel API functions seems a bit silly to me.

As has been mentioned before in these types of discussions, why can't the API be layered?
Rather than go off and invent an entire new slew of functions?

In other words for something like a digitalWrite().
digitalWrite() is the top layer and works just like it does today with all the handholding which does cost performance.
It does the timer code stuff and then calls _digitalWrite()

_digitalWrite() drops the next level and eliminates the checks and gets a bit faster and is really fast (single instruction) for constant arguments. If arguments are not constants it calls __digitalWrite()

__digitalWrite() is the bottom layer which gets called when the arguments are not constants.

That way existing code works just as it does today except much faster when constants are used. Users that are more knowledgeable and don't need any handholding can call _digitalWrite() to pick up additional performance especially if the arguments are constants to get single instruction bit sets/clears.

With a combination of macros and/or inline functions you can get everything with no additional overhead. And it is very consistent, backward compatible and easy to document since each layer takes the same arguments.

So my question is why go off and create all these parallel functions, when a simple more traditional layered API gets you there as well?

--- bill

I think that's more or less what is planned in Arduino 1.0. I agree that is a better approach. At the time I turned Paul's ideas into a header file, it was not clear when, if, or how it would be implemented in that way and this seemed a way to get all of that functionality and similar syntax right away.