digitalWriteFast, digitalReadFast, pinModeFast etc

@jrraines: Would you mind publishing your code somewhere else? The link you provided isn't working.

I suggest removing the check "__builtin_constant_p(V)" from digitalWriteFast...

#define digitalWriteFast(P, V) \
  if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
    if (digitalPinToTimer(P)) \
      bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)); \
    bitWrite(*digitalPinToPortReg(P), digitalPinToBit(P), (V)); \
  } else { \
    digitalWrite((P), (V)); \
  }

Relatively speaking, the call to digitalWrite generates one machine instruction (a relative call). After removing the built-in check on V and passing a variable into the macro...

For non-PWM pins, five machine instructions are generated.

For PWM pins, eight machine instructions are generated.

In my opinion, this is a small price to pay for the huge increase in speed. If someone wants to save program space, they can call digitalWrite directly.
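A rough host-side sketch of the suggestion above, in case it helps to see it in isolation. This is not the digitalWriteFast source: the registers are plain variables, and `fakePort`, `fakeTimerCtl`, `slowWrite`, `SKETCH_PIN_BIT`, `SKETCH_PIN_IS_PWM` and `writeFastSketch` are all made-up names with a made-up pin map. It assumes GCC or Clang, since `__builtin_constant_p` is a compiler extension. The point is only that the fast branch is chosen on the pin alone, so the value can be a run-time variable.

```cpp
#include <cassert>
#include <cstdint>

// Simulated stand-ins for one AVR port and a timer control register.
// On real hardware these would be memory-mapped registers.
static uint8_t fakePort = 0;
static uint8_t fakeTimerCtl = 0xFF;
static int slowCalls = 0;  // counts trips through the fallback path

// Hypothetical pin map for this sketch: pins 0..7 map to bits 0..7,
// and only pin 3 has a PWM enable bit (bit 5) that must be cleared.
#define SKETCH_PIN_BIT(P) ((uint8_t)(1u << ((P) & 7)))
#define SKETCH_PIN_IS_PWM(P) ((P) == 3)

// Stand-in for the ordinary, table-driven digitalWrite().
static void slowWrite(uint8_t pin, uint8_t value) {
  ++slowCalls;
  if (SKETCH_PIN_IS_PWM(pin)) fakeTimerCtl &= (uint8_t)~(1u << 5);
  if (value) fakePort |= SKETCH_PIN_BIT(pin);
  else fakePort &= (uint8_t)~SKETCH_PIN_BIT(pin);
}

// Only the pin has to be a compile-time constant to take the fast
// branch; the value V may be a variable.
#define writeFastSketch(P, V) \
  do { \
    if (__builtin_constant_p(P)) { \
      if (SKETCH_PIN_IS_PWM(P)) fakeTimerCtl &= (uint8_t)~(1u << 5); \
      if ((V)) fakePort |= SKETCH_PIN_BIT(P); \
      else fakePort &= (uint8_t)~SKETCH_PIN_BIT(P); \
    } else { \
      slowWrite((P), (V)); \
    } \
  } while (0)

// Constant pin, variable value: the fallback is never called.
static void exampleUsage(uint8_t level) {
  writeFastSketch(3, level);
  assert(slowCalls == 0);
}
```

With a variable pin the same macro still produces the right result; whether it goes through `slowWrite` then depends on what the optimizer can prove about the pin.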

http://code.google.com/p/digitalwritefast/downloads/list

Mellis and Stoffregen felt that it was safer to include the PWM stuff. I thought the digitalWriteFast2, pinModeFast2 version was preferable. You can cut out a little overhead by using pinModeFast and digitalWriteFast2.

Thanks!

Mellis and Stoffregen felt that it was safer to include the PWM stuff

Um ... I didn't suggest removing the PWM stuff. My suggestion was to apply the fast version when V is a constant or a variable.

I thought the digitalWriteFast2, pinModeFast2 version was preferable. You can cut out a little overhead by using pinModeFast and digitalWriteFast2.

Looks good to me. I like it.

I agree, it's a good trade-off. In fact, that's exactly how I implemented it in the version that's inside Teensyduino.

The slow compiled code often requires almost that many instructions just to marshal the inputs into the required registers when either isn't a compile-time constant. If the surrounding code is complex but doesn't call other functions, which is often the case around digitalWrite, the savings in register allocation are also a big win.

But that's not "exactly" how I wrote it. This "bitWrite" macro coding style really isn't my first choice. Inside Teensyduino, I implemented this using a giant chain of if-else checks, where each performs the desired write. While it's a lot longer, a LOT longer, it has the advantage of doing nothing if an illegal pin number is used. With this macro version, if an illegal pin number is used, it will write to the last pin. On all Arduino boards, that's an analog input, likely never configured as an output, so the effect would be activating the pullup resistor on that analog in... which could be pretty confusing if the poor, misguided user didn't realize some other unrelated code mistakenly wrote an illegal pin number. For that reason, I've never been very happy with this style.
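For anyone who hasn't seen the if-else style, here is a minimal host-side sketch of the idea. It is not the Teensyduino code: `fakePortB`, `fakePortD`, `setBit`, `writeChain` and the five-pin mapping are invented for illustration, and the registers are ordinary variables. What it shows is that a chain with no final `else` simply does nothing for an illegal pin number.

```cpp
#include <cassert>
#include <cstdint>

// Simulated output registers; real code would use the port registers.
static uint8_t fakePortB = 0;
static uint8_t fakePortD = 0;

static inline void setBit(uint8_t &reg, uint8_t bit, bool v) {
  if (v) reg |= (uint8_t)(1u << bit);
  else reg &= (uint8_t)~(1u << bit);
}

// One branch per legal pin, laid out like a table.
// The pin-to-port mapping here is hypothetical.
static inline void writeChain(uint8_t pin, bool v) {
  if      (pin == 0) setBit(fakePortD, 0, v);
  else if (pin == 1) setBit(fakePortD, 1, v);
  else if (pin == 2) setBit(fakePortD, 2, v);
  else if (pin == 3) setBit(fakePortB, 0, v);
  else if (pin == 4) setBit(fakePortB, 1, v);
  // No final else: an illegal pin number falls through and touches nothing.
}

static void exampleUsage() {
  writeChain(3, true);   // legal pin: sets bit 0 of fakePortB
  writeChain(99, true);  // illegal pin: no register changes
  assert(fakePortB == 0x01);
  assert(fakePortD == 0x00);
}
```

When the pin is a compile-time constant and the function is inlined, a compiler can usually discard every branch but one, which is what makes the long form affordable.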

Back in November 2009, David said he intended to include this into the official Arduino core, and he preferred this macro style (I had posted a short example of the if-else way), so I wrote it this way for contribution to the official Arduino version. Sadly, with 0018 gone by, and issue #140 not tagged for 0019 or 1.0, it seems unlikely these optimizations will ever become part of the official digitalWrite. Had I known that then, I wouldn't have bothered to write these for the official Arduino boards. I'm certainly not going to put any more effort into it now, other than pointing out this lack of checking for illegal pin number input.

Then again, erroneously writing to the last pin is a lot better than what the slow compiled digitalWrite does. It will happily use the too-large pin number as an index to an array, reading whatever happens to be in memory after those tables, and use that data as a pointer and bitmask to write to someplace in memory! Not good.
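As a sketch of what a guard on the table-driven path could look like (an assumption for illustration, not the Arduino core code): `kPinMask`, `kNumPins`, `fakePort` and `writeIfLegal` are hypothetical names, and the table is a four-entry stand-in for the core's lookup tables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static uint8_t fakePort = 0;  // simulated output register

// Hypothetical pin-to-bitmask table standing in for the core's tables.
static const uint8_t kPinMask[] = {0x01, 0x02, 0x04, 0x08};
static const std::size_t kNumPins = sizeof(kPinMask) / sizeof(kPinMask[0]);

// Returns false and touches nothing when the pin is out of range,
// instead of indexing past the end of the table.
static bool writeIfLegal(uint8_t pin, bool v) {
  if (pin >= kNumPins) return false;
  if (v) fakePort |= kPinMask[pin];
  else fakePort &= (uint8_t)~kPinMask[pin];
  return true;
}

static void exampleUsage() {
  assert(writeIfLegal(2, true));     // in range: bit 2 is set
  assert(!writeIfLegal(200, true));  // out of range: rejected
  assert(fakePort == 0x04);
}
```

The check costs a compare and a branch on every call, which is presumably part of the trade-off on the slow path.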

Then again, there's issue 146 & 170, which also seems unlikely to ever get fixed.

It makes me sad to see so little care and concern for code quality. I think maybe it's time to turn off my notifications for this thread....

I realized that one form of documentation I should have provided from the start was examples of what the disassembly code looks like. These examples were compiled for my Mega, so pin/port correspondence may not be the same as other boards, but it will give a better idea of what is generated:

// these are non-pwm pins whose port is below 0x100
pinModeFast(51,INPUT);
    5c4e:      22 98             cbi      0x04, 2      ; 4
digitalWriteFast(51,HIGH); 
    5c50:      2a 9a             sbi      0x05, 2      ; 5
pinModeFast(50,OUTPUT);
    5c52:      23 9a             sbi      0x04, 3      ; 4
digitalWriteFast(50,LOW);
    5c54:      2b 98             cbi      0x05, 3      ; 5

//these pins are on a port above 0x100
pinModeFast2(48,INPUT);
    5bb2:      80 91 0a 01       lds      r24, 0x010A
    5bb6:      8d 7f             andi      r24, 0xFD      ; 253
    5bb8:      80 93 0a 01       sts      0x010A, r24
digitalWriteFast2(48,LOW);
    5bbc:      80 91 0b 01       lds      r24, 0x010B
    5bc0:      8d 7f             andi      r24, 0xFD      ; 253
    5bc2:      80 93 0b 01       sts      0x010B, r24
pinModeFast2(49,OUTPUT);
    5bc6:      80 91 0a 01       lds      r24, 0x010A
    5bca:      81 60             ori      r24, 0x01      ; 1
    5bcc:      80 93 0a 01       sts      0x010A, r24
digitalWriteFast2(49,HIGH);
    5bd0:      80 91 0b 01       lds      r24, 0x010B
    5bd4:      81 60             ori      r24, 0x01      ; 1
    5bd6:      80 93 0b 01       sts      0x010B, r24


//these are pwm with a port address below 0x100:
pinModeFast(2,INPUT);
     32c:      6c 98             cbi      0x0d, 4      ; 13
digitalWriteFast(2,HIGH); 
     32e:      80 91 90 00       lds      r24, 0x0090
     332:      8f 7d             andi      r24, 0xDF      ; 223
     334:      80 93 90 00       sts      0x0090, r24
     338:      74 9a             sbi      0x0e, 4      ; 14
pinModeFast(5,OUTPUT);
     33a:      6b 9a             sbi      0x0d, 3      ; 13
digitalWriteFast(5,LOW);
     33c:      80 91 90 00       lds      r24, 0x0090
     340:      8f 77             andi      r24, 0x7F      ; 127
     342:      80 93 90 00       sts      0x0090, r24
     346:      73 98             cbi      0x0e, 3      ; 14

pinModeFast2(2,INPUT);
     908:      80 91 90 00       lds      r24, 0x0090
     90c:      8f 7d             andi      r24, 0xDF      ; 223
     90e:      80 93 90 00       sts      0x0090, r24
     912:      6c 98             cbi      0x0d, 4      ; 13
digitalWriteFast2(2,LOW);
     914:      74 98             cbi      0x0e, 4      ; 14
pinModeFast2(5,OUTPUT);
     916:      80 91 90 00       lds      r24, 0x0090
     91a:      8f 77             andi      r24, 0x7F      ; 127
     91c:      80 93 90 00       sts      0x0090, r24
     920:      6b 9a             sbi      0x0d, 3      ; 13
digitalWriteFast2(5,HIGH);
     922:      73 9a             sbi      0x0e, 3      ; 14

//some of these have an address above 0x100:
pinModeFast(12,INPUT);
    247a:      26 98             cbi      0x04, 6      ; 4
digitalWriteFast(12,HIGH); 
    247c:      80 91 80 00       lds      r24, 0x0080
    2480:      8f 7d             andi      r24, 0xDF      ; 223
    2482:      80 93 80 00       sts      0x0080, r24
    2486:      2e 9a             sbi      0x05, 6      ; 5
pinModeFast(9,OUTPUT);
    2488:      80 91 01 01       lds      r24, 0x0101
    248c:      80 64             ori      r24, 0x40      ; 64
    248e:      80 93 01 01       sts      0x0101, r24
digitalWriteFast(9,LOW);
    2492:      80 91 b0 00       lds      r24, 0x00B0
    2496:      8f 7d             andi      r24, 0xDF      ; 223
    2498:      80 93 b0 00       sts      0x00B0, r24
    249c:      80 91 02 01       lds      r24, 0x0102
    24a0:      8f 7b             andi      r24, 0xBF      ; 191
    24a2:      80 93 02 01       sts      0x0102, r24

@jrraines: Thank you. That's helpful.

@Paul Stoffregen:

Inside Teensyduino, I implemented this using a giant chain of if-else checks, where each performs the desired write. While it's a lot longer, a LOT longer, it has the advantage of doing nothing if an illegal pin number is used.

I prefer the chain for a different reason. With some clever formatting, it can be made to look like a table. Ensuring the Arduino-pin to port-pin mapping is accurate is easy.

Back in November 2009, David said he intended to include this into the official Arduino core, and he preferred this macro style (I had posted a short example of the if-else way), so I wrote it this way for contribution to the official Arduino version. Sadly, with 0018 gone by, and issue #140 not tagged for 0019 or 1.0, it seems unlikely these optimizations will ever become part of the official digitalWrite.

That's unfortunate. I REALLY like these digital*Fast functions...

Had I known that then, I wouldn't have bothered to write these for the official Arduino boards. I'm certainly not going to put any more effort into it now, other than pointing out this lack of checking for illegal pin number input.

I definitely appreciate your and jrraines' effort. I'm trying to squeeze an application onto a 2313. I've determined that without these functions, I would have had to resort to port manipulation.

Who can't love these things! High level function calls reduced to a single machine instruction! It's the best of hand-assembly and C++.

Then again, erroneously writing to the last pin is a lot better than what the slow compiled digitalWrite does. It will happily use the too-large pin number as an index to an array, reading whatever happens to be in memory after those tables, and use that data as a pointer and bitmask to write to someplace in memory! Not good.

That is a bit unnerving.

It makes me sad to see so little care and concern for code quality. I think maybe it's time to turn off my notifications for this thread....

PLEASE stay with us (me)!

I'm only an Arduino beginner, but I'm not sure I understand why revised functions with the same behavior as before (thus no API change) but improved performance are not getting committed.

What am I missing?

I think that is Paul's point. In many situations this will be both faster and also smaller.

I have encountered at least one situation where I had removed delays, knowing that digitalWrite was slow, and doubted the code would still work if digitalWrite were sped up. That is the logic behind calling the functions by different names.

Thanks for your reply, jrraines!

Also, only now, after looking in more detail at DigitalWriteFast (I also hadn't seen the excellent and concise description at http://code.google.com/p/digitalwritefast/ yet), I understand how foolish my question was. DigitalWriteFast requires pin numbers to be known at compile time. Period. i.e. digitalWrite(9, HIGH) can be sped up, digitalWrite(i, HIGH) can't.

Likely this will be obvious right away to most readers of this topic, but to me, as a novice, it wasn't.

Thanks for your time! :)

Westfw somewhere remarked that consistent speed of digitalWrite may be desirable. Another consideration.

why revised functions with the same behavior as before (thus no API change) but improved performance are not getting committed.

Part of the problem is in EXPLAINING the new functions. None of the Arduino functions currently include information about execution speed, and for most applications it isn't important. Far slower systems have existed and solved problems. Now suddenly we want to add "fast" versions, and the possibility that they will introduce confusion is rather high. "when do I need the fast function? Do I need fastSerial.Print too? fastAnalogRead? I changed blink to use the fast functions and it's still blinking once per second?"

DigitalWriteFast requires pin numbers to be known at compile time. Period. i.e. digitalWrite(9, HIGH) can be sped up, digitalWrite(i, HIGH) can't.

Those are two different statements. DigitalWriteFast is quite careful to "work" with pin numbers that are variables. It just won't be any faster. (adds more confusion, you know. "Isn't the pin number a constant in the blink example? How come sometimes it runs fast and sometimes it runs slowly?")

Fully agreed.

Your explanations are far more exact and unambiguous. That's why I'm a novice and you're an expert :)

Have you tested digitalWriteFast(i++); (assuming i is a defined variable) ?

I don't think I'd used that specific syntax. I just ran a simple example and it seems to give the correct result. Have you had a problem?

uint8_t i=18;
digitalWriteFast(i++,HIGH);
lcd(0,0)<<"19 = "<<(long)i;
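A note on why this comes out right even though the macro text mentions the pin several times. As far as I know, GCC and Clang do not evaluate the argument of `__builtin_constant_p`, and they report an expression with side effects as non-constant. So only the fallback branch runs, and it evaluates `i++` once. The sketch below is a host-side stand-in, not digitalWriteFast itself; `lastPin`, `slowWrite`, `writeSketch` and `demoPostIncrement` are names made up for the demonstration.

```cpp
#include <cassert>
#include <cstdint>

static uint8_t lastPin = 0;  // records which pin was written

// Stand-in for the ordinary digitalWrite(); it only records the pin.
static void slowWrite(uint8_t pin, uint8_t value) {
  (void)value;
  lastPin = pin;
}

// Same shape as digitalWriteFast: P appears several times in the text
// of the macro, but only one branch ever executes.
#define writeSketch(P, V) \
  do { \
    if (__builtin_constant_p(P)) { \
      lastPin = (uint8_t)(P); \
      (void)(V); \
    } else { \
      slowWrite((P), (V)); \
    } \
  } while (0)

static uint8_t demoPostIncrement() {
  uint8_t i = 18;
  writeSketch(i++, 1);  // i++ is not a constant, so the fallback runs
  return i;             // incremented exactly once
}
```

Calling `demoPostIncrement()` should leave `lastPin` at 18 and return 19, matching the result reported above.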

Part of the problem is in EXPLAINING the new functions.

By that logic, no improvements would ever be added.

David Mellis specifically requested this code, and when I wrote it for normal Arduino boards, he specifically requested it implemented for the Arduino Mega, which I also did within a matter of a couple days. Difficulty of documentation was never a concern when he (and others on the developer list) wanted this, back in November 2009.

Since then, it's sat in the issue tracker for about half a year. However, he did recently flag this for the 1.0 milestone, so maybe it'll actually make it into the official Arduino core within the next 6 months?

Still not known is if this code will simply be used as digitalWrite(), or if a new name like digitalWriteFast() will be used, or if David will end up implementing it some other way. However David ends up using this, assuming he ever does, I'm sure once it's actually committed to svn and due to be released, somehow explaining/documenting it really won't be a big deal, and if it doesn't introduce a new name, perhaps no documentation changes will be needed at all?

I've heard this "but we'd have to document it and support it" line many, many times before, usually in the corporate world by mid-level managers who just don't want to do anything innovative unless the directive comes from those above them. There is some point to it for dramatically new products, but really, in cases like this where the feature is just a performance improvement that carries virtually no risk, virtually no backwards compatibility problems, and is pretty much just invisible, I just don't buy that line about how difficult documentation is. It'd probably take less time than we've spent writing all these messages in this thread!

I will take some of the responsibility for all this 'difficulty explaining' discussion; I think my writeup was not well done. That was partly because I'd understood the remarks from others who wondered 'why bother when using PORT etc directly could give better efficiency at run time'.

Trying to acknowledge the truth of that but point out the simplicity and ease of use of using this led to something even more circuitous than this post.

To slightly change the subject, the other thing I'd have thought would have warranted prompt adoption into the core is Streaming.h--I'd have thought it should just have been added to print.h. But both David Mellis and Mikal Hart seemed in agreement that it made sense to leave it out. And I will say that leaving it out led to further improvements--Mikal Hart changing the crux of it from 7 lines to one amazingly functional line of code and Michael Margolis and others adding support for HEX etc. That development might have been impeded if it were in the core.

So there are concerns about who maintains, extends and can commit code. With digitalWriteFast there is the concern about who will revise the macros when a board more complex than the Mega comes out. When that happens I will certainly work on it; I would expect it would stretch my ability to write macros quite considerably. As I have acknowledged before I would not have gotten my small contribution to this done without building on Paul's work and without Westfw's patient guidance.

Speaking of my limited abilities with macros, Paul pointed out the issue with pin numbers that are too high a while back. There is a #error "Your error message here." directive. I can make it work with #ifdef, but I spent an hour or two playing with how to make it work for a numeric issue and had no success. Any tips would be appreciated.
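One possible direction, offered as a sketch rather than a tested fix for digitalWriteFast. `#error` is handled by the preprocessor, so it can only react to values the preprocessor can see, such as a `#define`d constant tested with `#if`. For a pin passed as an argument, a compile-time assertion is the usual substitute. The example assumes a C++11 compiler for `static_assert`, which is newer than the toolchains in use when this thread was written. `kNumPins`, `LED_PIN`, `fakePort` and `writePin` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

static uint8_t fakePort = 0;     // simulated output register
constexpr uint8_t kNumPins = 8;  // hypothetical pin count for this sketch

// The preprocessor can compare a #define'd number directly:
#define LED_PIN 5
#if LED_PIN >= 8
#error "LED_PIN is out of range"
#endif

// For a pin passed as an argument, a static_assert on a template
// parameter turns an illegal pin number into a compile-time error.
template <uint8_t Pin>
inline void writePin(bool v) {
  static_assert(Pin < kNumPins, "illegal pin number");
  if (v) fakePort |= (uint8_t)(1u << Pin);
  else fakePort &= (uint8_t)~(1u << Pin);
}

static void exampleUsage() {
  writePin<LED_PIN>(true);  // compiles: 5 is in range
  assert(fakePort == 0x20);
  // writePin<12>(true);    // would fail to compile: illegal pin number
}
```

The cost is a different call syntax, `writePin<5>(HIGH)` instead of a macro call, so this is a design alternative and not a drop-in change.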

The reason I've been holding off on including this is the need for deciding whether it's an optimization to the existing digitalWrite() or an additional function. This seemed like something that made sense to target for the 1.0 release, but probably could have been (could be) resolved sooner.

In particular, I think we should simply remove the checking for and disabling of PWM output from the digitalWrite() function and use this optimized version. Then we will have, if I understand it, a one-instruction digitalWrite() for cases where the pin number and value are a compile-time constant. Anything slower than that seems like it might suggest an alternative (faster) option, which I think would be an unnecessary complication.

Mellis, I don't think removing the PWM check from the current digitalWrite will provide anything close to the performance of Paul's one-instruction write macro.

Removing the code to check and disable PWM would speed things up by around 40% on a PWM pin and 20% on a pin that does not use PWM.

Here are timings for a pulse (digitalWrite HIGH followed by digitalWrite LOW) measured using a logic analyzer on a 16MHz Arduino:

digitalWrite on pin 3 (PWM pin): 4.8 us
digitalWrite on pin 4 (not PWM): 3.5 us
modified digitalWrite with no PWM check on pin 3 (PWM pin): 2.9 us

But Paul's code (without a PWM check) would run around 40 times faster: 125 nanoseconds, assuming the pin is a constant.

In my opinion, most applications are fast enough with the existing digitalWrite, but applications that do need to minimise delays are best off with Paul's one-instruction macros. My vote is to give the fast write macros a different name so legacy code that relied on the performance of the current digitalWrite would not change significantly. And the go-faster macros could be chosen for those applications that really needed the high performance.

I do hope Paul's macros make it into 1.0 as this will be a significant benefit to users that need high performance I/O.

When this was discussed on the developer list, as I recall, the decision was to decouple the API change (whether to disable PWM or not) from the optimization. I was specifically asked to incorporate the PWM disable into the macro, which I did within a day.

The plan was to quickly include the optimization with PWM disable, because it would have little impact other than simply making sketches run faster. The API change, which turned out to be controversial with Tom and others, was to be considered separately. If PWM disable was to be removed at some future date, it would be easy to simply delete it from both the optimized macro and original compiled code.

Disabling PWM takes 3 more instructions with the macro, which execute in 312 ns on a 16 MHz chip. Even with the PWM disable, the const case optimization executes in 437 ns, which is still about 10X faster.
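The figures above can be checked with a little arithmetic. The cycle counts below are an assumption taken from the classic AVR instruction timings (LDS 2 cycles, ANDI 1, STS 2, SBI/CBI 2), matching the lds/andi/sts/sbi sequences in the disassembly earlier in the thread. `kClockHz`, `nsForCycles` and the other names are just for this sketch.

```cpp
#include <cassert>

// Assumed cycle counts for classic AVR cores:
// LDS = 2, ANDI = 1, STS = 2, SBI/CBI = 2.
constexpr double kClockHz = 16e6;  // 16 MHz, so 62.5 ns per cycle

constexpr double nsForCycles(int cycles) {
  return cycles * 1e9 / kClockHz;
}

constexpr int kPwmDisableCycles = 2 + 1 + 2;  // lds, andi, sts
constexpr int kWriteCycles = 2;               // sbi or cbi

// 5 cycles for the PWM disable alone.
constexpr double kPwmDisableNs = nsForCycles(kPwmDisableCycles);

// 7 cycles for PWM disable plus the pin write.
constexpr double kTotalNs = nsForCycles(kPwmDisableCycles + kWriteCycles);

static void exampleUsage() {
  assert(kPwmDisableNs == 312.5);
  assert(kTotalNs == 437.5);
}
```

That gives 312.5 ns and 437.5 ns, which agrees with the 312 ns and 437 ns quoted above once the half nanosecond is dropped.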

Also, I'd like to point out this macro approach is not my preferred coding style for this optimization. I did it this way because David said he didn't like the verbose inline function approach I originally posted. Indeed that way is much larger, but it has the advantage of not touching any pin when an illegal pin number is given as the input. This macro, as currently implemented, will always write to some pin even if the pin number is too large. Then again, that's a lot better than the terrible behavior of the compiled code with a too-large pin number!

edit: on Arduino Mega, this macro will suffer the issue #146 bug on the pins whose ports aren't mapped in the lower 32 I/O registers (the range sbi/cbi can reach). Perhaps the bitWrite macro could be modified to disable interrupts in those cases, but really this sort of thing is much easier to deal with in an expanded static inline function.
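To make the "disable interrupts" idea concrete, here is a host-side sketch of the usual save, disable, modify, restore pattern. It is only a simulation under stated assumptions: `fakeSreg`, `fakeCli`, `fakePortH` and `writeBitAtomic` are invented stand-ins for the status register, the interrupt-disable instruction and a port in extended I/O space, and nothing here is taken from the Arduino core.

```cpp
#include <cassert>
#include <cstdint>

// Simulated stand-ins for the status register and a port that needs a
// multi-instruction read-modify-write (lds / andi or ori / sts).
static uint8_t fakeSreg = 0x80;  // bit 7 = global interrupt enable
static uint8_t fakePortH = 0;
static bool interruptsWereOffDuringWrite = false;

static void fakeCli() { fakeSreg &= 0x7F; }  // clear the enable bit

// Save the status register, disable interrupts, do the read-modify-write,
// then restore the saved state so interrupts return to how they were.
static void writeBitAtomic(uint8_t mask, bool v) {
  const uint8_t saved = fakeSreg;
  fakeCli();
  interruptsWereOffDuringWrite = (fakeSreg & 0x80) == 0;
  if (v) fakePortH |= mask;
  else fakePortH &= (uint8_t)~mask;
  fakeSreg = saved;
}

static void exampleUsage() {
  writeBitAtomic(0x40, true);
  assert(fakePortH == 0x40);
  assert(fakeSreg == 0x80);  // interrupt state restored afterwards
  assert(interruptsWereOffDuringWrite);
}
```

Restoring the saved value instead of unconditionally re-enabling interrupts keeps the function safe to call from code that already had interrupts off.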