digitalWriteFast, digitalReadFast, pinModeFast etc

This is an addition to the library that greatly speeds up reading and writing to the digital pins using the familiar pin number syntax.

One of the strengths of the Arduino is the low barrier to getting started for beginners. The simple syntax of digitalWrite, pinMode, and digitalRead is a big contributor to that simplicity. As beginners gain experience they move toward the more efficient port manipulation commands (see the Arduino Reference, BUT some of the port information there is actually incorrect for the Mega, see
http://spreadsheets.google.com/pub?key=rtHw_R6eVL140KS9_G8GPkA&gid=0 ). The port manipulation commands control the same pins but refer to them with a completely different syntax and depend on specific details of the pin being known at the time the program is written. It is difficult using port manipulation to mimic the simplicity of
for (int i =2; i<=13; i++) digitalWrite(i,HIGH);
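
For comparison, here is a rough sketch of what that loop turns into with port manipulation. It assumes the mapping used by the macro quoted later in this thread (pins 0-7 on PORTD, pins 8-13 on PORTB); the two variables are stand-ins so the sketch compiles on a PC, and on a real board the AVR headers already provide PORTD and PORTB. Unlike digitalWrite, it does nothing about PWM.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the AVR output registers (assumption: only needed off-board;
// on an Arduino, delete these two lines and use the real PORTD and PORTB).
volatile uint8_t PORTD = 0;
volatile uint8_t PORTB = 0;

// Same pin states as: for (int i = 2; i <= 13; i++) digitalWrite(i, HIGH);
// assuming pins 0-7 are PORTD bits 0-7 and pins 8-13 are PORTB bits 0-5.
inline void pins2to13High() {
    PORTD = (uint8_t)(PORTD | 0xFC);  // bits 2..7 -> pins 2..7
    PORTB = (uint8_t)(PORTB | 0x3F);  // bits 0..5 -> pins 8..13
}
```

Two register writes replace twelve function calls, but the reader has to know which pin lives on which port and bit, which is exactly the detail the pin number syntax hides.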

A few months ago, Paul Stoffregen proposed and worked out the important details of a somewhat intermediate version, digitalWriteFast, which he implemented completely as a macro. Like the port manipulation commands it is much faster than digitalWrite (over ten times faster), and the extra speed depends on knowing the pin numbers at compile time--it won't speed things up if it's inside a subroutine or loop where the pin number is going to change. If the pin number is not known at compile time, it falls back to the slower digitalWrite command. It uses the simple syntax of the digitalWrite type commands, which makes it attractive to beginners, and perhaps less error prone even for programmers who are a little more experienced.
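
The compile-time dispatch can be sketched roughly as follows. This is not the library's code: fakePort, slowWrite and writeFastSketch are hypothetical names, and a plain variable stands in for the port register so the sketch compiles on a PC with GCC or Clang. Only __builtin_constant_p is the real compiler builtin that the macros rely on.

```cpp
#include <cassert>
#include <cstdint>

volatile uint8_t fakePort = 0;  // stand-in for an output register (assumption)
int slowCalls = 0;              // counts trips through the fallback path

// Stand-in for the ordinary digitalWrite(): handles any pin, but is a real call.
inline void slowWrite(uint8_t pin, bool value) {
    ++slowCalls;
    if (value) fakePort = (uint8_t)(fakePort | (1u << pin));
    else       fakePort = (uint8_t)(fakePort & ~(1u << pin));
}

// Hypothetical macro: write the bit inline only when the pin is a
// compile-time constant; otherwise hand over to the slow function.
#define writeFastSketch(P, V)                                    \
    do {                                                         \
        if (__builtin_constant_p(P)) {                           \
            if (V) fakePort = (uint8_t)(fakePort | (1u << (P))); \
            else   fakePort = (uint8_t)(fakePort & ~(1u << (P)));\
        } else {                                                 \
            slowWrite((P), (V));                                 \
        }                                                        \
    } while (0)
```

With a literal such as writeFastSketch(5, true) the compiler keeps only the inline branch; with a pin held in a variable whose value it cannot see, only the slowWrite call remains.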

I looked at what he had done and thought it needed just a little more work to make it a valuable library 'routine'--I put 'routine' in quotes because there's no .c file; everything is in macros in a .h file.

I extended it to include pinModeFast and (with a huge amount of assistance from Westfw) digitalReadFast. I've tested it fairly thoroughly on my Arduino Mega. It would be wise if people with other Arduinos would test those boards. Paul Stoffregen's work on defining the port and bit to pin conversion has been flawless but testing seems prudent. If you do test, I think a post here would be appropriate.

PWM, analogWrite, digitalWriteFast2, etc.
As you know, analogWrite works on some of the digital pins (the PWM pins) by setting a timer and cycling from +5V to ground with a duty cycle that makes the average voltage on the pin proportional to what you specify. The standard digitalRead and digitalWrite commands turn off the cycling of the timer every time they are used. pinMode does nothing to the cycling of the timer.

digitalWriteFast and digitalReadFast turn off the cycling of the timer every time they are used. pinModeFast does nothing to the cycling of the timer. This is the mode for maximum compatibility.

However there is a comment in the code for the standard commands that suggests it would be more efficient to turn off the timer in pinMode and not in the other 2 commands. This makes enormous sense to me; in many instances pinMode might be used just in setup() and so the extra overhead could be dispensed with.

I was reluctant to completely split with the maximum compatibility that more experienced developers had opted for. So I did it both ways. digitalWriteFast2 and digitalReadFast2 don't turn off the timer. pinModeFast2 does turn off the timer.
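
The difference between the two families can be sketched like this. All names and bit positions are hypothetical, and the three variables are stand-ins for a port register, a data direction register and a timer control register, so this compiles on a PC; it only illustrates where the timer bit gets cleared, not how the header actually spells it.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the registers behind one PWM-capable pin (assumption).
volatile uint8_t portReg = 0, ddrReg = 0, timerReg = 0;
const uint8_t PIN_BIT = 4;         // hypothetical bit of the pin within its port
const uint8_t PWM_ENABLE_BIT = 5;  // hypothetical PWM enable bit in the timer register

inline void setBit(volatile uint8_t &reg, uint8_t bit, bool on) {
    reg = on ? (uint8_t)(reg | (1u << bit)) : (uint8_t)(reg & ~(1u << bit));
}

// "Maximum compatibility" style: every write first detaches the pin from PWM.
inline void writeCompat(bool value) {
    setBit(timerReg, PWM_ENABLE_BIT, false);
    setBit(portReg, PIN_BIT, value);
}

// "2" style: the pin mode call detaches PWM once...
inline void modeOutput2() {
    setBit(timerReg, PWM_ENABLE_BIT, false);
    setBit(ddrReg, PIN_BIT, true);
}

// ...so each later write touches only the port register.
inline void write2(bool value) {
    setBit(portReg, PIN_BIT, value);
}
```

One consequence, under these assumptions: if PWM is switched on after the pin mode call, a "2"-style write leaves it running, whereas the compatible write would switch it off.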

Download http://healthriskappraisal.org/dwf.zip
The folder called digitalWriteFast, containing digitalWriteFast.h and a keyword file, goes in your library. Also included is a program called digitalWriteFastTest.pde, which is what I used to test it on my Mega. If you want to test it on a smaller board, you should be able to delete a huge number of test cases. There is also a program called progprog.py; because these new commands deliver their speedup when the pin numbers are known at compile time, I needed to test the commands with source code that specified the pin numbers directly. With 50-some pins to test, I needed to generate most of the test program source code automatically. progprog.py generates that test code. You would need something like it if you want to test on a Seeeduino Mega, for example.

To use it just put the digitalWriteFast folder into your library alongside the EEPROM, Ethernet, Firmata, etc folders. In your source code
#include <digitalWriteFast.h>
then you can use digitalWriteFast(pin,HIGH/LOW), digitalReadFast(pin), pinModeFast(pin,INPUT/OUTPUT), digitalWriteFast2(pin,HIGH/LOW), digitalReadFast2(pin), pinModeFast2(pin,INPUT/OUTPUT) very much as you have used the built-in commands. The object code will not only be faster, but actually smaller as well.

The downside
Without the huge performance advantage motivating you to learn to use the PORT, DDR and PIN registers directly, you may not learn to take advantage of them. If you need to manipulate several adjacent pins at once you may be able to do it with one command and get another performance increase; learning to use those commands may also move you closer to learning the commands to get better control over the PWM pins for finer control of those.
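
As an illustration of the "several adjacent pins at once" case, here is a minimal sketch that writes a 4-bit value to the low four bits of a port in a single read-modify-write. portSim is a stand-in variable so the sketch compiles on a PC, and writeLowNibble is a hypothetical helper name; on a board the target would be a real PORTx register.

```cpp
#include <cassert>
#include <cstdint>

volatile uint8_t portSim = 0;  // stand-in for a PORTx register (assumption)

// Put a 4-bit value on bits 0..3 in one step, leaving bits 4..7 untouched.
inline void writeLowNibble(uint8_t value) {
    portSim = (uint8_t)((portSim & 0xF0) | (value & 0x0F));
}
```

Doing the same with four separate single-pin writes would change the pins one after another instead of all together.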

Lastly, there is the likelihood that you will use one of these 'Fast' commands in a way that means the pin number is not known at compile time. This could mean that you think you're getting high performance when you are really not. It might be hard to find that issue.

Thanks
Paul Stoffregen did the heavy lifting on this and deserves most of the kudos. I wouldn't have made my small contribution without Westfw's patient and insightful coaching. Any problems are due to my shortcomings, not theirs.

Nicely done! I will have to give this library a try and see just what kind of performance gains can be had with it.

You've provided a nice intermediate level library for those of us that haven't got around to working directly with the hardware yet.

Nice to see you're using these macros I wrote months ago. At the time, David seemed interested in including them in Arduino. At least that's what he said on the developer mail list. I would not have done all that work had I known it would only sit in the issue tracker, rather than be included in 0018. (Likewise, many other optimizations I've explored won't be ported to Arduino and written up nicely, because if issue #140 can't be included, certainly other more difficult optimizations for non-const cases won't be.)

Still, a library implementation is better than nothing.

Could I convince you to surround each definition with a check to avoid redefining if it already exists? For example:

#if !defined(digitalPinToPortReg)
#define digitalPinToPortReg(P) \
(((P) >= 0 && (P) <= 7) ? &PORTD : (((P) >= 8 && (P) <= 13) ? &PORTB : &PORTC))
#endif

I realize this adds a lot of extra #if and #endif lines. However, if these definitions ever become included in the Arduino core (where they rightly belong), your code will continue to work.

On 3rd party boards, this will allow those boards to define these macros for their different pin assignments, and your code will automatically use them.
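
A small sketch of how such a guard lets a board-specific definition win. The port variables are stand-ins so this compiles on a PC, and the "board header" mapping (every pin on portX) is invented purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for port registers (assumption: real boards use PORTB, PORTD, ...).
volatile uint8_t portX = 0, portD = 0, portB = 0, portC = 0;

// Imagine a third-party board header, included first, with its own mapping.
#define digitalPinToPortReg(P) (&portX)

// The library header comes later; its default is skipped because the
// macro already exists.
#if !defined(digitalPinToPortReg)
#define digitalPinToPortReg(P) \
    (((P) >= 0 && (P) <= 7) ? &portD : (((P) >= 8 && (P) <= 13) ? &portB : &portC))
#endif

// Code written against the macro picks up whichever definition is active.
inline void setPin3High() {
    *digitalPinToPortReg(3) = 0x08;
}
```

Here the write lands on portX, not portD, without any change to the code that uses the macro.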

Also, on Arduino Mega, issue #146 will apply to the pins which use registers beyond the range usable with the CBI and SBI instructions. Then again, issue #146 applies to ALL usage of the normal digitalWrite() and pinMode() functions, regardless of register addresses.

If you care about issue #146, you could add a check if the pointer (cast to an integer) is greater than 32, and if so, surround the bitWrite with code to save interrupt context, disable interrupts, and then restore. Because this is within the check for __builtin_constant_p(P), the compiler will only include that code for the appropriate pins.
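
The save, disable, restore sequence described above might look roughly like this. SREG and cli() are stand-ins here so the sketch compiles on a PC (on an AVR they come from the avr-libc headers, and bit 7 of SREG is the global interrupt enable flag); guardedBitWrite and highPort are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the AVR status register and cli() (assumption: off-board only).
volatile uint8_t SREG = 0x80;  // bit 7 set = interrupts enabled
inline void cli() { SREG = (uint8_t)(SREG & 0x7F); }

volatile uint8_t highPort = 0;  // stand-in for a register SBI/CBI cannot reach

// Read-modify-write that an interrupt cannot split in the middle.
inline void guardedBitWrite(volatile uint8_t &reg, uint8_t bit, bool value) {
    uint8_t saved = SREG;  // remember whether interrupts were enabled
    cli();                 // disable interrupts for the load/modify/store
    reg = value ? (uint8_t)(reg | (1u << bit)) : (uint8_t)(reg & ~(1u << bit));
    SREG = saved;          // restore the previous interrupt state
}
```

Restoring the saved value, instead of unconditionally re-enabling interrupts, keeps the helper safe to call from code that already runs with interrupts off.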

So far, nobody seems to care much about issue #146. Someday I'll get around to writing some test cases to demonstrate the problem. Trust me, it is real. The "nobody has ever complained in 5 years" argument holds only because widely used libraries and functions have only recently begun calling digitalWrite() from interrupts, and because the problems are so very mysterious and difficult to debug. Such is the way of race conditions.

Still, nice work on library-izing this. Hope it gets some use.

I will work on Paul's ideas and try to have an improved version posted by 3/10. This project wouldn't even have begun without his insight. As I said before he did all the hard work on this.

I did update the version on the server tonight. I did not implement all of Paul's suggestions. I did surround all of the pin to port/timer type functions with a single #if/#endif that I hope will be adequate to the needs of 3rd party boards. I similarly surrounded each of the macros for pinModeFast, digitalWriteFast, digitalReadFast, pinModeFast2, digitalWriteFast2, digitalReadFast2 with #if/#endif statements.

Issue #146 relates to interference between interrupt routines that contain digitalWrite instructions (e.g. the Servo and Tone libraries) and non-interrupt code. It is clearly a real issue and will be a difficult one to sort out for most people who encounter it. On the other hand, the point of these library macros is to improve efficiency. I can't know whether interrupts will be in use, or whether digitalWrite will be used inside the interrupt routines, in the code that uses these macros.

I gave real thought to trying to implement something that would be interrupt safe for one of the 2 versions. Perhaps as I gain proficiency I will return to this idea.

I bought a teensy++ from Paul. I haven't even gotten so far as to solder in headers, but I looked at some of the software that comes with the board to tie it to the Arduino IDE. Paul doesn't advertise that he's not only implemented THESE ideas as part of the digitalWrite etc commands there but he's ALSO greatly speeded up the way digitalWrite etc work when the pin number is not known at compile time. It looks to me like he's also implemented the code necessary to protect against the interrupt interference of issue 146; there is some conditionally included code that implies something like that. That would mean that his boards would work better with the servo and tone libraries than the branded Arduinos. The code itself combines complex macros and assembly language so that it is very tough reading.

All of which makes me think that the teensy++ may really be the board to go with. It's not just a tiny Arduino clone; the software is actually enhanced in very important ways. I'm still trying to figure out how to tie something this small to a useful prototyping shield though.

@jrraines: Would you mind publishing your code somewhere else? The link you provided isn't working.

I suggest removing the check "__builtin_constant_p(V)" (the second condition in the if below) from digitalWriteFast...

#define digitalWriteFast(P, V) \
  if (__builtin_constant_p(P) && __builtin_constant_p(V)) { \
    if (digitalPinToTimer(P)) \
      bitClear(*digitalPinToTimer(P), digitalPinToTimerBit(P)); \
    bitWrite(*digitalPinToPortReg(P), digitalPinToBit(P), (V)); \
  } else { \
    digitalWrite((P), (V)); \
  }

Relatively speaking, the call to digitalWrite generates one machine instruction (a relative call). After removing the built-in check on V and passing a variable into the macro...

For non-PWM pins, five machine instructions are generated.

For PWM pins, eight machine instructions are generated.

In my opinion, this is a small price to pay for the huge increase in speed. If someone wants to save program space, they can call digitalWrite directly.

http://code.google.com/p/digitalwritefast/downloads/list

Mellis and Stoffregen felt that it was safer to include the PWM stuff. I thought the digitalWriteFast2, pinModeFast2 version was preferable. you can cut out a little overhead by using pinModeFast and digitalWriteFast2.

Thanks!

Mellis and Stoffregen felt that it was safer to include the PWM stuff

Um ... I didn't suggest removing the PWM stuff. My suggestion was to apply the fast version when V is a constant or a variable.

I thought the digitalWriteFast2, pinModeFast2 version was preferable. you can cut out a little overhead by using pinModeFast and digitalWriteFast2.

Looks good to me. I like it.

I agree, it's a good trade-off. In fact, that's exactly how I implemented it in the version that's inside Teensyduino.

The slow compiled code often requires almost that many instructions just to marshal the inputs into the required registers, when either isn't a compile time const. If the surrounding code is complex, but doesn't call other functions, which can often be the case with digitalWrite, the savings in register allocation are also a big win.

But that's not "exactly" how I wrote it. This "bitWrite" macro coding style really isn't my first choice. Inside Teensyduino, I implemented this using a giant chain of if-else checks, where each performs the desired write. While it's a lot longer, a LOT longer, it has the advantage of doing nothing if an illegal pin number is used. With this macro version, if an illegal pin number is used, it will write to the last pin. On all Arduino boards, that's an analog input, likely never configured as an output, so the effect would be activating the pullup resistor on that analog in... which could be pretty confusing if the poor, misguided user didn't realize some other unrelated code mistakenly wrote an illegal pin number. For that reason, I've never been very happy with this style.
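
A very reduced sketch of the if-else chain style, for a made-up board with only three pins. It is not the Teensyduino code; portD and portB are stand-in variables so it compiles on a PC, and writeChainSketch and applyBit are hypothetical names. The point is the missing final else: a pin number that matches nothing writes nowhere.

```cpp
#include <cassert>
#include <cstdint>

volatile uint8_t portD = 0, portB = 0;  // stand-ins for port registers (assumption)

inline void applyBit(volatile uint8_t &reg, uint8_t bit, bool on) {
    reg = on ? (uint8_t)(reg | (1u << bit)) : (uint8_t)(reg & ~(1u << bit));
}

// One branch per legal pin, each doing its own write. When the pin is a
// compile-time constant the compiler can discard every other branch.
inline void writeChainSketch(uint8_t pin, bool value) {
    if (pin == 0)      applyBit(portD, 0, value);
    else if (pin == 1) applyBit(portD, 1, value);
    else if (pin == 8) applyBit(portB, 0, value);
    // No final else: an illegal pin number falls through and does nothing.
}
```

Laid out one pin per line like this, the chain also reads as a pin-to-port table.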

Back in November 2009, David said he intended to include this into the official Arduino core, and he preferred this macro style (I had posted a short example of the if-else way), so I wrote it this way for contribution to the official Arduino version. Sadly, with 0018 gone by, and issue #140 not tagged for 0019 or 1.0, it seems unlikely these optimizations will ever become part of the official digitalWrite. Had I known that then, I wouldn't have bothered to write these for the official Arduino boards. I'm certainly not going to put any more effort into it now, other than pointing out this lack of checking for illegal pin number input.

Then again, erroneously writing to the last pin is a lot better than what the slow compiled digitalWrite does. It will happily use the too-large pin number as an index to an array, reading whatever happens to be in memory after those tables, and use that data as a pointer and bitmask to write to someplace in memory! Not good.

Then again, there are issues 146 and 170, which also seem unlikely to ever get fixed.

It makes me sad to see so little care and concern for code quality. I think maybe it's time to turn off my notifications for this thread....

I realized that one form of documentation I should have provided from the start was examples of what the disassembly code looks like. These examples were compiled for my Mega, so pin/port correspondence may not be the same as other boards, but it will give a better idea of what is generated:

// these are non-pwm pins whose port is below 0x100
pinModeFast(51,INPUT);
    5c4e:      22 98             cbi      0x04, 2      ; 4
digitalWriteFast(51,HIGH); 
    5c50:      2a 9a             sbi      0x05, 2      ; 5
pinModeFast(50,OUTPUT);
    5c52:      23 9a             sbi      0x04, 3      ; 4
digitalWriteFast(50,LOW);
    5c54:      2b 98             cbi      0x05, 3      ; 5

//these pins are on a port above 0x100
pinModeFast2(48,INPUT);
    5bb2:      80 91 0a 01       lds      r24, 0x010A
    5bb6:      8d 7f             andi      r24, 0xFD      ; 253
    5bb8:      80 93 0a 01       sts      0x010A, r24
digitalWriteFast2(48,LOW);
    5bbc:      80 91 0b 01       lds      r24, 0x010B
    5bc0:      8d 7f             andi      r24, 0xFD      ; 253
    5bc2:      80 93 0b 01       sts      0x010B, r24
pinModeFast2(49,OUTPUT);
    5bc6:      80 91 0a 01       lds      r24, 0x010A
    5bca:      81 60             ori      r24, 0x01      ; 1
    5bcc:      80 93 0a 01       sts      0x010A, r24
digitalWriteFast2(49,HIGH);
    5bd0:      80 91 0b 01       lds      r24, 0x010B
    5bd4:      81 60             ori      r24, 0x01      ; 1
    5bd6:      80 93 0b 01       sts      0x010B, r24


//these are pwm with a port address below 0x100:
pinModeFast(2,INPUT);
     32c:      6c 98             cbi      0x0d, 4      ; 13
digitalWriteFast(2,HIGH); 
     32e:      80 91 90 00       lds      r24, 0x0090
     332:      8f 7d             andi      r24, 0xDF      ; 223
     334:      80 93 90 00       sts      0x0090, r24
     338:      74 9a             sbi      0x0e, 4      ; 14
pinModeFast(5,OUTPUT);
     33a:      6b 9a             sbi      0x0d, 3      ; 13
digitalWriteFast(5,LOW);
     33c:      80 91 90 00       lds      r24, 0x0090
     340:      8f 77             andi      r24, 0x7F      ; 127
     342:      80 93 90 00       sts      0x0090, r24
     346:      73 98             cbi      0x0e, 3      ; 14

pinModeFast2(2,INPUT);
     908:      80 91 90 00       lds      r24, 0x0090
     90c:      8f 7d             andi      r24, 0xDF      ; 223
     90e:      80 93 90 00       sts      0x0090, r24
     912:      6c 98             cbi      0x0d, 4      ; 13
digitalWriteFast2(2,LOW);
     914:      74 98             cbi      0x0e, 4      ; 14
pinModeFast2(5,OUTPUT);
     916:      80 91 90 00       lds      r24, 0x0090
     91a:      8f 77             andi      r24, 0x7F      ; 127
     91c:      80 93 90 00       sts      0x0090, r24
     920:      6b 9a             sbi      0x0d, 3      ; 13
digitalWriteFast2(5,HIGH);
     922:      73 9a             sbi      0x0e, 3      ; 14

//some of these have an address above 0x100:
pinModeFast(12,INPUT);
    247a:      26 98             cbi      0x04, 6      ; 4
digitalWriteFast(12,HIGH); 
    247c:      80 91 80 00       lds      r24, 0x0080
    2480:      8f 7d             andi      r24, 0xDF      ; 223
    2482:      80 93 80 00       sts      0x0080, r24
    2486:      2e 9a             sbi      0x05, 6      ; 5
pinModeFast(9,OUTPUT);
    2488:      80 91 01 01       lds      r24, 0x0101
    248c:      80 64             ori      r24, 0x40      ; 64
    248e:      80 93 01 01       sts      0x0101, r24
digitalWriteFast(9,LOW);
    2492:      80 91 b0 00       lds      r24, 0x00B0
    2496:      8f 7d             andi      r24, 0xDF      ; 223
    2498:      80 93 b0 00       sts      0x00B0, r24
    249c:      80 91 02 01       lds      r24, 0x0102
    24a0:      8f 7b             andi      r24, 0xBF      ; 191
    24a2:      80 93 02 01       sts      0x0102, r24

@jrraines: Thank you. That's helpful.

@Paul Stoffregen:

Inside Teensyduino, I implemented this using a giant chain of if-else checks, where each performs the desired write. While it's a lot longer, a LOT longer, it has the advantage of doing nothing if an illegal pin number is used.

I prefer the chain for a different reason. With some clever formatting, it can be made to look like a table. Ensuring the Arduino-pin to port-pin mapping is accurate is easy.

Back in November 2009, David said he intended to include this into the official Arduino core, and he preferred this macro style (I had posted a short example of the if-else way), so I wrote it this way for contribution to the official Arduino version. Sadly, with 0018 gone by, and issue #140 not tagged for 0019 or 1.0, it seems unlikely these optimizations will ever become part of the official digitalWrite.

That's unfortunate. I REALLY like these digital*Fast functions...

Had I known that then, I wouldn't have bothered to write these for the official Arduino boards. I'm certainly not going to put any more effort into it now, other than pointing out this lack of checking for illegal pin number input.

I definitely appreciate your and jrraines' effort. I'm trying to squeeze an application onto a 2313. I've determined that without these functions, I would have had to resort to port manipulation.

Who can't love these things! High level function calls reduced to a single machine instruction! It's the best of hand-assembly and C++.

Then again, erroneously writing to the last pin is a lot better than what the slow compiled digitalWrite does. It will happily use the too-large pin number as an index to an array, reading whatever happens to be in memory after those tables, and use that data as a pointer and bitmask to write to someplace in memory! Not good.

That is a bit unnerving.

It makes me sad to see so little care and concern for code quality. I think maybe it's time to turn off my notifications for this thread....

PLEASE stay with us (me)!

I'm only an Arduino beginner, but I'm not sure I understand why revised functionality with identical behavior as before (thus no API change) yet improved performance is not getting committed?

What am I missing?

I think that is Paul's point. In many situations this will be both faster and also smaller.

I have encountered at least one situation where I removed delays knowing that digitalWrite was slow and doubted the code would work if digitalWrite was speeded up. That is the logic behind calling the functions by different names.

Thanks for your reply, jrraines!

Also, only now, after looking in more detail at DigitalWriteFast (I also hadn't yet seen the excellent and concise description on the project's Google Code page), I understand how foolish my question was. DigitalWriteFast requires pin numbers to be known at compile time. Period.
i.e. digitalWrite(9, HIGH) can be sped up, digitalWrite(i, HIGH) can't.

Likely this will be obvious right away to most readers of this topic, but to me, as a novice, it wasn't.

Thanks for your time! :)

Westfw somewhere remarked that consistent speed of digitalWrite may be desirable. Another consideration.

why it is that revised functionality with identical behavior as before (thus no API change) yet improved performance are not getting committed?

Part of the problem is in EXPLAINING the new functions. None of the Arduino functions currently include information about execution speed, and for most applications it isn't important. Far slower systems have existed and solved problems. Now suddenly we want to add "fast" versions, and the possibility that they will introduce confusion is rather high. "when do I need the fast function? Do I need fastSerial.Print too? fastAnalogRead? I changed blink to use the fast functions and it's still blinking once per second?"

DigitalWriteFast requires pin numbers to be known at compile time. Period.
i.e. digitalWrite(9, HIGH) can be sped up, digitalWrite(i, HIGH) can't.

Those are two different statements. DigitalWriteFast is quite careful to "work" with pin numbers that are variables. It just won't be any faster. (Adds more confusion, you know: "Isn't the pin number a constant in the blink example? How come sometimes it runs fast and sometimes it runs slowly?")

Fully agreed.

Your explanations are far more exact and unambiguous. That's why I'm a novice and you're an expert :)

Have you tested digitalWriteFast(i++); (assuming i is a defined variable) ?

I don't think I'd used that specific syntax. I just ran a simple example and it seems to give the correct result. Have you had a problem?

uint8_t i=18;
digitalWriteFast(i++,HIGH);
lcd(0,0)<<"19 = "<<(long)i;
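
A likely reason this works: the pin expression appears several times in the macro text, but __builtin_constant_p does not evaluate its argument, and GCC and Clang report an expression with side effects such as i++ as non-constant, so only the fallback branch runs and the increment happens once. The sketch below shows that with hypothetical recording stand-ins (fastWrite, slowWrite, writeFastSketch) in place of the real register writes, so it compiles on a PC.

```cpp
#include <cassert>
#include <cstdint>

int slowCalls = 0;        // how often the fallback ran
uint8_t lastSlowPin = 0;  // pin number the fallback received

// Recording stand-ins for the inline write and for digitalWrite() (assumption).
inline void fastWrite(uint8_t, bool) {}
inline void slowWrite(uint8_t pin, bool) { ++slowCalls; lastSlowPin = pin; }

// Hypothetical macro with the same shape as the dispatch used in this thread.
#define writeFastSketch(P, V)           \
    do {                                \
        if (__builtin_constant_p(P)) {  \
            fastWrite((P), (V));        \
        } else {                        \
            slowWrite((P), (V));        \
        }                               \
    } while (0)

// Mirrors the test above: the write sees pin 18, then i becomes 19.
inline uint8_t pinAfterWrite() {
    uint8_t i = 18;
    writeFastSketch(i++, true);
    return i;
}
```

The call takes the slow path, so an expression like i++ is safe here but gains no speed.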