Go Down

Topic: digitalWriteFast, digitalReadFast, pinModeFast etc (Read 111300 times) previous topic - next topic

fat16lib

I agree with jrraines, guidance from David Mellis is key.  A vision/philosophy for the future of Arduino is needed.  

There needs to be a way to add new/improved functionality but keep the core simple and stable.

For example, UNIX added asynchronous read with a new API, aio_read(), and didn't change read().  Some bad ideas have been removed from UNIX/Linux and some bad ideas live on but a better version of the API has been added.

I suspect the following would not be acceptable to the core Arduino developers but it illustrates why guidance is needed.

If I were allowed add a new API I would use an class with inline functions high() and low() that run 5-6 times faster than digitalWrite().  Here is a prototype sketch to illustrate the idea.
Code: [Select]
#include "pins_arduino.h"

class PinWriter {
 volatile uint8_t* _reg;
 uint8_t _bit;
public:
 void high(void) {*_reg |= _bit;}
 void low(void) {*_reg &= ~_bit;}
 void init(uint8_t pin) {
   // clear PWM timer bit here in real version
   pinMode(pin, OUTPUT);
   _bit = digitalPinToBitMask(pin);
   uint8_t port = digitalPinToPort(pin);
   _reg = portOutputRegister(port);
 }
};

PinWriter pin;

uint8_t pinNumber = 13;

void setup(void) {
 pin.init(pinNumber);
}

void loop(void) {
 // make four pulses for scope
 pin.high();
 pin.low();
 pin.high();
 pin.low();
 pin.high();
 pin.low();  
 pin.high();
 pin.low();
 delay(1);
}

This class has a fairly constant execution time for examples I have tried.  About 12 cycles (750 ns) for high() and 13 cycles (810 ns) for low().  It does use a bit of RAM.

An optimized digitalWrite() facility for constant pin number would be available for advanced users the way aio_read() is available in UNIX.

To repeat, there are many possible choices for digitalWrite() so I think the principle Arduino developers need to provide guidance.

pjrc

Please make sure any writing to pins is atomic.  Interrupts must be disabled during read-modify-write operations.  Please don't resurrect issue #146.

http://code.google.com/p/arduino/issues/detail?id=146

fat16lib

Paul I agree, the actual implementation should be atomic.  I just lifted code from 0018 to illustrate that there is a wide range of choices for improving/replacing digitalWrite().

Even with the extra instructions, an atomic version of this class is over four times faster than an atomic version of digitalWrite().

I believe there needs to be a clear statement of what the goal/requirement is for an improved digitalWrite()/digitalRead().  This would provide some constrains on proposed implementations.

pjrc

I believe you have a very good point.

However, as a practical matter, every API discussion I've been involved in or watched as it developed ultimately came down to an arbitrary decision by David Mellis, or the discussion faded away and nothing happened.

I really do not want to get involved in more API discussions.

David certainly wants to optimize digitalWrite, since he asked me to write this, which I did (Ben Combee deserves credit for originally suggesting using __built_constant_p), and David flagged the code in issue #140 as a milestone for Arduino 1.0.

i have opinions about the API, but ultimately they don't matter.

bperrybap

Quote

I did have a problem with a TC74 sensor when I optimized I2C bit-bang.  This sensor is limited to 100 kHz I2C.

Still it is hard to believe that a factor of thirty in execution speed of basic I/O functions will not cause problems.  That is why I designed my optimized functions to get a compile time error rather than revert to the slow functions.


It isn't that faster library code can't potentially cause problems. It is that it only causes problems for code that is taking advantage of and depending on certain timing that is not guaranteed to be anything and can change from release to release since the API does not claim to offer any timing maximums or minimums.

In some situations that may acceptable or even required, but in those cases, the code may not be able to properly function on a newer release of the tools.

A better way to write portable library code that "just works". is to write any timing critical sections in assembly to avoid any potential timing issues. In the absence of that, then write C code that is known to generate a fixed sequence of known instructions with guaranteed timing.
(which is what happens when you can get CBI/SBI instructions).
Then,  for cases where specific timing minimums and hardware setup times must be ensured, use cycle delays to ensure that those hardware setup times are met.
Like in your case, you need to ensure that the signals don't toggle any faster than 100khz. There are portable ways to do this in C code that will work on any AVR at any clock rate. The latest GLCD library code that I wrote does exactly this.

This allows the library code to not only run on any AVR at any clock rate, but also get faster should the compiler or tools generate better code.

In the case of the GLCD library code, it steps WAY outside the arduino digital pin API routines to get the speed it needs. Now had the arduino libraries been written differently (same API just a different way), the GLCD library would have been able to use arduino supplied API functions rather than having to jump through hoops to avoid the arduino digital pin i/o library routines to get the needed i/o speed.


Quote

I guess my experience in embedded control systems where predictable api execution time is important doesn't apply to the Arduino.


Sure there are many times when predictable api execution times are mandatory for something to function properly. In those cases the API functions should offer, define, and honor such timing. In the absence of that, it is very foolish for the developer to write his code assuming any timing for a particular API function will remain a constant as tools and libraries are updated across releases.

In the larger picture, I don't believe that Arduino is any different when it comes to real-time timing needs from any other embedded environment.
i.e. eventually you may want to talk to hardware and that hardware may have critical signal timing that has to be met or not violated.

The difference is in Arduino there really are 2 types of needs for API capabilities:
- 1 for the Arduino application writers who are less technical
- 1 for the Arduino library writers that have real-time requirements
 and write a library to be called by the less technical users.

The two needs are quite different.

Quote

If Paul is correct then there should be only one digitalWrite and not several under bar variations.


For the most part, and in general, I agree with this.
However, there are times in a library where you have to have specific timing and even the newer much faster high level digital i/o functions like what paul is now using in his teensy libraries that replace the arduino digital pin i/o routines, may still not be quite good enough.
And that is where the additional (underbar) functions can really make the difference.

Also, keep in mind that to implement what paul is talking about,
(which is the same thing that I've been talking about - and paul and I have collaborated at times), function layering is required.
You have to have different layers of function calls (which have to have different names) that can be called depending on the type of argument (constant vs variable and direct i/o vs the additional handholding).
The function layer to eventually call is determined at compile time.

So whether you openly advertise the names of the sub functions that get called under the hood for doing the high level functions of handling timers, pwm, and direction bits, or the low level CBI/SBI instructions, you still have to have these subfunctions.

What I've been saying all along is that since you have to have these layered function calls anyway, go ahead and advertise them and each layers behavior.
Then use a common naming convention like tacking on additional (underbar) so that those that want or need the absolute fastest bit setting/clearing capability can use them and avoid any additional overhead.

This would allow those that want the existing API with all its handholding to get it, to get a  performance boost for using constants over the klunky existing ditital i/o routines.
And those that need the absolute fastest single instruction SBI/CBI code to set/clear bits have a known and documented way to do that for their real-time library needs.

--- bill

jrraines

There has been a little discussion on the developer's mailing list in the last few days about code size in the Arduino-19 release candidate as opposed to Arduino-18. I also was playing around with ODBuino recently and used this in the LCD code there to save space. It is interesting to compare this version of Blink
Code: [Select]
/* Blinking LED
* ------------
*
* turns on and off a light emitting diode(LED) connected to a digital  
* pin, in intervals of 2 seconds. Ideally we use pin 13 on the Arduino
* board because it has a resistor attached to it, needing only an LED

*
* Created 1 June 2005
* copyleft 2005 DojoDave <http://www.0j0.org>
* http://arduino.berlios.de
*
* based on an orginal by H. Barragan for the Wiring i/o board
*/

#define ledPin  13                 // LED connected to digital pin 13

void setup()
{
 pinMode(ledPin, OUTPUT);      // sets the digital pin as output
}

void loop()
{
 digitalWrite(ledPin, HIGH);   // sets the LED on
 delay(1000);                  // waits for a second
 digitalWrite(ledPin, LOW);    // sets the LED off
 delay(1000);                  // waits for a second
}

which takes 1392 bytes on Arduino-18 with this one:
Code: [Select]
#include <digitalWriteFast.h>
/* Blinking LED
* ------------
*
* turns on and off a light emitting diode(LED) connected to a digital  
* pin, in intervals of 2 seconds. Ideally we use pin 13 on the Arduino
* board because it has a resistor attached to it, needing only an LED

*
* Created 1 June 2005
* copyleft 2005 DojoDave <http://www.0j0.org>
* http://arduino.berlios.de
*
* based on an orginal by H. Barragan for the Wiring i/o board
*/

#define ledPin  13                 // LED connected to digital pin 13

void setup()
{
 pinModeFast(ledPin, OUTPUT);      // sets the digital pin as output
}

void loop()
{
 digitalWriteFast(ledPin, HIGH);   // sets the LED on
 delay(1000);                  // waits for a second
 digitalWriteFast(ledPin, LOW);    // sets the LED off
 delay(1000);                  // waits for a second
}

which takes 764 bytes.

Note that to get the savings I had to change ledPin from an int to a defined constant.

westfw

That would be a mega1280 based OBDuino, and a version of FastDigitalWrite() that DOESN'T turn off PWM, right?  The 1280 has a lot of PWM pins, and a lot of code in the digitalWrite() function to turn off PWM if it was on...
(on a 328, blink compiles to "only" 950 bytes with 19, 890 bytes with 18...)


e@ellenwang.com

How about wrapping the macros in the standard do { ... } while (0) so the expansion will absorb a trailing semicolon?  Or better, use gcc inline functions.

jrraines

earlier on this forum item, on 5/16, Paul said
Quote
I'd like to point out this macro approach is not my preferred coding style for this optimization.  I did it this way because David said he didn't like the verbose inline function approach I originally posted.  Indeed that way is much larger, but it has the advantage of not touching any pin when an illegal pin number is given as the input.  This macro, as currently implemented, will always write to some pin even if the pin number is too large.


This is only your second post and you seem to know a trick for getting rid of ';' in macros that I do not. I still feel very much like a newcomer to a lot of this stuff.

e@ellenwang.com

jrraines wrote
Quote
earlier on this forum item, on 5/16, Paul said Quote:
Quote
I'd like to point out this macro approach is not my preferred coding style for this optimization.  I did it this way because David said he didn't like the verbose inline function approach I originally posted.  Indeed that way is much larger, but it has the advantage of not touching any pin when an illegal pin number is given as the input.  This macro, as currently implemented, will always write to some pin even if the pin number is too large.

This is only your second post and you seem to know a trick for getting rid of ';' in macros that I do not. I still feel very much like a newcomer to a lot of this stuff.

Well.  I've been writing C code for a long time, but only playing with the Arduino for a little bit.  And I cop to joining the thread late and not having read all the messages.

It surprises me that the inline function "is much larger".  I assume he means the generated code is bigger.  My experience with gcc is that inline functions are quite efficient, and the little test code I just tried confirmed that avr-gcc is not different.  Of course, digitalWriteFast.h is much more complex than my test.  Maybe I'll try to convert it to inline functions later.

Also, I think the macros that use if statements can also use the ?: conditional expression.  Is that also for better generated code?

Again, sorry if this has all been gone through before.

Coding Badly

Quote
I assume he means the generated code is bigger.

Naw.  The source code is where the difference lies.  They both reduce to a single (or a few depending on the circumstances) machine instruction.  It's a matter of taste, maintenance, and functionality.  My preference is for inline-functions for one reason: I HATE debugging macro related bugs.

Quote
Of course, digitalWriteFast.h is much more complex than my test.  Maybe I'll try to convert it to inline functions later.

I suggest starting with Paul Stoffregen's work...
http://www.pjrc.com/teensy/teensyduino.html

Quote
Also, I think the macros that use if statements can also use the ?: conditional expression.

That's very likely true.  AlphaBeta uses conditional expressions frequently and he hasn't complained of any problems.

Quote
Is that also for better generated code?

I don't think there is any difference in the generated code.  "What's in an "if"? That which we call a condition.  By any other name would branch just the same."

westfw

I didn't think you could meaningfully use __builtin_constant_p in inline functions?

jrraines

the only place I used the ? trigrams was in digitalReadFast. I need it there because it yields a value I can assign to a variable. Bill Westfield had to explain that to me. see http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1266983837/1

e@ellenwang.com

westfw wrote:
Quote
I didn't think you could meaningfully use __builtin_constant_p in inline functions?

That's a good point, but it turns out that it does work.  I was too lazy to look up the documentation to see whether it's defined behavior though.  I just tried this:

Code: [Select]
static int
g(int n)
{
  return n * 10;
}

static inline int
f(int n)
{
  if (__builtin_constant_p(n))
     return n + 10;
  return g(n);
}

int
main(int argc, char **argv)
{
  return f(10);
}


and got this (with avr-gcc -O -S):

Code: [Select]
...
main:
/* prologue: frame size=0 */
       ldi r28,lo8(__stack - 0)
       ldi r29,hi8(__stack - 0)
       out __SP_H__,r29
       out __SP_L__,r28
/* prologue end (size=4) */
       ldi r24,lo8(20)
       ldi r25,hi8(20)
/* epilogue: frame size=0 */
       rjmp exit
...


Not the most elegant test program but I was in a bit of a hurry.

Go Up