Fast digitalWrite() - Virtual Pin - C++ problem

This topic has come up several times, but I think I have another approach. I have seen avr-g++ optimize away dead code, so I think it is possible to make a virtual digitalWrite() that is faster and takes up less program space.

I have this program, which takes up 456 bytes when compiled.

void setup() {
  DDRB |= 1 << 5;
}
void loop() {
  if (PORTB & 1 << 5)
    PORTB &= ~(1 << 5);
  else
    PORTB |= 1 << 5;
}

It compiles to:

$ avr-objdump -S /tmp/build*/*.cpp.elf | sed -n -e '/<loop>:/,/^$/p'
00000094 <loop>:
  94:   2d 9b           sbis    0x05, 5 ; 5
  96:   02 c0           rjmp    .+4             ; 0x9c <loop+0x8>
  98:   2d 98           cbi     0x05, 5 ; 5
  9a:   08 95           ret
  9c:   2d 9a           sbi     0x05, 5 ; 5
  9e:   08 95           ret

It is not easy to read, but it simply toggles pin 13 on an Arduino Uno. Now I want the functionality (toggle) in a library and the decision on which pin to use in user space. I think it could look like this:

#include <Pins.h>
Pin13 led;
#include <Toggler.h>
Toggler toggle(led);
void setup(){
}
void loop(){
  toggle.run();
}

From the above it should be clear that the compiler knows at compile time which pin is used, and should thus be able to optimize the program down to the minimum: 456 bytes.

Before I can write my library function Toggler I need some virtual pin defined.

I have tried to write the following program, but the compiled loop() is empty. I guess I have made some C++ error.

class VirtualPin {
public:
  void output() { 
  }
  void input() { 
  }
  void low() { 
  }
  void high() { 
  }
  char isHigh() { 
  }
};

class Toggler {
public:
  Toggler(VirtualPin& pin) : 
  _pin(pin){
    _pin.output();
  }
  void run() {
    if (_pin.isHigh())
      _pin.low();
    else
      _pin.high();
  }
private:
  VirtualPin& _pin;
};

/* Here comes all the pins defined but unknown to Toggler (yet) */

class Pin13: 
public VirtualPin {
public:
  inline void output() { 
    DDRB |= _BV(5); 
  }
  inline void input() { 
    DDRB &= ~_BV(5); 
  }
  inline void low() { 
    _SFR_BYTE(PORTB) &= ~_BV(5); 
  }
  inline void high() { 
    _SFR_BYTE(PORTB) |= _BV(5); 
  }
  inline char isHigh() { 
    return PORTB & _BV(5); 
  }
};

/* End of defines, here comes the program */

Pin13 led;
Toggler toggle(led);

void setup() {
  /* test led here to see one line per instruction in assembler */
  led.output();
  led.high();
}

void loop() {
  toggle.run();
}

/*
Compile and inspect function 'setup' and 'loop':
avr-objdump -S /tmp/build*/*.cpp.elf | sed -n -e '/<\(setup\|loop\)>:/,/^$/p'

000000a8 <setup>:
  a8:   25 9a           sbi     0x04, 5 ; 4
  aa:   2d 9a           sbi     0x05, 5 ; 5
  ac:   08 95           ret

000000ae <loop>:
  ae:   08 95           ret

*/

To toggle an output bit takes just one instruction, for example:

   PINB = 1<<5;

(see the data sheet for this unusual construction).

I guess you mean

PORTB ^= 1<<5;

which compiles to

  92:   95 b1           in      r25, 0x05       ; 5
  94:   80 e2           ldi     r24, 0x20       ; 32
  96:   89 27           eor     r24, r25
  98:   85 b9           out     0x05, r24       ; 5

It is more than one instruction.

But my actual problem is not the toggling itself; it is this part of my code:

void loop() {
  toggle.run();
}

which does not compile to any machine code at all.

I wanted it to generate the same code as in the first example:

  94:   2d 9b           sbis    0x05, 5 ; 5
  96:   02 c0           rjmp    .+4
  98:   2d 98           cbi     0x05, 5 ; 5
  9a:   08 95           ret
  9c:   2d 9a           sbi     0x05, 5 ; 5

What have I done wrong in the classes VirtualPin and Toggler since it does not work? It does compile without errors but no machine code is generated.
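
A likely explanation, offered as a sketch rather than a definitive answer: the member functions of VirtualPin are not declared virtual, so the calls that Toggler makes through its VirtualPin& reference bind statically to the empty base-class versions, and the optimizer removes them. (VirtualPin::isHigh() also has no return statement, which is undefined behavior.) Declaring the functions virtual would work but adds run-time dispatch. Making Toggler a template on the pin type keeps everything resolvable at compile time. In the host-side sketch below, fakeDDRB and fakePORTB are hypothetical stand-ins for the AVR registers so that the example runs on a PC, and the names FakePin13 and TemplateToggler are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the AVR registers DDRB and PORTB, so that the
// sketch runs on a PC. On an Arduino the pin class would use the real ones.
static volatile uint8_t fakeDDRB = 0;
static volatile uint8_t fakePORTB = 0;

// Same interface as Pin13 in the post, but without a base class.
struct FakePin13 {
  static void output() { fakeDDRB = static_cast<uint8_t>(fakeDDRB | (1u << 5)); }
  static void low()    { fakePORTB = static_cast<uint8_t>(fakePORTB & ~(1u << 5)); }
  static void high()   { fakePORTB = static_cast<uint8_t>(fakePORTB | (1u << 5)); }
  static bool isHigh() { return (fakePORTB & (1u << 5)) != 0; }
};

// The pin type is a template parameter, so every call below is resolved
// at compile time and can be inlined.
template <typename Pin>
class TemplateToggler {
public:
  TemplateToggler() { Pin::output(); }
  void run() {
    if (Pin::isHigh())
      Pin::low();
    else
      Pin::high();
  }
};

// Example use: starting from a low pin, two calls to run() switch the pin
// on and then off again.
inline void exampleToggle() {
  FakePin13::low();
  TemplateToggler<FakePin13> toggle;
  toggle.run();
  assert(FakePin13::isHigh());
  toggle.run();
  assert(!FakePin13::isHigh());
}
```

With the real registers in place of the stand-ins, the user-side code would read `TemplateToggler<Pin13> toggle;`, and the library never needs to know which pin it gets.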

I guess you mean

PORTB ^= 1<<5;

Nope. Writing a one to PINx is magic, and toggles that bit in the IO port (PORTx.) (In modern ATmega CPUs. See the datasheet.)
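
As a rough host-side model of the behavior the datasheet describes (this is not AVR register code): every 1 bit written to PINx flips the matching bit of PORTx, and every 0 bit leaves its PORTx bit unchanged. The struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of one AVR I/O port; not real register code.
struct ModelPort {
  uint8_t port = 0;  // models the PORTx output latch

  // Models a write to PINx: each 1 bit in 'value' flips the matching
  // PORTx bit, each 0 bit leaves it alone.
  void writePIN(uint8_t value) { port = static_cast<uint8_t>(port ^ value); }
};

// Example use: toggling bit 5 twice restores the original latch value,
// and bit 0 is never disturbed.
inline void exampleToggleTwice() {
  const uint8_t bit5 = 1u << 5;
  ModelPort b;
  b.port = 0x01;
  b.writePIN(bit5);
  assert(b.port == 0x21);
  b.writePIN(bit5);
  assert(b.port == 0x01);
}
```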

There’s been at least one implementation of a “pin” c++ template that implements myPin.toggle() in a single instruction.
https://github.com/greiman/DigitalIO

westfw: Writing a one to PINx is magic

Strange? I will look into that later.

westfw: There's been at least one implementation of a "pin" c++ template that implements myPin.toggle() in a single instruction. https://github.com/greiman/DigitalIO

Thanks! It looks good.

BTW, I did have a look at https://github.com/fenichelar/Pin but it is not that fast.

I am somehow back to square one, and I am not sure whether I should start a new thread.

I am now using https://github.com/greiman/DigitalIO, and when used straightforwardly it produces one instruction per line of C++ code. It is built around C++ templates. When I try to define a library class at the beginning of the program and then do a “late binding” (or whatever it is called), I get an error.

#include "DigitalIO.h"

/* Begin declare lib-function */
class Foo {
public:
  Foo(DigitalPin pin) :
  _pin(pin) {
    _pin.mode(OUTPUT);
  }
  void run() {
    _pin.write(1);
    _pin.write(0);
  }
private:
  DigitalPin& _pin;
};
/* end of lib */

DigitalPin<13> pin13;
Foo foo(pin13);

void setup() {
}

void loop() {
  foo.run();
}

The error message is:

DigitalPinBlink.ino:6:18: error: expected ‘)’ before ‘pin’
DigitalPinBlink.ino:15:3: error: invalid use of template-name ‘DigitalPin’ without an argument list
DigitalPinBlink.ino: In member function ‘void Foo::run()’:
DigitalPinBlink.ino:11:5: error: ‘_pin’ was not declared in this scope
DigitalPinBlink.ino: At global scope:
DigitalPinBlink.ino:20:14: error: no matching function for call to ‘Foo::Foo(DigitalPin<13u>&)’
DigitalPinBlink.ino:20:14: note: candidates are:
DigitalPinBlink.ino:4:7: note: Foo::Foo()
DigitalPinBlink.ino:4:7: note:   candidate expects 0 arguments, 1 provided
DigitalPinBlink.ino:4:7: note: Foo::Foo(const Foo&)
DigitalPinBlink.ino:4:7: note:   no known conversion for argument 1 from ‘DigitalPin<13u>’ to ‘const Foo&’

My approach is to make a library function that is not aware of which pin is going to be used, but just knows that the pin is of type DigitalIO. Then I decide to use pin 13, and I want to use that library object and expect it to be as fast as the example program for DigitalIO. Is it not possible to make a C++ construction like this?

I think that "DigitalPin" is set up to handle a constant pin number only. There's a separate "PinIO" class for handling pins that are variables at runtime. This is significantly slower, as it MUST be. The AVR's instructions for dealing with the ports quickly have both the port and bit number encoded in the (read-only) instruction; handling variable pin numbers means treating the ports as memory locations and doing separate read/modify/write instructions. You probably won't get better than the 4-instruction sequence you show in Reply #2.

For all the complaints about the speed of digitalWrite(), there aren't very many wasted cycles in there. If you want a variable pin number, variable new value, and arduino-style mapping of "pin#" to port/bit, it's difficult to do much better. (I think the main improvements in PinIO come from doing the mapping once, at class initialization time, and by having dedicated functions for high() and low()...)
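
To illustrate the trade-off, here is a minimal host-side sketch of a run-time pin class that does the pin-to-port mapping once, in its constructor, and then offers dedicated high() and low() functions. It is not the PinIO implementation; the class name and the fake registers are assumptions for illustration, and the mapping is the Uno layout (pins 0-7 on port D, pins 8-13 on port B).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for two AVR port registers.
static volatile uint8_t simPortB = 0;
static volatile uint8_t simPortD = 0;

class RuntimePin {
public:
  // The mapping from pin number to register and bit mask is done once here.
  explicit RuntimePin(uint8_t pin)
      : _reg(pin < 8 ? &simPortD : &simPortB), _mask(maskFor(pin)) {}

  // Read/modify/write through a pointer. On an AVR this takes several
  // instructions, unlike sbi/cbi with a constant port and bit.
  void high() { *_reg = static_cast<uint8_t>(*_reg | _mask); }
  void low()  { *_reg = static_cast<uint8_t>(*_reg & ~_mask); }
  bool read() const { return (*_reg & _mask) != 0; }

private:
  static uint8_t maskFor(uint8_t pin) {
    assert(pin < 14);  // this illustrative mapping only covers pins 0-13
    return static_cast<uint8_t>(1u << (pin < 8 ? pin : pin - 8));
  }

  volatile uint8_t* _reg;
  uint8_t _mask;
};
```

Because the register address and mask live in member variables, the compiler cannot in general reduce high() and low() to single-bit instructions, which matches the point made above.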

westfw:
For all the complaints about the speed of digitalWrite(), there aren’t very many wasted cycles in there. If you want a variable pin number, variable new value, and arduino-style mapping of “pin#” to port/bit, it’s difficult to do much better. (I think the main improvements in PinIO come from doing the mapping once, at class initialization time, and by having dedicated functions for high() and low()…)

Oh yes there are.
The compiler is smart enough to reach down into statically declared const array elements at compile time.
That holds even with the poor semantics they chose, passing a value instead of having separate set/high and clear/low functions.
And even without looking up the pin mapping information once instead of on every call at runtime, they could have used a static const array for the pin mapping tables.
Had they done that and made a few other tweaks to the code, then whenever the user passed constants for the pin # and the value, the compiler could potentially optimize it all the way down to a single instruction, because it is smart enough to do the pin lookups into a const data array at compile time.

Also, some of the lookups could have been avoided, like the pin-to-port lookups.
On parts with a very limited number of ports, including the m328, those should be simple inlines rather than table lookups. That was a poor implementation decision.
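
The claim about constant tables can be tried on any C++11 host compiler. In the minimal sketch below, the table contents follow the Uno layout (pins 0-7 on port D, pins 8-13 on port B), but all names are hypothetical and this is not the Arduino core's code. The static_assert lines only compile if the lookups are evaluated at compile time.

```cpp
#include <cassert>
#include <cstdint>

enum PortId : uint8_t { PORT_D_ID, PORT_B_ID };  // hypothetical identifiers

// Pin-to-bit-mask table, visible to the compiler as constant data.
constexpr uint8_t kPinToMask[14] = {
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,  // pins 0-7
    0x01, 0x02, 0x04, 0x08, 0x10, 0x20               // pins 8-13
};

constexpr uint8_t pinToMask(uint8_t pin) { return kPinToMask[pin]; }

// Pin-to-port mapping as a simple inline expression instead of a table.
constexpr PortId pinToPort(uint8_t pin) {
  return pin < 8 ? PORT_D_ID : PORT_B_ID;
}

// With a constant pin number, both lookups are done by the compiler.
static_assert(pinToMask(13) == 0x20, "pin 13 is bit 5");
static_assert(pinToPort(13) == PORT_B_ID, "pin 13 is on port B");

// The same table still works for pin numbers only known at run time.
inline uint8_t pinToMaskChecked(uint8_t pin) {
  assert(pin < 14);
  return kPinToMask[pin];
}
```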

There are other implementation options, such as what Paul chose to do in the Teensy version of the digitalXXX() routines.
While he didn’t take advantage of a static const data table, his routines are MUCH faster simply because the code is implemented better.
In fact, these updates were rejected by the Arduino team for the official code.
They still use the same semantics, but when const values are used they take advantage of that information as much as possible.
If pin and value are constants, the Teensy core will generate single instructions with no changes to sketch code. If they are not constants, it still does a better job with fewer instructions, so it is still faster.
In other words, the Teensy code “just works” faster because it is better code.

— bill

Another option is to do what the ESP8266 guys did. They didn't make the silly mistake of allowing naked constants to be used with arbitrary pin mappings. If the simple requirement had been made that the digitalxxx() routines must be given defines to specify the pin, rather than naked integers, then lots of things could have been done to encode information into them. This would have totally eliminated all the runtime data table lookups, which are very expensive on AVR parts because of the AVR's inability to directly access const data that is only in flash.
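
One way such an encoding could look, purely as an illustration: the pin constant itself carries the port index and the bit number, so no table is needed. The encoding, the names and the fake registers below are all hypothetical and are not taken from the ESP8266 core or any other real core.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: high nibble = port index, low nibble = bit number.
constexpr uint8_t makePin(uint8_t portIndex, uint8_t bit) {
  return static_cast<uint8_t>((portIndex << 4) | bit);
}
constexpr uint8_t pinPort(uint8_t pin) { return static_cast<uint8_t>(pin >> 4); }
constexpr uint8_t pinMask(uint8_t pin) {
  return static_cast<uint8_t>(1u << (pin & 0x0F));
}

// Hypothetical named pins (port index 0 = port B, 1 = port D).
constexpr uint8_t PIN_B5 = makePin(0, 5);  // the Uno's pin 13
constexpr uint8_t PIN_D2 = makePin(1, 2);  // the Uno's pin 2

// Decoding a named pin needs no table and can happen at compile time.
static_assert(pinPort(PIN_B5) == 0, "PIN_B5 is on port B");
static_assert(pinMask(PIN_B5) == 0x20, "PIN_B5 is bit 5");

// Hypothetical stand-ins for the two port registers.
static volatile uint8_t fakePorts[2] = {0, 0};

inline void writeEncoded(uint8_t pin, bool state) {
  assert(pinPort(pin) < 2);  // only two ports are modeled here
  volatile uint8_t& reg = fakePorts[pinPort(pin)];
  const uint8_t mask = pinMask(pin);
  reg = static_cast<uint8_t>(state ? (reg | mask) : (reg & ~mask));
}
```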

--- bill

@bill The compiler is smart enough to be able to reach down into statically declared const array elements at compile time.

I am not sure I understand, but a change to the C standard that allowed two functions with the same name and different kinds of parameters (like C++ constructors have) would solve the case.

If you have these two functions:

digitalWrite(const int pin, const int state) {
#ifdef pin == 13
...
}
digitalWrite(int pin, const int state) {
/* standard Arduino digitalWrite() */
...
}

Then the compiler should decide which function to use based on parameters and optimize that way. Could this work? Does it require a change of the C standard?
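
Two points on the proposal, followed by a hedged sketch. A top-level const on a by-value parameter is not part of the function's type in C++, so the two declarations would declare the same function, and the preprocessor runs before the compiler knows anything about parameters, so a `#ifdef` cannot test an argument. Arduino sketches are already compiled as C++, so overloading itself is available; what is missing is a way to ask whether an argument is a compile-time constant. GCC and Clang provide `__builtin_constant_p` for that, and a template parameter is the portable alternative. The function names and the fake register below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

static volatile uint8_t modelPORTB = 0;  // hypothetical stand-in for PORTB

// General path: stands in for a table-driven digitalWrite(). This sketch
// only models Uno pins 8-13, which sit on port B.
inline void slowWrite(uint8_t pin, bool state) {
  assert(pin >= 8 && pin <= 13);
  const uint8_t mask = static_cast<uint8_t>(1u << (pin - 8));
  modelPORTB = static_cast<uint8_t>(state ? (modelPORTB | mask)
                                          : (modelPORTB & ~mask));
}

// GCC/Clang only: take a short path when the optimizer can prove that
// 'pin' is the constant 13, otherwise fall back to the general path.
// Both paths produce the same result.
__attribute__((always_inline)) inline void fastWrite(uint8_t pin, bool state) {
  if (__builtin_constant_p(pin) && pin == 13) {
    modelPORTB = static_cast<uint8_t>(state ? (modelPORTB | 0x20)
                                            : (modelPORTB & ~0x20));
  } else {
    slowWrite(pin, state);
  }
}

// Portable alternative: the pin is a template argument, so an unsupported
// pin number is rejected at compile time.
template <uint8_t Pin>
inline void writePin(bool state) {
  static_assert(Pin >= 8 && Pin <= 13, "this sketch only models pins 8-13");
  const uint8_t mask = static_cast<uint8_t>(1u << (Pin - 8));
  modelPORTB = static_cast<uint8_t>(state ? (modelPORTB | mask)
                                          : (modelPORTB & ~mask));
}
```

The static_assert in writePin is also the kind of mechanism that could give a compile-time error for an unsupported pin.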

The Servo library is made with a very late binding called attach(), but if the pin number were defined together with the variable, it should be possible. One good thing would be that you could get a compile-time error: For Uno you can only have servo on pin 3, 5, 6, 9, 10 and 11.