Using C++ templates to generate optimal code

Hello!

I am a programmer and I've recently started to tinker with Arduinos as a hobby (well, I actually needed some simple automation of the heating system of my house and anything I could find was outrageously expensive... so here I am).

One of the first things I learnt when starting with Arduinos is how limited the available memory is, especially RAM. I also tend to love abstraction layers (being a C++ programmer takes its toll). So, for starters, instead of using digitalWrite() and digitalRead() everywhere, I've written some simple input/output abstractions.

I've analyzed 2 approaches to this:

  1. Plain old C++ - encapsulate some functionality in a class. Blink:
class DigitalOutputPin {
  public:
    DigitalOutputPin(uint8_t pin, bool inverted) :
      _pin(pin),
      _inverted(inverted) {
   }
   
  public:
    // the pin number
    const uint8_t _pin;
    // if true then the output is inverted
    const bool _inverted;

  public:
    void initialize() {
      pinMode(_pin, OUTPUT);
    }

  public:
    void set(bool state) {
      digitalWrite(_pin, _inverted ? (state ? LOW : HIGH) : (state ? HIGH : LOW));
    }
};

DigitalOutputPin outputPin(LED_BUILTIN, false);

void setup() {
  outputPin.initialize();
}

void loop() {
  outputPin.set(true);
  delay(1000);
  outputPin.set(false);
  delay(1000);
}

/*
After compiling:

Sketch uses 1014 bytes (3%) of program storage space. Maximum is 32256 bytes.
Global variables use 11 bytes (0%) of dynamic memory, leaving 2037 bytes for local variables. Maximum is 2048 bytes.
*/
  2. Use C++ templates to avoid storing const parameters in members (thus freeing up some RAM in the process) and make all the members static. Still blinking here:
// PIN is the pin number and if INVERTED is true then the output is inverted
template<uint8_t PIN, bool INVERTED>
class DigitalOutputPinT {
  public:
    typedef DigitalOutputPinT<PIN, INVERTED> CLASS;

  private:
    DigitalOutputPinT();

  public:
    static void initialize() {
      pinMode(PIN, OUTPUT);
    }

  public:
    static void set(bool state) {
      digitalWrite(PIN, INVERTED ? (state ? LOW : HIGH) : (state ? HIGH : LOW));
    }
};

typedef DigitalOutputPinT<LED_BUILTIN, false> OUTPUT_PIN;

void setup() {
  OUTPUT_PIN::initialize();
}

void loop() {
  OUTPUT_PIN::set(true);
  delay(1000);
  OUTPUT_PIN::set(false);
  delay(1000);
}

/*
After compiling:

Sketch uses 930 bytes (2%) of program storage space. Maximum is 32256 bytes.
Global variables use 9 bytes (0%) of dynamic memory, leaving 2039 bytes for local variables. Maximum is 2048 bytes.
*/
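
The 2-byte gap in global variables between the two reports (11 vs. 9 bytes) is consistent with the two one-byte members of the first class, although that link is a reading of the numbers and not something measured separately. A minimal sketch that checks the idea on a PC compiler rather than on the board; the structs are trimmed copies of the two classes above, and pin() and inverted() are accessors added here only for illustration:

```cpp
#include <stdint.h>
#include <type_traits>

// Trimmed copy of approach 1: both constants are stored in every object.
struct DigitalOutputPin {
  DigitalOutputPin(uint8_t pin, bool inverted) : _pin(pin), _inverted(inverted) {}
  const uint8_t _pin;
  const bool _inverted;
};

// Trimmed copy of approach 2: the constants are part of the type itself.
// pin() and inverted() are hypothetical accessors, not in the original class.
template<uint8_t PIN, bool INVERTED>
struct DigitalOutputPinT {
  static constexpr uint8_t pin() { return PIN; }
  static constexpr bool inverted() { return INVERTED; }
};

// Each DigitalOutputPin object needs room for its two members.
static_assert(sizeof(DigitalOutputPin) == sizeof(uint8_t) + sizeof(bool),
              "approach 1 keeps one byte per constant in every instance");

// The template version has no data members, so there is nothing to place in RAM.
static_assert(std::is_empty<DigitalOutputPinT<13, false> >::value,
              "approach 2 has no per-object state");
```

Both checks run at compile time, so the sketch produces no code of its own.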

Interestingly enough, the second approach seems to be optimal - the generated code is basically the same as it is when using digitalWrite() all over the place, and the abstraction itself uses no RAM at all.

This happens because gcc (the compiler used by the Arduino IDE) is mature and really good at optimizations. Thus the generated static DigitalOutputPinT<> members are inlined, the template parameters are used as immediate values in the generated code and (in this simple case) no RAM is actually used since there is no data stored anywhere. The downside of the template approach is that each generated class is a distinct type so there is no way to store instances in an array, etc...
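One possible way around the "no array of pins" limitation is to list the pin types in a variadic template and let the compiler unroll the calls. This is only a sketch under assumptions: PinGroup and setAll() are hypothetical names, the class is a trimmed copy of DigitalOutputPinT, and digitalWrite(), LOW and HIGH are PC stand-ins that record calls so the example can run without a board.

```cpp
#include <assert.h>
#include <stdint.h>

// Stand-ins for the Arduino API (assumption: PC build, calls are only logged).
static const uint8_t LOW = 0;
static const uint8_t HIGH = 1;
struct Write { uint8_t pin; uint8_t value; };
static Write g_log[8];
static uint8_t g_count = 0;
static void digitalWrite(uint8_t pin, uint8_t value) {
  if (g_count < 8) {
    g_log[g_count].pin = pin;
    g_log[g_count].value = value;
    ++g_count;
  }
}

// Trimmed copy of the template class from the post.
template<uint8_t PIN, bool INVERTED>
struct DigitalOutputPinT {
  static void set(bool state) {
    digitalWrite(PIN, INVERTED ? (state ? LOW : HIGH) : (state ? HIGH : LOW));
  }
};

// Hypothetical helper: calls set() on every listed pin type, in order.
template<typename... PINS>
struct PinGroup;

template<>
struct PinGroup<> {
  static void setAll(bool) {}
};

template<typename FIRST, typename... REST>
struct PinGroup<FIRST, REST...> {
  static void setAll(bool state) {
    FIRST::set(state);
    PinGroup<REST...>::setAll(state);
  }
};

// Usage: an active-high LED on pin 13 and an active-low one on pin 7.
typedef PinGroup<DigitalOutputPinT<13, false>, DigitalOutputPinT<7, true> > LEDS;

static inline void demoPinGroup() {
  g_count = 0;
  LEDS::setAll(true);
  assert(g_count == 2);
  assert(g_log[0].pin == 13 && g_log[0].value == HIGH);
  assert(g_log[1].pin == 7 && g_log[1].value == LOW);
}
```

The group is still fixed at compile time, so this does not replace a real array when the set of pins is only known at run time.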

So, to summarize:

Using constant parameters as template parameters of classes with static member functions will generate code that is really close to optimal (and spare some RAM in the process), while still keeping things abstract enough. This is generally true for classes that have short, simple member functions that are good candidates for inlining.

Hope this helps!

miancule:
I've written some simple input/output abstractions.

You've added unnecessary complexity to an otherwise simple system. There is no good reason to hide the fact that you are reading from, or writing to, a pin.

Abstraction is good when it is something more than an exercise in obfuscation. You aren't Russian, are you?

miancule:
Hope this helps!

Not in the slightest. Easy to understand code is orders of magnitude better than obfuscated code that takes a lot of time to understand, and then shake your head at.

PaulS:
You've added unnecessary complexity to an otherwise simple system. There is no good reason to hide the fact that you are reading from, or writing to, a pin.

Fair enough. The code in the post was meant to prove a point and it is not meant to be used as is - therefore the simplicity.

PaulS:
Abstraction is good when it is something more than an exercise in obfuscation. You aren't Russian, are you?

I agree with you on abstraction. But I think you totally missed the point - the post was meant to help people write better abstractions when needed and not to obfuscate obvious stuff. There are lots of 3rd party libraries that could benefit from this approach and avoid having 50% of available RAM lost in unneeded stuff. And my nationality is totally irrelevant.

PaulS:
Not in the slightest. Easy to understand code is orders of magnitude better than obfuscated code that takes a lot of time to understand, and then shake your head at.

The way everyone is trading ease of understanding for performance depends on the actual application - writing code for educational purposes is different than writing maintainable, production-ready code.

Happy coding!

miancule:
The way everyone is trading ease of understanding for performance depends on the actual application - writing code for educational purposes is different than writing maintainable, production-ready code.

I agree with you 100%. The question that begs an answer, though, is what educational purpose the Arduino serves. The response most often is: to enhance the level of understanding about writing compact, efficient code for a resource-constrained embedded microcontroller. In that sense, abstraction is synonymous with weight. If you understand the syntax, what is so difficult about PORTB |= (1 << 5);? (125us vs. 4,187us).

You're absolutely right - as long as you're targeting a single platform.

If you're writing a more generic approach to, let's say, handling digital outputs across a range of related platforms (which is the case for most libraries) then you need some sort of abstraction - that can start from #define PORTB (in the system libraries) to digitalWrite() (also in the system library) and so forth.

As I said, my post presented a technique that can spare some bytes/cycles when writing C++ code for abstraction layers, nothing less and nothing more. It has its ups and downs and, like any programming paradigm, it is well suited for some applications and totally useless in others. And my personal feeling is that having more options is always a good thing :)...

No argument there (which is why I'd like to see C++ support generic macros). My world, and the world I cater to, typically fits in 32K. Abstraction is a great way to produce cost-effective, multi-platform, portable code. At the board level however, we are often forced to strip away all those layers in order to maximize speed and minimize size. If I were building database applications I would want a top-level C++ person. In my world though, that same individual is of little value if they can't pare it back to the bone. There's simply not the room. Let's not lose sight here that we are dealing with (potential) embedded designers, of which there are precious few in the industry. The more I see the young ones searching for that elusive library that does everything for them, instead of writing the code themselves, the less confident I am in our collective ability to compete intellectually. As much as I admire those who can help produce million line applications, (s)he whom I hire is the one that shows me an assembly stub function that shaves a microsecond off the processing time and/or saves a dozen bytes of memory. I hope you do not take offense, we are obviously from different worlds, but I frown on the use of libraries let alone increased abstraction and, according to some industry guidelines, we're not allowed to use them (we can't even use function pointers!).

DKWatson:
At the board level however, we are often forced to strip away all those layers in order to maximize speed and minimize size.

With cross-compiling and modern compilers, that is no longer true in general.

DKWatson:
I hope you do not take offense, we are obviously from different worlds, but I frown on the use of libraries let alone increased abstraction and, according to some industry guidelines, we're not allowed to use them (we can't even use function pointers!).

No offense taken. And I understand your point - no relational DB in the 32k world, totally reasonable.

Still, with modern compilers C/C++ can provide means to generate optimal code while keeping an expressive, platform-independent and intuitive interface - if used properly. I guess that's the main reason why most of the coding is not done in assembler - at least in my non-32k world :)...

Thanks for the insight!

A proof of concept:

// Library

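namespace M328P {
  /* ... Other pins here */
  struct D13 {
    // PORTB is a volatile register: return it by reference so the
    // qualifier is kept and no pointer has to be stored anywhere
    static volatile uint8_t &port() { return PORTB; }
    static const uint8_t B = 1 << 5;
  };
}

template<typename PIN>
void dOn() {
  PIN::port() |= PIN::B;
}
template<typename PIN>
void dOff() {
  PIN::port() &= ~PIN::B;
}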

// Client code

using namespace M328P;

void loop() {
  dOn<D13>();
  delay(1000);
  dOff<D13>();
  delay(1000);
}

... generates precisely the same machine code as:

// Client code

void loop() {
  PORTB |= (1 << 5);
  delay(1000);
  PORTB &= ~(1 << 5);
  delay(1000);
}

... but, at least for me, it seems to be more explicit. And that's good, because the code conveys meaning: I'm turning on/off digital output 13 of the M328P microcontroller.

Please note that the client code itself is pretty similar - the syntax complexity lies in the library header file.
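To try the same idea without a board, here is a minimal sketch that swaps the register for a plain variable. Everything specific to it is an assumption for illustration: FAKE_PORTB stands in for PORTB, the register is reached through a small port() accessor, and D12 (bit 4 of the same port) is added only to show a second pin type.

```cpp
#include <assert.h>
#include <stdint.h>

// Stand-in for the PORTB register so the sketch runs on a PC.
static volatile uint8_t FAKE_PORTB = 0;

namespace M328P {
  // Hypothetical second pin, added for illustration only.
  struct D12 {
    static volatile uint8_t &port() { return FAKE_PORTB; }
    static const uint8_t B = 1 << 4;
  };
  struct D13 {
    static volatile uint8_t &port() { return FAKE_PORTB; }
    static const uint8_t B = 1 << 5;
  };
}

// Set the pin's bit in its port.
template<typename PIN>
void dOn() {
  PIN::port() |= PIN::B;
}

// Clear the pin's bit in its port.
template<typename PIN>
void dOff() {
  PIN::port() &= static_cast<uint8_t>(~PIN::B);
}

// Usage: both pins share the stand-in port and only their own bit changes.
static inline void demoPins() {
  FAKE_PORTB = 0;
  dOn<M328P::D13>();
  dOn<M328P::D12>();
  assert(FAKE_PORTB == 0x30);
  dOff<M328P::D13>();
  assert(FAKE_PORTB == 0x10);
}
```

Adding a pin means adding one small struct to the library; the client calls stay the same.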

I rest my case here :)...

Old joke, paraphrased for topical appropriateness. The difference between C++ programmers and the rest of the world: whilst everyone believes in keeping all options open, for C++ programmers all options are written in C++.

miancule:
I rest my case here :)...

Meh. Teensyduino does it better.

Optimal Templated C++ Abstractions have been done before, as well as Optimal C functions (i.e. both generating single instructions for the cases where that's possible). Bill Greiman's code is probably a good C++ style example.
In general, expanding the trivial cases to include all of the capabilities of digitalWrite() (variable pin# and variable value) turns out to be difficult, unruly, not much faster than the existing digitalWrite(), and annoyingly inconsistent in runtime.

I'm not sure I understand why pins weren't defined as objects in the original implementation. I mean, Serial ports are objects, SPI and TWI ports are objects... Perhaps the original author didn't understand, or didn't want to use, templates, which I think would cause pin objects to get ... complicated.

miancule:
The syntax complexity lies in the library header file.

Actually, that's one of my major complaints about C++ (and other "modern" OO languages, I guess.)
There is this claim about how objects make things easier to understand, and perhaps that's true of top-level code that uses existing well-defined libraries, but if you try to dig into the library to get a deeper understanding of what is going on, you quickly run into a morass of We Used All The Features; nearly incomprehensible...

westfw:
Optimal Templated C++ Abstractions have been done before, as well as Optimal C functions (i.e. both generating single instructions for the cases where that's possible). Bill Greiman's code is probably a good C++ style example.

Absolutely, I don't pretend I made the stuff up myself.

westfw:
In general, expanding the trivial cases to include all of the capabilities of digitalWrite() (variable pin# and variable value) turns out to be difficult, unruly, not much faster than the existing digitalWrite(), and annoyingly inconsistent in runtime.

The vast majority of Arduino code is using fixed pin assignments, something along the lines of:

#define LED_PIN 3
...
digitalWrite(LED_PIN, LOW);

Having the overhead of a function call (hardly inlineable) for this seems like a lot.

As for passing a variable value - we cannot avoid a load/store sequence in the generated code, but still, relying on the compiler to do the optimizations is the smart thing to do, IMHO.
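A minimal sketch of that split, assuming a fixed pin and a value that is only known at run time. writeBit() and FAKE_PORT are hypothetical names, and the register is a plain variable so the example can run on a PC. The mask is a compile-time constant, so only the test on the value is left for run time; what instructions come out of it is up to the optimizer.

```cpp
#include <assert.h>
#include <stdint.h>

// Stand-in for an output register so the sketch can run on a PC.
static volatile uint8_t FAKE_PORT = 0;

// Hypothetical helper: BIT is fixed at compile time, value arrives at run time.
template<uint8_t BIT>
inline void writeBit(volatile uint8_t &reg, bool value) {
  const uint8_t mask = static_cast<uint8_t>(1u << BIT);
  if (value) {
    reg = reg | mask;                         // set the bit
  } else {
    reg = reg & static_cast<uint8_t>(~mask);  // clear the bit
  }
}

// Usage: the level is a run-time value, the bit position is not.
static inline void demoWriteBit(bool level) {
  writeBit<5>(FAKE_PORT, level);
  assert(((FAKE_PORT >> 5) & 1u) == (level ? 1u : 0u));
}
```

This covers the common "#define LED_PIN" case; a pin number that is itself a run-time value still needs a lookup of some kind.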

westfw:
Actually, that's one of my major complaints about C++ (and other "modern" OO languages, I guess.)
There is this claim about how objects make things easier to understand, and perhaps that's true of top-level code that uses existing well-defined libraries, but if you try to dig into the library to get a deeper understanding of what is going on, you quickly run into a morass of We Used All The Features; nearly incomprehensible...

Well, I guess this is what the libraries are all about - not worrying about the implementation as long as it works as expected. Or hacking away - but for this one should still have the knowledge to do that.

Happy coding!

Regarding digitalWrite: did you see the implementation? And what about i2cIoExpander.digitalWrite? And firmata.digitalWrite?

Juraj:
Regarding digitalWrite: did you see the implementation? And what about i2cIoExpander.digitalWrite? And firmata.digitalWrite?

Yes and yes and yes. And I can hardly see any relevance of anything related to those.

The origin of the basic Arduino functions is Processing. Processing is for controlling an MCU board from a PC. Arduino is a port of the Processing API directly to the MCU.

Functions like digitalWrite are used for the same task remotely and locally. Without these functions it is not Arduino, but AVR programming.