Direct port manipulation - looking for a little education

OK, so I've been trying to write more efficient and more syntactically correct code. Up until now, I have been using pinMode and digitalRead/digitalWrite to set up and manipulate I/O. Most of my projects run on ATmega328s and ATtiny85s. I have recently started dabbling in direct port manipulation (since I have a project where I need to free up program space and need more processing speed) and have been trying to decipher the various examples and tutorials I have found online, as well as answers to questions found here. I have come across something interesting and I would like to see if anyone can provide me with an explanation.

I found there are different ways to perform direct port manipulation, so I decided to see if there was a difference between the methods (speed-wise). The first thing I found was something I started using without really knowing how it worked (I don't normally like doing that, but I was in a bind and needed to get the program running). I can't remember where I got the example for this, but after reading the reference page on port manipulation (Arduino - PortManipulation), I can't help but feel I'm doing it wrong (but it feels so right). I have this code to send a byte to a shift register:

    PIND = 0b00010000;  //this is the latchpin on PD4
    shiftOut(datapin, clockpin, MSBFIRST, sRegOut);
    PIND = 0b00010000;

which works perfectly (the bit goes low for the shift out, then goes high again), but according to the port manipulation page PINx is a "read only" register, so I'm not sure how this is working. If anyone has any insight on this and whether it is good practice or not, I would appreciate it.

After reading through the port manipulation page, I started doing a little digging and found the _BV() macro. I played around with it a little bit and found it worked just as well. I then decided to see if there was a difference in execution speed (setting bits high/low) so I set up a little experiment. I used the below code (uncommenting each group, one at a time) and checked the speed at which the pin was turned on and off. I was a little surprised by the result so I'm hoping someone can explain this as well.

void setup() {
  Serial.begin(115200);
  DDRD = B11111110;
  PORTD = B00000000;
}

void loop() {
//  digitalWrite(7, HIGH);
//  digitalWrite(7, LOW);
//  132kHz - the speed comments were added after testing each group.

//    PORTD = PORTD | 0x10;
//    PORTD = PORTD & 0xEF;
//  842kHz

//    PIND = 0b00100000;
//    PIND = 0b00100000;
//  941kHz

//    PORTD |= _BV(PD6);
//    PORTD &= ~_BV(PD6);
//  842kHz
}

The speed listed below each group is what I was getting on the pin with my oscilloscope. I was really surprised by how much slower the digitalWrite calls were compared to the others. I was also surprised that the _BV() macro was the same speed as the plain PORTx assignment. I thought there might be a slight delay for the macro, but I guess not. What's really surprising is that what should not be working at all generates the fastest toggling, by almost 100kHz. This is basically 7 times faster than using digitalWrite and about 12% faster than the PORTx method. The only odd thing I found was that the off (low) time was sometimes 2 to 3 times that of the on (high) time. I figure this has to do with the jump at the end of the loop, when the program starts back at the top.

I hope this information helps someone else out and I'm really looking forward to the explanation of PINx though.

Budreaux:
I'm really looking forward to the explanation of PINx though.

Why not start at the definitive source? ATMega328P datasheet, Section 14.1:

Three I/O memory address locations are allocated for each port, one each for the Data Register – PORTx, Data
Direction Register – DDRx, and the Port Input Pins – PINx. The Port Input Pins I/O location is read only, while
the Data Register and the Data Direction Register are read/write. However, writing a logic one to a bit in the
PINx Register, will result in a toggle in the corresponding bit in the Data Register. In addition, the Pull-up Disable
– PUD bit in MCUCR disables the pull-up function for all pins in all ports when set.

Writing a 1 to a bit in PIND actually toggles the corresponding PORTD bit.
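A host-side model may make the datasheet wording concrete. This is not AVR code: `SimPort`, `writePort`, `writePin` and `latchSequenceRestoresPin` are hypothetical names that only mimic the behaviour quoted above (a write to PORTx stores the value, a 1 written to a PINx bit toggles the matching PORTx bit).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model of one AVR port (no real register access).
struct SimPort {
    uint8_t port = 0;  // stands in for PORTx, the output latch

    // Writing PORTx stores the value as-is.
    void writePort(uint8_t value) { port = value; }

    // Writing PINx toggles every PORTx bit whose written bit is 1;
    // bits written as 0 are left unchanged.
    void writePin(uint8_t value) { port ^= value; }
};

// Mirrors the latch sequence from the first post: PD4 idles high,
// goes low for the shift, then returns high.
bool latchSequenceRestoresPin() {
    SimPort d;
    d.writePort(0b00010000);                          // latch pin PD4 starts high
    d.writePin(0b00010000);                           // first PIND write: PD4 goes low
    bool lowDuringShift = (d.port & 0b00010000) == 0;
    d.writePin(0b00010000);                           // second PIND write: PD4 goes high
    bool highAfter = (d.port & 0b00010000) != 0;
    return lowDuringShift && highAfter;
}
```

Because the write toggles rather than sets, the sequence only behaves this way if the latch pin is already high beforehand; starting from low, the polarity would be inverted.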

Edit: gfvalvo was faster :)

Macros are just text substitution before compilation, so they carry no more penalty than writing the equivalent by hand. With literal bit values the compiler computes everything statically, without any bit shifting at run time.

Btw you are really measuring the penalty of the loop looping... try this:

#define toggle10   PIND = 0b00100000; PIND = 0b00100000; PIND = 0b00100000; PIND = 0b00100000; PIND = 0b00100000; PIND = 0b00100000; PIND = 0b00100000; PIND = 0b00100000; PIND = 0b00100000; PIND = 0b00100000;


void setup() {
  Serial.begin(115200);
  DDRD = B11111110;
  PORTD = B00000000;
}

void loop()
{
  while(1) {
    toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10
    toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10
    toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10
    toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10
    toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10
    toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10
    toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10
    toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10
    toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10
    toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10 toggle10
  }
}

The idea is to repeat the PIND assignment 1000 times so that the looping penalty matters much less (and the while loop will be faster than returning from loop()).

You should approach 8 MHz (assuming a 16 MHz clock), as it takes 1 clock cycle to toggle.
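The arithmetic behind that estimate can be sketched as follows, assuming a 16 MHz clock (typical for an Uno-class ATmega328P board, but an assumption here since the thread does not state it). One full output period needs two toggles, so one cycle per toggle gives 16 MHz / 2 = 8 MHz. The same division applied to the frequencies measured in the first post hints at how many clock cycles each pass through loop() cost. `kAssumedClockHz`, `cyclesPerPeriod` and `demoCycleCounts` are illustrative names.

```cpp
#include <cassert>
#include <cmath>

// Assumed CPU clock; the thread does not state the board's clock speed.
constexpr double kAssumedClockHz = 16.0e6;

// Clock cycles spent per full output period (one high plus one low phase).
constexpr double cyclesPerPeriod(double measuredHz) {
    return kAssumedClockHz / measuredHz;
}

// Best case: one cycle per toggle, two toggles per period.
constexpr double kIdealToggleHz = kAssumedClockHz / 2.0;

void demoCycleCounts() {
    // Frequencies reported in the first post, rounded to whole cycles.
    assert(std::lround(cyclesPerPeriod(941.0e3)) == 17);   // two PIND writes
    assert(std::lround(cyclesPerPeriod(842.0e3)) == 19);   // PORTD |= / &=
    assert(std::lround(cyclesPerPeriod(132.0e3)) == 121);  // digitalWrite pair
    assert(kIdealToggleHz == 8.0e6);
}
```

Under that assumption the PIND version spends about 17 cycles per period while the two writes themselves should need only about 2, which fits the point that the original test mostly measured loop overhead.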

Budreaux:
I thought there might be a slight delay for the macro, but I guess not.

Macros are evaluated at compile time.

I use the Arduino bit macros to manipulate single bits:

bitSet(PORTD,4);
shiftOut(datapin, clockpin, MSBFIRST, sRegOut);
bitClear(PORTD,4);

https://www.arduino.cc/reference/en/language/functions/bits-and-bytes/bitset/

The bit macros from Arduino.h:

#define bitRead(value, bit) (((value) >> (bit)) & 0x01)
#define bitSet(value, bit) ((value) |= (1UL << (bit)))
#define bitClear(value, bit) ((value) &= ~(1UL << (bit)))
#define bitWrite(value, bit, bitvalue) (bitvalue ? bitSet(value, bit) : bitClear(value, bit))
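To see what these macros do without a board attached, here is a small host-side sketch. The macro definitions are copied from the excerpt above; `fakePortD` is a plain variable standing in for PORTD and `demoBitMacros` is an illustrative name, both assumptions made only so the example builds off-target.

```cpp
#include <cassert>
#include <cstdint>

// Copied from the Arduino.h excerpt so the example builds off-target.
#define bitRead(value, bit) (((value) >> (bit)) & 0x01)
#define bitSet(value, bit) ((value) |= (1UL << (bit)))
#define bitClear(value, bit) ((value) &= ~(1UL << (bit)))
#define bitWrite(value, bit, bitvalue) (bitvalue ? bitSet(value, bit) : bitClear(value, bit))

// fakePortD is an ordinary byte; on the AVR the same macros would be
// applied to the real PORTD register.
uint8_t demoBitMacros() {
    uint8_t fakePortD = 0;
    bitSet(fakePortD, 4);                 // now 0b00010000
    assert(bitRead(fakePortD, 4) == 1);   // bit 4 reads back as set
    bitWrite(fakePortD, 6, 1);            // now 0b01010000
    bitClear(fakePortD, 4);               // now 0b01000000
    return fakePortD;
}
```

Since the macros expand to the same `|=` and `&= ~` expressions one would write by hand, they add readability rather than run-time cost.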

@OP

The following diagram may offer a conceptual view on the internal structure of an IO pin of the ATmega328P MCU.
Figure-1: Conceptual view on the internal structure of an IO pin of ATmega328P MCU (image: pd31.png)

Thanks for the replies. So it looks like if you need to toggle a pin, PINx is the route to go. So the blink sketch could be condensed from this

void setup() {
  // initialize digital pin LED_BUILTIN as an output.
  pinMode(LED_BUILTIN, OUTPUT);
}

// the loop function runs over and over again forever
void loop() {
  digitalWrite(LED_BUILTIN, HIGH);   // turn the LED on (HIGH is the voltage level)
  delay(1000);                       // wait for a second
  digitalWrite(LED_BUILTIN, LOW);    // turn the LED off by making the voltage LOW
  delay(1000);                       // wait for a second
}

to this

void setup() {
  // initialize digital pin LED_BUILTIN as an output.
  pinMode(LED_BUILTIN, OUTPUT);
}

// the loop function runs over and over again forever
void loop() {
  PINB = 0b00100000;                 // toggle the LED pin
  delay(1000);                       // wait for a second
}

yeah, I know, not a lot of space saving there. For me though, I like this because I can visualize the I/O as bits in a byte and I do a lot of for and while loops that parse ports of pins. It's somewhat cumbersome to loop through variable names, but easy for bit numbers. Still I know if you have a large program and need to reference the bit on a regular basis, then assigning a variable name is better (of course you could just use a bit of both).

J-M-L, I may have to give that a try. Using defines and creating libraries is one of the next things I'm going to dive into. Just wondering if something like this would be the same:

void loop() {
  while (1) {
    PIND = 0b00100000;
  }
}

Is there any benefit to the define other than just being able to call the name when needed?

Golam, thanks for the pic. I am more apt to understand visuals. I'm curious as to where it came from (Atmel?). I noted that it shows external pullup/pulldown resistors as 2.2k. I've seen a lot of forum posts where folks are wondering what values to use for PU/PD resistors. If 2.2k is what Atmel recommends, that's good enough for me.

Budreaux:
J-M-L, I may have to give that a try. Using defines and creating libraries is one of the next things I'm going to dive into. Just wondering if something like this would be the same:

void loop() {
  while (1) {
    PIND = 0b00100000;
  }
}

Is there any benefit to the define other than just being able to call the name when needed?

Hi, if you do this, for every toggle you'll also pay the price of a jump back to the beginning of the while loop.

The reason I had 1000 copies of PIND = 0b00100000; next to each other is to minimize any side-effect cost of testing conditions or looping.

You'll still get the timer interrupts that maintain millis() etc., which will affect the result, but overall you should see something closer to true performance.

Compile-time macros are better than Magic Numbers, there’s no extra cost:

PIND = 1 << PIND5;

or

PIND = _BV(PIND5);
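That "no extra cost" claim can be checked at compile time. The sketch below uses stand-ins so it builds without the AVR headers: `DEMO_BV` is assumed to mirror avr-libc's `_BV(bit)`, which is defined as `(1 << (bit))`, and `kDemoPinD5` stands in for `PIND5`, which is 5. `demoMask` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the avr-libc definitions (assumptions for off-target use).
#define DEMO_BV(bit) (1 << (bit))
constexpr uint8_t kDemoPinD5 = 5;

// All three spellings are the same compile-time constant, so the
// compiler emits the same code for each of them.
static_assert((1 << kDemoPinD5) == 0b00100000, "shift form equals the literal");
static_assert(DEMO_BV(kDemoPinD5) == 0b00100000, "_BV form equals the literal");

// The mask as it would be written to the register.
constexpr uint8_t demoMask() { return DEMO_BV(kDemoPinD5); }
```

If either static_assert were false, the file would not compile, so nothing is left to evaluate at run time.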

Budreaux:
yeah, I know, not a lot of space saving there. For me though, I like this because I can visualize the I/O as bits in a byte and I do a lot of for and while loops that parse ports of pins. It’s somewhat cumbersome to loop through variable names, but easy for bit numbers. Still I know if you have a large program and need to reference the bit on a regular basis, then assigning a variable name is better (of course you could just use a bit of both).

This isn’t an either-or situation. You can do both! Design a library that lets you create objects that act like normal variables but actually interface with the Arduino pin using the register and a bitmask. The key to this exercise is operator overloading, specifically the assignment operator and the implicit conversion operator.

Want something to study? Here’s the Arduino EEPROM.h library. Check out how they’ve designed the EERef class. You can modify that (and strip out all the unnecessary fluff) to make it work with port manipulation instead of the EEPROM functions.

/*
  EEPROM.h - EEPROM library
  Original Copyright (c) 2006 David A. Mellis.  All right reserved.
  New version by Christopher Andrews 2015.

  This library is free software; you can redistribute it and/or
  modify it under the terms of the GNU Lesser General Public
  License as published by the Free Software Foundation; either
  version 2.1 of the License, or (at your option) any later version.

  This library is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser General Public
  License along with this library; if not, write to the Free Software
  Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
*/

#ifndef EEPROM_h
#define EEPROM_h

#include <inttypes.h>
#include <avr/eeprom.h>
#include <avr/io.h>

/***
    EERef class.
    
    This object references an EEPROM cell.
    Its purpose is to mimic a typical byte of RAM, however its storage is the EEPROM.
    This class has an overhead of two bytes, similar to storing a pointer to an EEPROM cell.
***/

struct EERef{

    EERef( const int index )
        : index( index )                 {}
    
    //Access/read members.
    uint8_t operator*() const            { return eeprom_read_byte( (uint8_t*) index ); }
    operator uint8_t() const             { return **this; }
    
    //Assignment/write members.
    EERef &operator=( const EERef &ref ) { return *this = *ref; }
    EERef &operator=( uint8_t in )       { return eeprom_write_byte( (uint8_t*) index, in ), *this;  }
    EERef &operator +=( uint8_t in )     { return *this = **this + in; }
    EERef &operator -=( uint8_t in )     { return *this = **this - in; }
    EERef &operator *=( uint8_t in )     { return *this = **this * in; }
    EERef &operator /=( uint8_t in )     { return *this = **this / in; }
    EERef &operator ^=( uint8_t in )     { return *this = **this ^ in; }
    EERef &operator %=( uint8_t in )     { return *this = **this % in; }
    EERef &operator &=( uint8_t in )     { return *this = **this & in; }
    EERef &operator |=( uint8_t in )     { return *this = **this | in; }
    EERef &operator <<=( uint8_t in )    { return *this = **this << in; }
    EERef &operator >>=( uint8_t in )    { return *this = **this >> in; }
    
    EERef &update( uint8_t in )          { return  in != *this ? *this = in : *this; }
    
    /** Prefix increment/decrement **/
    EERef& operator++()                  { return *this += 1; }
    EERef& operator--()                  { return *this -= 1; }
    
    /** Postfix increment/decrement **/
    uint8_t operator++ (int){ 
        uint8_t ret = **this;
        return ++(*this), ret;
    }

    uint8_t operator-- (int){ 
        uint8_t ret = **this;
        return --(*this), ret;
    }
    
    int index; //Index of current EEPROM cell.
};

/***
    EEPtr class.
    
    This object is a bidirectional pointer to EEPROM cells represented by EERef objects.
    Just like a normal pointer type, this can be dereferenced and repositioned using 
    increment/decrement operators.
***/

struct EEPtr{

    EEPtr( const int index )
        : index( index )                {}
        
    operator int() const                { return index; }
    EEPtr &operator=( int in )          { return index = in, *this; }
    
    //Iterator functionality.
    bool operator!=( const EEPtr &ptr ) { return index != ptr.index; }
    EERef operator*()                   { return index; }
    
    /** Prefix & Postfix increment/decrement **/
    EEPtr& operator++()                 { return ++index, *this; }
    EEPtr& operator--()                 { return --index, *this; }
    EEPtr operator++ (int)              { return index++; }
    EEPtr operator-- (int)              { return index--; }

    int index; //Index of current EEPROM cell.
};

/***
    EEPROMClass class.
    
    This object represents the entire EEPROM space.
    It wraps the functionality of EEPtr and EERef into a basic interface.
    This class is also 100% backwards compatible with earlier Arduino core releases.
***/

struct EEPROMClass{

    //Basic user access methods.
    EERef operator[]( const int idx )    { return idx; }
    uint8_t read( int idx )              { return EERef( idx ); }
    void write( int idx, uint8_t val )   { (EERef( idx )) = val; }
    void update( int idx, uint8_t val )  { EERef( idx ).update( val ); }
    
    //STL and C++11 iteration capability.
    EEPtr begin()                        { return 0x00; }
    EEPtr end()                          { return length(); } //Standards requires this to be the item after the last valid entry. The returned pointer is invalid.
    uint16_t length()                    { return E2END + 1; }
    
    //Functionality to 'get' and 'put' objects to and from EEPROM.
    template< typename T > T &get( int idx, T &t ){
        EEPtr e = idx;
        uint8_t *ptr = (uint8_t*) &t;
        for( int count = sizeof(T) ; count ; --count, ++e )  *ptr++ = *e;
        return t;
    }
    
    template< typename T > const T &put( int idx, const T &t ){
        EEPtr e = idx;
        const uint8_t *ptr = (const uint8_t*) &t;
        for( int count = sizeof(T) ; count ; --count, ++e )  (*e).update( *ptr++ );
        return t;
    }
};

static EEPROMClass EEPROM;
#endif
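Adapting the EERef idea to a single port bit could look roughly like the following. This is only a sketch under stated assumptions: `PinRef`, `fakePortB` and `demoPinRef` are hypothetical names that do not come from any library, and `fakePortB` is a plain byte standing in for PORTB so the example runs on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pin wrapper modelled on EERef: it behaves like a bool
// variable but reads and writes one bit of a register.
struct PinRef {
    PinRef(volatile uint8_t &r, uint8_t bit)
        : reg(r), mask(static_cast<uint8_t>(1u << bit)) {}

    // Implicit conversion: reading the object reads the bit.
    operator bool() const { return (reg & mask) != 0; }

    // Assignment: writing the object sets or clears the bit.
    PinRef &operator=(bool level) {
        if (level) reg = static_cast<uint8_t>(reg | mask);
        else       reg = static_cast<uint8_t>(reg & ~mask);
        return *this;
    }

    // Copy-assignment copies the pin level, not the register binding.
    PinRef &operator=(const PinRef &other) { return *this = static_cast<bool>(other); }

    volatile uint8_t &reg;  // output register, e.g. PORTB on the AVR
    uint8_t mask;           // bit mask within that register
};

// Host-side stand-in for PORTB (an assumption for off-target testing).
volatile uint8_t fakePortB = 0;

uint8_t demoPinRef() {
    PinRef led(fakePortB, 5);  // bit 5, the bit toggled in the blink example
    led = true;                // sets bit 5
    assert(led);               // reads back as high
    led = !led;                // read, invert, write back
    assert(!led);
    led = true;
    return fakePortB;          // only bit 5 is set
}
```

The two operators named in the post do the work here: `operator bool` makes reads look like a variable read, and `operator=(bool)` turns an assignment into a register write.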

Compile-time macros are better than Magic Numbers, there’s no extra cost:

PIND = 1 << PIND5

Bah. I hate “PIND5”, defined as if it were some different value than PINB5, or just 5.
And it’s not really any less magic, until you give it a meaningful name like “OVERTEMP_LED”

PIND = 1<<PIND5;
PIND = 1<<PINB5;

PINB5 (Bit-5 of input port – Port-B) is not a part of PIND; yet, PIND = 1<<PINB5 gets compiled without any warning/error (Compiler warnings option is set to all) message. Can this style of coding for port-pin manipulation be recommended to include in the sketch?

PINB5 (Bit-5 of input port – Port-B) is not a part of PIND; yet, PIND = 1<<PINB5 gets compiled without any warning/error (Compiler warnings option is set to all) message. Can this style of coding for port-pin manipulation be recommended to include in the sketch?

What style? Bit shift to a location # defined behind the scenes? Seems reasonable and consistent with the Arduino style of making things simple.

The reason that your example compiles and works is that in the pin world defined in iom328p.h both PIND5 and PINB5 are #defined as 5, so the bitshift is 1<<5 for both.

cattledog:
What style? Bit shift to a location # defined behind the scenes? Seems reasonable and consistent with the Arduino style of making things simple.

The syntax style.

The reason that your example compiles and works is that in the pin world defined in iom328p.h both PIND5 and PINB5 are #defined as 5, so the bitshift is 1<<5 for both.

So, this should be the correct syntax: PIND = 1<<5;.

I find PIND = 1<<5 as readable and as hardwired as PIND = 1<<PIND5 or PIND = _BV(PIND5); or PIND = 0b00100000;

They are all IMHO a liability, technical debt you would have to maintain as it is injecting hardwired dependency throughout your code. That’s what magic numbers are about…

If you really want flexibility then the right approach IMHO is to define at the beginning of your code where things are and then use your own constants.

const uint8_t mySpecificPin = PIND5;
const uint8_t mySpecificPinMask = _BV(mySpecificPin);
volatile uint8_t* mySpecificPinRegister = &PIND;

If you wire things differently then you only need to change things once.
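A host-side sketch of that pattern, for illustration only: `kDemoBit` and `fakePinD` are stand-ins (assumptions) for `PIND5` and the `PIND` register, and `readMySpecificPin` is a hypothetical helper. The point is that the wiring is named once and everything else refers to the names.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the AVR definitions so this builds off-target.
constexpr uint8_t kDemoBit = 5;   // would be PIND5 on the AVR
volatile uint8_t fakePinD = 0;    // plain byte standing in for PIND

// The wiring is described once, in one place.
constexpr uint8_t mySpecificPin = kDemoBit;
constexpr uint8_t mySpecificPinMask = static_cast<uint8_t>(1u << mySpecificPin);
volatile uint8_t *const mySpecificPinRegister = &fakePinD;

// The rest of the code only uses the named constants.
bool readMySpecificPin() {
    return (*mySpecificPinRegister & mySpecificPinMask) != 0;
}
```

Moving the signal to another pin then means editing the three definitions and nothing else.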

If for some reason you need to know the number of leading or trailing zeros in a mask, you can leverage the GCC builtin functions __builtin_clz(x) and __builtin_ctz(x) (with a constant argument they are evaluated at compile time, so no code is generated).
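A short sketch of both builtins, assuming GCC or Clang (other compilers may not provide them). `kMask`, `kBitIndex`, `kUnsignedBits` and `kTopBit` are illustrative names. Note that both builtins take an `unsigned int`, whose width differs between the AVR (16 bits) and a typical PC (32 bits), and that the result is undefined for an argument of 0.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

constexpr unsigned kMask = 0b00100000;  // bit 5 set, as in the thread's examples

// Trailing zeros give the bit index of a single-bit mask.
constexpr int kBitIndex = __builtin_ctz(kMask);
static_assert(kBitIndex == 5, "bit 5 is the only bit set");

// Leading zeros depend on the width of unsigned int, so express the
// highest set bit relative to that width instead of hard-coding it.
constexpr int kUnsignedBits = sizeof(unsigned) * CHAR_BIT;
constexpr int kTopBit = kUnsignedBits - 1 - __builtin_clz(kMask);
static_assert(kTopBit == 5, "highest set bit is also bit 5");
```

Because everything is checked with static_assert, the values are resolved during compilation and nothing is computed at run time.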