How digitalWrite works, AVR port mapping

Over the past few days I have been digging into how Arduino works when we use digitalWrite to drive LEDs across different AVR ports, particularly how it maps the AVR's three hard-to-use ports onto a very elegant pin-number system. After going through the core files many times and doing my own Google searches, I still don't get how it works technically, but I think I've got the concept: the Arduino core uses two arrays, maybe more, to map each pin number to a port and a bit, and digitalWrite uses them to switch pins one at a time. I actually wrote my own version of digitalWrite (pinWrite) using this concept, sort of.

Here are my questions:
1) Can someone show me how Arduino does digitalWrite? I.e., put everything in a single sketch.

2) From the pins_arduino file: what do "NOT_A_PORT" and "(uint16_t) &DDRB" mean and do?

3) How does my pinWrite compare to digitalWrite in terms of performance and speed, and how can I improve it?

With an 8 MHz clock:
My pinWrite max speed is 1.4 kHz (kinda sucks, and embarrassing)
digitalWrite max speed is 44 kHz (kinda sucks too, but not bad at all)
Absolute max speed for a pin is 800 kHz

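/*
PD0-PD7, PB0, PB5 as outputs, driving 10 LEDs
*/

#ifndef F_CPU
#define F_CPU 8000000UL   // 8 MHz clock; _delay_ms() needs this
#endif

#include <stdint.h>
#include <avr/io.h>
#include <util/delay.h>

#define HIGH 1
#define LOW 0

// Bit c set means logical pin c lives on PORTB; clear means PORTD
uint16_t port=0b0000001100000000;
// Bit position of each logical pin within its port
uint16_t pin[10]={0,1,2,3,4,5,6,7,0,5};

// Desired state of all 10 logical pins, one bit per pin
uint16_t pins=0b0000000000000000;

// Prototype, so main() can call pinWrite() before its definition
void pinWrite(int8_t p_p, int8_t state);

int main(void){

  DDRD=0xff;
  DDRB=0xff;

  while(1){

    // _delay_ms (500);
    //pins=0B1234567812345678
    //pins=0b0000010101010101;
    //pins=0b0000001101101101;
    //pins=0b0000000000000000;

    _delay_ms (500);
    pinWrite(0, HIGH);
    pinWrite(9, HIGH);

    _delay_ms (500);
    pinWrite(0, LOW);
    pinWrite(9, LOW);
  }
}

// Returns nothing, so declared void (the original was int with no return)
void pinWrite(int8_t p_p, int8_t state){

  // Record the new state of the requested pin
  if( state==1) { pins |= 1<<p_p;}
  else { pins &= ~(1<<p_p);}

  // Rewrite all 10 pins on every call; this loop is the slow part
  for( int8_t c=0; c<10; c++){
    if(pins & 1<<c){
      if(port & 1<<c){ PORTB |= (1<< pin[c]);}
      else { PORTD |= (1<< pin[c]);}
    }
    else{
      if(port & 1<<c){ PORTB &= ~(1<< pin[c]);}
      else { PORTD &= ~(1<< pin[c]);}
    }
  }

}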

1) This is how Arduino does digitalWrite:

void digitalWrite(uint8_t pin, uint8_t val)
{
 uint8_t timer = digitalPinToTimer(pin);
 uint8_t bit = digitalPinToBitMask(pin);
 uint8_t port = digitalPinToPort(pin);
 volatile uint8_t *out;

 if (port == NOT_A_PIN) return;

 // If the pin that support PWM output, we need to turn it off
 // before doing a digital write.
 if (timer != NOT_ON_TIMER) turnOffPWM(timer);

 out = portOutputRegister(port);

 uint8_t oldSREG = SREG;
 cli();

 if (val == LOW) {
 *out &= ~bit;
 } else {
 *out |= bit;
 }

 SREG = oldSREG;
}

2) NOT_A_PORT is a placeholder for an invalid port, so the code can detect it and bail out. &DDRB is a pointer to the register DDRB (Data Direction Register, port B).
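
To make the sentinel and the address cast concrete, here is a minimal sketch that builds on a PC. It is not the core's code: `modeRegisterFor` and the `fake_ddr*` variables are made-up names standing in for the real registers, and `uintptr_t` replaces `uint16_t` because PC addresses do not fit in 16 bits. The handle values (2 = B, 3 = C, 4 = D) and NOT_A_PORT being 0 follow what I believe the core headers use, so treat them as assumptions.

```c
#include <stdint.h>
#include <stddef.h>

#define NOT_A_PORT 0   /* sentinel: table slot with no register behind it */

/* Stand-ins for DDRB/DDRC/DDRD (assumption: real code uses <avr/io.h>). */
static volatile uint8_t fake_ddrb, fake_ddrc, fake_ddrd;

/* Hypothetical helper: translate a port handle into a register pointer.
   Addresses are stored as plain integers, like "(uint16_t) &DDRB". */
volatile uint8_t *modeRegisterFor(uint8_t handle)
{
    const uintptr_t table[] = {
        NOT_A_PORT,                 /* handle 0: nothing here */
        NOT_A_PORT,                 /* handle 1: no port A on this chip */
        (uintptr_t)&fake_ddrb,      /* handle 2: port B */
        (uintptr_t)&fake_ddrc,      /* handle 3: port C */
        (uintptr_t)&fake_ddrd,      /* handle 4: port D */
    };

    if (handle >= sizeof table / sizeof table[0]) return NULL;
    if (table[handle] == NOT_A_PORT) return NULL;   /* caught the bad port */
    return (volatile uint8_t *)table[handle];       /* integer back to pointer */
}
```

Calling `*modeRegisterFor(4) = 0xff;` would then write the stand-in for DDRD, while handles 0 and 1 come back as NULL instead of pointing at a random register.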

3) The looping and all that extra work is killing it. Look how much more work it has to do compared to the real digitalWrite code.

Those digitalPinToXXX() things just look up the pin in those lookup tables in pins_arduino.h
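
For question 3, one way to drop the loop is to let a table do the mapping, which is the same idea the core uses. This is only a sketch under assumptions: it keeps the same 10-pin layout as the original pinWrite, `pinWriteTable`, `pin_entry` and `pin_table` are made-up names, and `FAKE_PORTB`/`FAKE_PORTD` are plain variables standing in for PORTB/PORTD so it builds on a PC.

```c
#include <stdint.h>

/* Stand-ins for the PORTB and PORTD registers (assumption for PC builds). */
static volatile uint8_t FAKE_PORTB, FAKE_PORTD;

/* One entry per logical pin: the output register and the bit mask within it. */
struct pin_entry {
    volatile uint8_t *out;
    uint8_t mask;
};

/* Same layout as the original sketch: pins 0-7 are PD0-PD7, 8 is PB0, 9 is PB5. */
static const struct pin_entry pin_table[10] = {
    { &FAKE_PORTD, 1u << 0 }, { &FAKE_PORTD, 1u << 1 },
    { &FAKE_PORTD, 1u << 2 }, { &FAKE_PORTD, 1u << 3 },
    { &FAKE_PORTD, 1u << 4 }, { &FAKE_PORTD, 1u << 5 },
    { &FAKE_PORTD, 1u << 6 }, { &FAKE_PORTD, 1u << 7 },
    { &FAKE_PORTB, 1u << 0 }, { &FAKE_PORTB, 1u << 5 },
};

/* Hypothetical replacement for pinWrite(): one lookup, one read-modify-write,
   and only the requested pin is touched. */
void pinWriteTable(uint8_t p, uint8_t state)
{
    if (p >= 10) return;                       /* not a pin: do nothing */
    const struct pin_entry *e = &pin_table[p];
    if (state) {
        *e->out |= e->mask;                    /* set the pin's bit */
    } else {
        *e->out &= (uint8_t)~e->mask;          /* clear the pin's bit */
    }
}
```

On the AVR you would point the table at the real PORTB and PORTD. Unlike the core's digitalWrite, this version keeps its table in RAM and does not disable interrupts around the write.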

"(uint16_t) &DDRB" takes the address of register DDRB and casts it to an unsigned 16 bit integer.

I do not agree that, for an 8 MHz Arduino, "Absolute max speed for a pin is : 800khz". Sounds a little low. How did you come up with this?

Look for the digitalWriteFast library. That speeds things up a lot.

Here may be a link (not sure if it is the latest): https://code.google.com/archive/p/digitalwritefast/downloads

vaj4088: "(uint16_t) &DDRB" takes the address of register DDRB and casts it to an unsigned 16 bit integer.

I do not agree that, for an 8 MHz Arduino, "Absolute max speed for a pin is : 800khz". Sounds a little low. How did you come up with this?

800 kHz is not for the Arduino. 800 kHz is the absolute max speed for an I/O pin running at 8 MHz. For an Arduino, 44 kHz is the absolute max speed for an I/O pin using digitalWrite.

Here is how I got 44 kHz from an Arduino at 8 MHz: basically an LED blinker without delay, i.e., maximum speed.

int led=13;
void setup() {
 pinMode(led, OUTPUT);
}

void loop() {
 digitalWrite(led, HIGH);        
 digitalWrite(led, LOW); 
}
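
As a rough sanity check on these figures (plain division, not a measurement), the clock rate divided by the output frequency gives the number of CPU cycles spent per full HIGH/LOW period. The helper name `cyclesPerPeriod` is made up for this illustration.

```c
#include <stdint.h>

/* CPU cycles available for one full output period (one HIGH plus one LOW),
   rounded down to a whole number of cycles. */
uint32_t cyclesPerPeriod(uint32_t f_cpu_hz, uint32_t f_out_hz)
{
    return f_cpu_hz / f_out_hz;
}
```

With an 8 MHz clock this gives 10 cycles per period at 800 kHz, about 181 cycles at 44 kHz (so roughly 90 cycles per digitalWrite call, loop() overhead included), and about 5714 cycles at 1.4 kHz.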

800khz the Absolute max speed for an i/o pin running @ 8Mhz.

Nah, it's a little faster than that. Your sample sketch includes the overhead of calling and returning from loop(), and you'll go a bit faster in a local loop, or if you unroll the loop. See http://forum.arduino.cc/index.php?topic=4324.0 (which all talks about a 16MHz AVR, so divide by 2...)

Basically, there are a bunch of arrays stored in flash, indexed by the pin number, that hold the values needed to do the actual bit write (port, bitmask, other stuff). The function loads the values from the arrays, does some sanity checking, and then does the actual IO with the values it's pulled from the arrays. It's faster than your code because the array lookups replace your loop, and slower than direct port IO both because of the translation, and because it can't use the single-instruction bitset/bitclear opcodes when the port and bit are variables instead of constants.

Here's the code, gathered, and with additional comments. As used here, a "handle" is an arbitrary number that refers to something else, which is somewhat subtly different from, say, the actual address of the thing. Usually shorter.

// An array in flash that is indexed by "pin number" and returns a timer "handle" (or NOT_ON_TIMER)
const uint8_t PROGMEM digital_pin_to_timer_PGM[] = {
    NOT_ON_TIMER, /* 0 - port D */
    NOT_ON_TIMER,
    NOT_ON_TIMER,
    TIMER2B,
    // ... (remaining entries, and the #if/#endif around some of them, elided)
};

// And here's a macro that does the array lookup.  It's somewhat strange because of the need to
// read from flash instead of a RAM array.  And it's stranger than it needs to be (sigh.)
// (I would have written pgm_read_byte(&digital_pin_to_timer_PGM[P]) to make the array-ness clearer)
#define digitalPinToTimer(P) ( pgm_read_byte( digital_pin_to_timer_PGM + (P) ) )

// Similar array for converting pin number to bit within the port.  Any pin on the chip is uniquely
// identified by which port it's associated with, and which bit within that port.
const uint8_t PROGMEM digital_pin_to_bit_mask_PGM[] = {
    _BV(0), /* 0, port D */
    _BV(1),
    _BV(2),
    _BV(3),
    _BV(4),
    _BV(5),
    // ... (remaining entries elided)
};
// Similar macro for bitmask:
#define digitalPinToBitMask(P) ( pgm_read_byte( digital_pin_to_bit_mask_PGM + (P) ) )

// Similar array for pin number to port "handle."  The constants PD, PB, PC have no inherent meaning;
// they'll later be used as an index to another array.
const uint8_t PROGMEM digital_pin_to_port_PGM[] = {
    PD, /* 0 */
    PD,
    PD,
    PD,
    // ... (remaining entries elided)
};
// Similar macro for port handle
#define digitalPinToPort(P) ( pgm_read_byte( digital_pin_to_port_PGM + (P) ) )

// Array for translating port handle to actual output port address.
// Note that generally, a port may have a different register for reading (PINx) and writing (PORTx)
const uint16_t PROGMEM port_to_output_PGM[] = {
    NOT_A_PORT,
    NOT_A_PORT,
    (uint16_t) &PORTB,
    (uint16_t) &PORTC,
    (uint16_t) &PORTD,
};
// Macro for port handle to output port address.   Note that the address is a word, rather than a byte.
#define portOutputRegister(P) ( (volatile uint8_t *)( pgm_read_word( port_to_output_PGM + (P))) )

void digitalWrite(uint8_t pin, uint8_t val)
{
    uint8_t timer = digitalPinToTimer(pin);   // Load timer used for analogWrite() (if any)
    uint8_t bit = digitalPinToBitMask(pin);   // Load bitmask for within the (8bit) port
    uint8_t port = digitalPinToPort(pin);     // Load port "handle"
    volatile uint8_t *out;

    // If the port handle says this is not an actual pin, then return without doing anything.
    if (port == NOT_A_PIN) return;

    // If the pin supports PWM output, we need to turn it off
    // before doing a digital write.
    if (timer != NOT_ON_TIMER) turnOffPWM(timer);

    out = portOutputRegister(port);   // convert the port handle into an actual port address.

    uint8_t oldSREG = SREG;   // Save current interrupt-enable state and turn off interrupts,
    cli();                    //  to prevent race conditions (an ISR modifying the same port.)

    if (val == LOW) {
        *out &= ~bit;       // For writing low, read the port, clear the bit, and re-write.
    } else {
        *out |= bit;       // for high, read, set bit, and re-write.
    }

    SREG = oldSREG;    // restore interrupt-enable state.
}

Thanks, this is slightly clearer, but by no means crystal clear to me. It is still very confusing that it's done with so many chained lookups and pointers, but I think I've got the concept and the program flow now. It basically uses port mapping and memory addressing to do the actual digitalWrite. Very confusing, but clever indeed.

The technique for disabling interrupts is neat, but will it even work, and is it really necessary? What if an interrupt is triggered right before "cli();"?

It basically use Port mapping and Memory addressing to do the actual DigitalWrite

Well, yes. In the end, that's the only way you CAN change a pin.

The technique for disabling the interrupt is neat, but will it even work or is it really necessary? what if there is an interrupt triggered right before " cli();"?

What we're trying to prevent is that an interrupt could occur in between reading the port value and writing the changed value back out, all of which is under cover of the "*out |= bit;" statement. This produces three assembly language instructions:

  ld r, x     ;;; load r from memory address in x
  or r, r2    ;;; OR with bitmask value
  st x, r     ;;; store new value in port (via memory address in x)

Without the cli(), an interrupt could occur after the ld or after the or instruction and change the pin value, and that change would be lost when the st was done. It doesn't matter if there's an interrupt just before the cli(), because nothing near there changes anything important: the ISR finishes first, and the load then picks up whatever it wrote.
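
The lost update can be mimicked on a PC. This is only a simulation under assumptions: `fake_port` stands in for a PORTx register, `fake_isr` is a pretend interrupt handler called by hand, and `setBit0` is a made-up name that spells out the ld / or / st steps one at a time.

```c
#include <stdint.h>

static uint8_t fake_port;            /* stand-in for a PORTx register */

/* Pretend ISR that sets bit 1 of the same port. */
static void fake_isr(void) { fake_port |= 0x02; }

/* Set bit 0 as three separate steps. If interrupt_in_middle is nonzero the
   pretend ISR fires between the load and the store (no cli() protection);
   otherwise it is held off until after the store, as cli() would do. */
uint8_t setBit0(int interrupt_in_middle)
{
    fake_port = 0x00;

    uint8_t r = fake_port;                 /* ld: read the port            */
    if (interrupt_in_middle) fake_isr();   /* ISR changes the port here    */
    r |= 0x01;                             /* or: set our bit in the copy  */
    fake_port = r;                         /* st: write the stale copy     */

    if (!interrupt_in_middle) fake_isr();  /* deferred ISR runs afterwards */
    return fake_port;
}
```

With the interrupt in the middle the result is 0x01, so the ISR's bit 1 has been overwritten; with the interrupt deferred the result is 0x03 and both changes survive.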

Writing to a PINx register toggles all port bits that are written with '1'. For a single bit, at least, that executes in about 65 ns on a 16 MHz Arduino (one clock cycle is 62.5 ns). This way you can generate signals roughly 65 ns wide.

  PINB = 1; // pin-8 pulsed for 65 nS
  PINB = 1;

  PINB = 4; // pin-10 pulsed for 65 nS twice
  PINB = 4;
  PINB = 4;
  PINB = 4;
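
What the hardware does on each of those writes can be modelled in a few lines. This is a PC-side model, not AVR code: `fake_portb` stands in for PORTB and `writePINB` is a made-up name for "store a value to PINB" on a chip that has the toggle feature (the ATmega328P does).

```c
#include <stdint.h>

static uint8_t fake_portb;   /* stand-in for the PORTB output register */

/* Model of a write to PINB: every 1 bit in the value flips the matching
   PORTB bit, and every 0 bit leaves its PORTB bit alone. */
void writePINB(uint8_t value)
{
    fake_portb ^= value;
}
```

Two writes of the same value bring the pin back to where it started, which is why each pair of `PINB = ...` lines above produces one pulse.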

you can generate 65 nS wide signals.

That works if the port is constant, so that you can use an "OUT" instruction. Otherwise you have to use "ST", which takes two cycles (and of course you have to have set up the bitmask and, maybe, the port address beforehand). However, this is how things like the NeoPixel library manage to go fast: they do the setup once, and then use just ST instructions in the loop.

http://playground.arduino.cc/Learning/PortManipulation