Fast alternative to digitalRead/digitalWrite

Even for pin groups, combining pin accesses often ends up slower and takes more code.

Here are some examples (C++ statement followed by generated code):

To write one bit for ports A-G sbi/cbi is the winner:

  PORTB |= 0X1;
   c:   28 9a           sbi     0x05, 0 ; 5

With two or more pins, combining bits requires more instructions. You also need a cli/sei to make it atomic for general use.

  cli();
   c:   f8 94           cli
  PORTB |= 0X11;
   e:   85 b1           in      r24, 0x05       ; 5
  10:   81 61           ori     r24, 0x11       ; 17
  12:   85 b9           out     0x05, r24       ; 5
  sei();
  14:   78 94           sei

So it is hard to save time or code by combining bits. You do get all bits changing state at the same time.

The best plan for a pinGroup is to dedicate an entire port so you don't need to OR or AND bits and worry about atomic operations. That's why I think a DigitalPort class is best.
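A minimal sketch of what such a DigitalPort class could look like. Only the class name comes from the post; the constructor, write() and read() are hypothetical. The register is passed as a pointer so the sketch can be tried on a PC with an ordinary variable; on an AVR it would be something like &PORTB, and whether the store compiles down to a single out instruction depends on the compiler seeing the address as a constant.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical whole-port wrapper: all 8 pins of the port belong to one device.
class DigitalPort {
 public:
  explicit DigitalPort(volatile uint8_t* reg) : reg_(reg) {
    assert(reg != nullptr);
  }

  // One plain store. Nothing is read back and merged, so there is no
  // read-modify-write sequence that would need cli/sei around it.
  void write(uint8_t value) const { *reg_ = value; }

  // Reads back the register contents.
  uint8_t read() const { return *reg_; }

 private:
  volatile uint8_t* reg_;
};
```

Because every bit of the port is owned by the group, write() never has to OR or AND with the old value, which is the whole point of dedicating the port.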

For Mega ports H, J, and K, cbi/sbi can't be used since the port address is too large. Setting a single bit in these ports is slow:

  cli();
   c:   f8 94           cli
  PORTH |= 0X1;
   e:   e2 e0           ldi     r30, 0x02       ; 2
  10:   f1 e0           ldi     r31, 0x01       ; 1
  12:   80 81           ld      r24, Z
  14:   81 60           ori     r24, 0x01       ; 1
  16:   80 83           st      Z, r24
  sei();
  18:   78 94           sei

On a Mega a PinGroup could become quite large, so a PinGroup should take its maximum size as a parameter.

I would suggest limiting to 8 pins anyway.

Furthermore, must it set pins of the same register simultaneously? If pins are in different registers this is not possible ...

Not necessarily. If the fact that pins are on the same port can be detected, great, but even if behind the scenes it degenerates to a stack of single pin writes (as you show), at least the application code will be simpler and more readable.

You could write templates for a given number of pins. Not so neat, but it works.
TwoPinGroup<Pin0, Pin1>
ThreePinGroup<Pin0, Pin1, Pin2>

I'm not strong on C++, but can't you have 8 constructors with different numbers of parameters? That way there is only a single pinGroup object and the syntax is the same for up to 8 pins.
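One way to get a single syntax for any pin count is a variadic template instead of eight constructors or separate TwoPinGroup/ThreePinGroup classes. This is only a sketch under assumptions: it needs C++11, and fastWrite() and g_pinState are hypothetical stand-ins (an array instead of real ports) so it can be run on a PC. It writes the pins one at a time in the order given and makes no attempt to detect pins that share a port.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the hardware: one bool per pin.
static bool g_pinState[20];

// Stand-in for a fast single-pin write with a compile-time pin number.
// On an AVR this is where the sbi/cbi would end up.
template <uint8_t Pin>
inline void fastWrite(bool level) {
  static_assert(Pin < 20, "pin number out of range");
  g_pinState[Pin] = level;
}

template <uint8_t... Pins>
struct PinGroup;

// End of the recursion: no pins left to write.
template <>
struct PinGroup<> {
  static void write(uint8_t) {}
};

// Bit 0 of value goes to the first pin, bit 1 to the second, and so on.
template <uint8_t First, uint8_t... Rest>
struct PinGroup<First, Rest...> {
  static_assert(sizeof...(Rest) < 8, "at most 8 pins per group");
  static void write(uint8_t value) {
    fastWrite<First>((value & 1) != 0);
    PinGroup<Rest...>::write(static_cast<uint8_t>(value >> 1));
  }
};
```

With this, PinGroup<2, 5, 7>::write(0x5) sets pin 2 high, pin 5 low and pin 7 high, and the same call syntax works for one to eight pins.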

As for simultaneous writes, it would be nice if the class auto-detected pins on the same port, but I don't think that's really important. Maybe a second Port class that boils down to simple "PORTx =" code, with .bitSet() and .bitClear() methods that just do "PORTx |= val" etc. At least that will add to the current HAL and isolate beginners from such "complex" ideas.

OTOH if all this can be rolled into a single class even better.


Rob

As for simultaneous writes, it would be nice if the class auto detected pins on the same port

The only purpose of my WriteMany class is to write like pins. A convenience factor is not really on my list at all.

imho a pingroup would have an internal collection to which runtime pins can be added and removed (don't know the purpose for remove yet)
The collection is not sorted, so the adding order applies.

Also, pins probably should not be runtime values. Assigning runtime pins more than once doesn't really make sense unless you are physically re-wiring your hardware while the Arduino is on.

Also, runtime pins are not usable values with the digitalPin library, so the code would have to resort to some slower lookup-table version, making it more efficient to just write the pins individually.

Non-type template parameters also have no storage overhead, no SRAM is used to store the parameters past compilation as the compiled code is completely customised to those parameters. The alternative is a generic read/write that must look up the contents with every operation.
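A small sketch of the storage argument, using made-up names (TemplatePin, RuntimePin, fakePort) and an ordinary variable in place of a real port register so it compiles on a PC. The static_asserts check the size claims at compile time; what the optimizer does with the runtime version in a given program is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Stand-in for an I/O register such as PORTB.
static volatile uint8_t fakePort = 0;

// Pin number fixed at compile time: the object has no data members,
// and the bit mask is a constant baked into the generated code.
template <uint8_t Bit>
struct TemplatePin {
  static_assert(Bit < 8, "bit index must be 0..7");
  void high() const { fakePort = static_cast<uint8_t>(fakePort | (1u << Bit)); }
  void low() const { fakePort = static_cast<uint8_t>(fakePort & ~(1u << Bit)); }
};

// Pin number chosen at run time: each pin object stores its bit number,
// and the mask may have to be computed on every call.
struct RuntimePin {
  uint8_t bit;
  void high() const {
    assert(bit < 8);
    fakePort = static_cast<uint8_t>(fakePort | (1u << bit));
  }
};

static_assert(std::is_empty<TemplatePin<3>>::value,
              "template pin stores nothing");
static_assert(sizeof(RuntimePin) == sizeof(uint8_t),
              "runtime pin stores its bit number");
```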

My code, as tested for 3 and 4 pins, produces fewer instructions on like pins than doing an individual write on each pin. When I finish the 4 & 5 pin writer I'll post it.

I'm not limiting this code to 8 pins though. The benefits my HAL will theoretically receive from writing any number of pins outweigh this limitation by far.

Writing multiple pins seems like a good idea, at least in the abstract. There are cases where dedicating an entire 8-bit port to a device makes sense, but that is not the same as writing multiple pins.

I have written a lot of bit-bang code for SPI, I2C, and various devices. When I get to real hardware, my abstract write multiple ideas never seem to help.

Does anyone have a situation with real hardware where an existing implementation would be improved by a write-multiple with three or four pins? The pins must be restricted to a single port.

The best example I have is something like an LCD display. In this case the restriction that all pins are on the same port is too severe. The library LiquidCrystal allows any pins and that doesn't add much complication. Here are the byte and nibble write functions.

void LiquidCrystal::write4bits(uint8_t value) {
  for (int i = 0; i < 4; i++) {
    pinMode(_data_pins[i], OUTPUT);
    digitalWrite(_data_pins[i], (value >> i) & 0x01);
  }
  pulseEnable();
}

void LiquidCrystal::write8bits(uint8_t value) {
  for (int i = 0; i < 8; i++) {
    pinMode(_data_pins[i], OUTPUT);
    digitalWrite(_data_pins[i], (value >> i) & 0x01);
  }
  pulseEnable();
}

Note this code has pinMode in the write function. LCD displays can be written or read so your write multiple should also support read.

It's the details of real complex devices that seem to kill the advantages of a library for accessing multiple pins.

I was very interested in this because I'm writing for dedicated hardware with my LCD data pins contiguous and on the same port. With a simple benchmark just converting the stock LiquidCrystal library to DigitalPin and nothing else I saw a 32% speed up. By changing the write4bits method to shift the nibble directly into the port I only saw an additional 1.1% speed increase from the pure DigitalPin version.

Unless I totally mangled my direct port code, which is a very real possibility:

PORTC = (PORTC & (~B00111100)) | ((value << 2) & B00111100);// D0-3 on A2-A5

When I removed the section setting the pins to output in each write, the difference between DigitalPin and direct port was only 0.06%.
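The direct port expression above can be checked on a PC by putting the same mask-and-shift into a helper and running it against an ordinary variable. The helper names insertField and writeNibble are made up for this sketch; the mask 0x3C is B00111100 and the shift of 2 matches D0-3 on A2-A5.

```cpp
#include <cassert>
#include <cstdint>

// Replace the bits selected by mask with (value << shift),
// leaving every other bit of current unchanged.
inline uint8_t insertField(uint8_t current, uint8_t value,
                           uint8_t shift, uint8_t mask) {
  assert(shift < 8);
  return static_cast<uint8_t>((current & ~mask) | ((value << shift) & mask));
}

// LCD D0-D3 on port bits 2..5, as in the post.
inline void writeNibble(volatile uint8_t& port, uint8_t nibble) {
  port = insertField(port, nibble, 2, 0x3C);
}
```

Starting from 0xC3, writing the nibble 0x5 gives 0xD7: bits 2 to 5 become 0101 and the other four bits are untouched. On real hardware this is still a read-modify-write of the port, so it needs the cli/sei treatment if an interrupt handler can touch the same port.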

Is that basically what you're getting at or did I miss the point entirely?

You got the point exactly.

Often what looks great in C/C++ code doesn't optimize well for I/O on AVR chips.

avr-gcc seems to really understand single bit operations. Sometimes it does really stupid things with more complex cases.

I am now doing a very general bit-bang SPI implementation for all SPI modes. I get a factor of four speedup from simple changes that make the compiler do what I expect.

@fat16lib, I have recently converted my older code using FastDigitalIO to your newer DigitalPin library.
The compiled size grew by two bytes and I cannot find the reason why (2 bytes is nothing anyway).
Also, I noticed the 'mode()' function is gone. It was easier to use in some circumstances.

I will be posting another version of digitalRead/digitalWrite. The current version is not working well in a general implementation of software SPI master. I am also using it for a fast software I2C master.

I tested this library on my Uno against digitalWrite and got a nice improvement in speed. I also thought your library was easy to use once I caught on to the terms needed to activate the pins.

Can this library be used with any Arduino compatible board or do the pins have to be defined in the library first?

To narrow my question down, I would like to use it on a Leonardo board and "here comes a dream", try to use the library on a Maple.

Great work, thank you for your efforts!

I have defined pins for 168/328, Mega, 644/1284 (Sanguino style), Leonardo, Teensy, and Teensy++. I haven't tested Leonardo yet.

I am currently doing a lot of development on STM32 and have been thinking about digital I/O for STM32. STM32 is very different from AVR. Pins have 15 modes, and there are neat registers to safely set and clear bits in a port.

Here are the modes:

STM32 pin modes
0 - Analog input.
1 - Push Pull output 10MHz.
2 - Push Pull output 2MHz.
3 - Push Pull output 50MHz.
4 - Digital input.
5 - Open Drain output 10MHz.
6 - Open Drain output 2MHz.
7 - Open Drain output 50MHz.
8 - Digital input with PullUp or PullDown resistor depending on ODR.
9 - Alternate Push Pull output 10MHz.
A - Alternate Push Pull output 2MHz.
B - Alternate Push Pull output 50MHz.
C - Reserved.
D - Alternate Open Drain output 10MHz.
E - Alternate Open Drain output 2MHz.
F - Alternate Open Drain output 50MHz.
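A rough sketch of how the mode values listed above could be applied. It assumes the STM32F1-style layout, where each pin has a 4-bit mode field and one 32-bit configuration register covers 8 pins, and a set/clear register whose low 16 bits set pins and whose high 16 bits clear them. The post does not name the registers, so treat the layout as an assumption. The function names are made up, and both functions only compute register values so they can be tried on a PC.

```cpp
#include <cassert>
#include <cstdint>

// A few of the mode values from the list above.
enum PinMode : uint8_t {
  kAnalogInput = 0x0,
  kPushPull50MHz = 0x3,
  kDigitalInput = 0x4,
  kOpenDrain2MHz = 0x6,
  kInputPullUpDown = 0x8
};

// Return configReg with the 4-bit field of pinInReg (0..7) replaced by mode.
// Assumed layout: pin n occupies bits 4n..4n+3.
inline uint32_t withPinMode(uint32_t configReg, uint8_t pinInReg, uint8_t mode) {
  assert(pinInReg < 8 && mode <= 0xF);
  const uint32_t shift = 4u * pinInReg;
  return (configReg & ~(0xFu << shift)) | (static_cast<uint32_t>(mode) << shift);
}

// Build one word for a set/clear register: low half sets pins, high half
// clears pins. A single store of this word changes only the named pins,
// so no read-modify-write and no interrupt masking is needed.
inline uint32_t setClearWord(uint16_t setMask, uint16_t clearMask) {
  return static_cast<uint32_t>(setMask) |
         (static_cast<uint32_t>(clearMask) << 16);
}
```

For example, withPinMode(0x44444444, 2, kPushPull50MHz) turns a register with all eight pins in mode 4 into 0x44444344, changing only pin 2.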

Wow that is a lot of choices! I can see how that would take some time to work out!

Are you posting on a forum related to ARM chips, so that I can follow your work?

If you include both SdFat and DigitalPin in the same program you get a redefinition of 'struct pin_map_t' error

I know the beta has problems like the 'struct pin_map_t' error.

I am not happy with the beta and have totally restructured this library but have not had time to finish the new library.

The old version did not work well for a number of applications like fast software I2C and SPI. The new library is very different and won't be backward compatible.

const static pin_map_t pinMap[] = {
  {&DDRD, &PIND, &PORTD, 0},  // D0  0
  {&DDRD, &PIND, &PORTD, 1},  // D1  1
  {&DDRD, &PIND, &PORTD, 2},  // D2  2
  {&DDRD, &PIND, &PORTD, 3},  // D3  3
  {&DDRD, &PIND, &PORTD, 4},  // D4  4
  {&DDRD, &PIND, &PORTD, 5},  // D5  5

The compiler optimizes const static array accesses to constants? Wow! That makes the earlier attempts at digitalWriteFast look ... silly.
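A way to convince yourself that a table lookup with a constant index really can collapse to a constant. This sketch uses C++11 constexpr and static_assert, which is an assumption on my part; the library's table above is a plain const static array and relies on the optimizer and inlining instead. PinInfo, kPinMap, pinPort and pinMask are made-up names, and the entry is simplified to a port letter and a bit number.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified map entry: port letter and bit number only.
struct PinInfo {
  char port;
  uint8_t bit;
};

// First ten Uno-style pins: D0-D7 on port D, D8-D9 on port B.
static constexpr PinInfo kPinMap[] = {
    {'D', 0}, {'D', 1}, {'D', 2}, {'D', 3}, {'D', 4},
    {'D', 5}, {'D', 6}, {'D', 7}, {'B', 0}, {'B', 1},
};

constexpr char pinPort(uint8_t pin) { return kPinMap[pin].port; }

constexpr uint8_t pinMask(uint8_t pin) {
  return static_cast<uint8_t>(1u << kPinMap[pin].bit);
}

// These are evaluated by the compiler; no table access is left at run time.
static_assert(pinPort(5) == 'D', "pin 5 is on port D");
static_assert(pinMask(5) == 0x20, "pin 5 is bit 5");
static_assert(pinPort(8) == 'B' && pinMask(8) == 0x01, "pin 8 is PB0");
```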

Hmm. With a SLIGHT amount of cooperation from the Arduino core team, the pins_arduino.h file(s) could be set up to generate either the existing PROGMEM tables OR const static arrays, so that "fast" versions of things could be written using exactly the same data used for the slow versions:

const MAYBE_STATIC uint8_t MAYBE_PROGMEM digital_pin_to_port_[] =
{
	PB, /* 0 */
	PB,
	PB,
	PB,
	PB,
//etc

// with:
#define MAYBE_STATIC static
#define MAYBE_PROGMEM
#include <pins_arduino.h>

Or...

#define digital_pin_to_port_M \
{ \
	PB, /* 0 */ \
	PB, \
	PB, \
	PB, \
	PB, \
	/* etc */ \
}

const uint8_t PROGMEM digital_pin_to_port_PGM[] = digital_pin_to_port_M;
static const uint8_t digital_pin_to_port_S[] = digital_pin_to_port_M;

This is extremely interesting, any improvements?

I can't improve the speed since functions like pin.high() and pin.low() compile to a single sbi or cbi instruction for low address I/O ports. All ports on the 328 and ports A-G on the Mega are low address. These instructions execute in 2 cycles or 125 ns on a 16 MHz cpu.
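The 125 ns figure is just the cycle count divided by the clock frequency. A tiny helper (the name cyclesToNanoseconds is made up) makes the arithmetic explicit:

```cpp
#include <cassert>
#include <cstdint>

// Time taken by a number of CPU cycles, in nanoseconds.
constexpr double cyclesToNanoseconds(uint32_t cycles, uint32_t cpuHz) {
  return cycles * 1e9 / cpuHz;
}

// 2 cycles at 16 MHz: 2 / 16,000,000 s = 125 ns.
static_assert(cyclesToNanoseconds(2, 16000000UL) == 125.0,
              "sbi/cbi at 16 MHz");
```

One cycle at 16 MHz is 62.5 ns, so a 2-cycle sbi or cbi takes 125 ns.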

I have added software SPI which runs at about 2 MHz. This library supports all SPI modes for MSB first. It would be easy to add an option for LSB first.
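For readers who have not written bit-bang SPI, here is a generic sketch of an MSB-first transfer in SPI mode 0 (clock idles low, data sampled on the rising edge). It is not the SoftSPI code from the library; the Bus interface (writeMosi, writeSck, readMiso) and the LoopbackBus test double are hypothetical, chosen so the loop can be exercised on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical test double: MISO is wired to MOSI, and the "slave"
// shifts in MOSI on every rising clock edge.
struct LoopbackBus {
  bool mosi = false;
  bool sck = false;
  uint8_t received = 0;
  int risingEdges = 0;

  void writeMosi(bool level) { mosi = level; }
  void writeSck(bool level) {
    if (level && !sck) {
      received = static_cast<uint8_t>((received << 1) | (mosi ? 1 : 0));
      ++risingEdges;
    }
    sck = level;
  }
  bool readMiso() const { return mosi; }
};

// Mode 0, MSB first: put the data bit on MOSI while the clock is low,
// raise the clock, sample MISO, then lower the clock again.
template <typename Bus>
uint8_t softSpiTransfer(Bus& bus, uint8_t out) {
  uint8_t in = 0;
  for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
    bus.writeMosi((out & mask) != 0);
    bus.writeSck(true);
    if (bus.readMiso()) in |= mask;
    bus.writeSck(false);
  }
  return in;
}
```

An LSB-first option would only change the loop to start at mask 0x01 and shift left.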

I have not posted the latest version as a standalone library. The latest version of DigitalPin with SoftSPI is used in the new 20120719 version of SdFat. The files DigitalPin.h and SoftSPI.h are in the SdFat/utility folder, and SdFat is hosted on Google Code (Google Code Archive).

I have also written a software I2C library based on the DigitalPin library that runs at 400 kHz. I plan to post this I2C library soon.

I can't improve the speed since functions like pin.high() and pin.low() compile to a single sbi or cbi instruction for low address I/O ports. All ports on the 328 and ports A-G on the Mega are low address. These instructions execute in 2 cycles or 125 ns on a 16 MHz cpu....

Btw this is a great result! fastDigitalWrite is still one of the most used libraries because the original digitalWrite is painfully slow for many applications, but its development has been stuck since 2010, it does not support newer micros, and with Arduino's new pin management I doubt it will be as useful as in the past.
Man, you did great work! I know that direct port manipulation is easier, but being able to change processors and reuse libraries for experiments without spending hours around PORTwhatever is a dream!
I will check how you used it in your sdfatlib. I'm planning to apply it to the LiquidCrystal lib (I hope it will not be a nightmare...), and it is really great that you added the previously unsupported processors.

Oh. You had the same problem as I do. Arduino is slow... Well, I had another problem: I can't create easy libraries with bare AVR C. So I started a project to overcome this problem.

As a result I have a very nice implementation for digital pins. That is the only thing really working so far. Timers and analogRead are next.

Wanted to let you know what I have found out, so here is the project: GitHub - raphendyr/yaal: Yet another AVR Abstraction Library

Hi, had a look at your library. Seems you are doing similar things to the ideas I'm implementing in my own library. I also noticed you are using a very basic version of my AtomicBlock library. I'm about to release a new version compatible with AVR, AVR32, PIC32, ARM Cortex M/R if you are interested.