Topic: Fast alternative to digitalRead/digitalWrite
0
Edison Member
Karma: 63
Posts: 1603
Arduino rocks

I have developed a new C++ library for fast digital I/O and would appreciate any comments and suggestions.  The library is posted at code.google.com/p/beta-lib/downloads/list as the file DigitalPinBeta20120113.zip.

A number of people have developed fast versions of digitalRead/digitalWrite using C macros to generate fast inline code.

I decided to design a new API and use a template class.  Here is an example program that generates two 125 ns wide pulses for a scope timing test, assuming a 16 MHz CPU.

Code:
// scope test for write timing
#include <DigitalPin.h>

// class with compile time pin number
DigitalPin<13> pin13;

void setup() {
  // set mode to OUTPUT
  pin13.outputMode();
}
void loop() {
  pin13.high();
  pin13.low();
  pin13.high();
  pin13.low();
  delay(1);
}

Each of the statements pin13.outputMode(), pin13.low(), and pin13.high() compiles into a single two-byte cbi/sbi instruction.

For high address Mega pins the functions are larger and slower but provide atomic access to the pins.
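
As a rough illustration of why a compile-time pin number lets each call collapse to a single bit operation, here is a minimal sketch. It is not the DigitalPin source: FastPin, bitOf and the fake registers are made-up names, the pin map covers only Uno pins 8 to 13 (port B), and plain variables stand in for the AVR registers so it builds on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for DDRB and PORTB so the sketch compiles off-target (assumption).
static volatile uint8_t fakeDDRB = 0;
static volatile uint8_t fakePORTB = 0;

// Hypothetical pin map: Uno pins 8..13 are bits 0..5 of port B.
constexpr uint8_t bitOf(uint8_t pin) { return static_cast<uint8_t>(pin - 8); }

template <uint8_t Pin>
class FastPin {
  static_assert(Pin >= 8 && Pin <= 13, "this sketch only maps pins 8..13");
  // The mask is a compile-time constant, so no table lookup happens at run time.
  static constexpr uint8_t kMask = static_cast<uint8_t>(1u << bitOf(Pin));

 public:
  static void outputMode() { fakeDDRB = static_cast<uint8_t>(fakeDDRB | kMask); }
  static void high() { fakePORTB = static_cast<uint8_t>(fakePORTB | kMask); }
  static void low() { fakePORTB = static_cast<uint8_t>(fakePORTB & ~kMask); }
};

void demoFastPin() {
  FastPin<13>::outputMode();
  FastPin<13>::high();
  assert(fakePORTB == 0x20);  // bit 5 of port B is set
  FastPin<13>::low();
  assert(fakePORTB == 0x00);
}
```

With a real low-address register and a constant single-bit mask, avr-gcc can turn each of these read-modify-write statements into one sbi or cbi.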

North Queensland, Australia
Edison Member
Karma: 64
Posts: 2102

Hi, nice work.

I should have waited a few weeks... Your previous FastDigitalIO class allowed me to scrap my re-invention of a wheel. But I recently added this same functionality and moved the files into one... Anyway your new code is nice n tidy so I might put mine in the archive now.

As a C++ programmer, I find macros as long as some of the port map versions just a headache. As I'm writing a template HAL library, this code almost seems custom written for me :) I have a system I'm deriving from parts of this to allow my HAL to write pins on the same port in one operation. I will post it when done if you would like a look.

« Last Edit: January 14, 2012, 05:05:11 am by pYro_65 »


nr Bundaberg, Australia
Tesla Member
Karma: 126
Posts: 8471
Scattered showers my arse -- Noah, 2348BC.

Looks like nice code to my (mostly hardware) eyes, and I far prefer the OO syntax.

Now I have a request, how about a "pingroup" class where you define a random selection of pins and can then apply a value to them. Maybe limited to 8 bits eg

Code:
PinGroup myPG (1,3,5,6,7,9,23,24);  // -1 for unused pins or 8 constructors ?

myPG.set(0xF5);

for (int i = 0; i < 100; i++) myPG.set(i);

This may or may not look very clean under the covers but would be a lot better than the

Code:
digitalWrite (1, HIGH);
digitalWrite (2, HIGH);
digitalWrite (3, LOW);
digitalWrite (4, HIGH);
digitalWrite (7, LOW);
digitalWrite (9, LOW);
digitalWrite (23, HIGH);
digitalWrite (24, HIGH);

that we currently have.

______
Rob
« Last Edit: January 14, 2012, 06:17:10 am by Graynomad »

Rob Gray aka the GRAYnomad www.robgray.com

North Queensland, Australia
Edison Member
Karma: 64
Posts: 2102

Quote
Now I have a request, how about a "pingroup" class where you define a random selection of pins and can then apply a value to them. Maybe limited to 8 bits eg

I too see the value in something implementing those ideas.

That is essentially what I have started making.

I have a class 'WriteMany' specialised for up to 8 template parameters. Using compile-time logic, it determines which pins are on the same port and writes them together; any remaining pins generate FastDigitalIO/DigitalPin write methods. I have hit a small block as I rethink the port grouping logic: after specialising a four-pin write I noticed that the next four specialisations will need quite a bit of code and would increase compile time dramatically. Not good, as I plan to have it handle 69 pins.


Global Moderator
Netherlands
Shannon Member
Karma: 212
Posts: 13531
In theory there is no difference between theory and practice, however in practice there are many...

Code:
PinGroup myPG (1,3,5,6,7,9,23,24);  // -1 for unused pins or 8 constructors ?
myPG.set(0xF5);
for (int i = 0; i < 100; i++) myPG.set(i);

On a Mega a PinGroup could become quite large, so a pingroup should take its maximum size as a parameter:  PinGroup <size> myPG;

Furthermore must it set pins of the same register simultaneously? If pins are in different registers this is not possible ...

Internally I would keep it simple, something like:

Code:
// member of PinGroup; assumes members int8_t pin[] and uint8_t size
void set(int val)
{
  for (uint8_t i = 0; i < size; i++)
  {
    // -1 in a group just means skip this pin
    if (pin[i] >= 0) digitalWrite(pin[i], bitRead(val, i));
  }
}

Collecting registers and setting them at once would cause extra code, so I doubt it is faster.


Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

North Queensland, Australia
Edison Member
Karma: 64
Posts: 2102

Quote
Code:
PinGroup myPG (1,3,5,6,7,9,23,24);  // -1 for unused pins or 8 constructors ?

Unfortunately the pins have to be defined as template parameters too.
If you pass a function (formal) parameter as a non-type template argument you will get an error ( XXX cannot appear in a constant-expression ), meaning the DigitalPin library cannot be used this way.
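
A minimal sketch of the restriction described above, assuming a C++11 compiler. PinTag, kLedPin and ledPinNumber are made-up names for illustration and are not part of DigitalPin.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a class templated on a pin number (hypothetical).
template <uint8_t Pin>
struct PinTag {
  static constexpr uint8_t number = Pin;
};

// A compile-time constant is accepted as a template argument.
constexpr uint8_t kLedPin = 13;

uint8_t ledPinNumber() {
  return PinTag<kLedPin>::number;
}

// A run-time value is not. Uncommenting this function produces an error
// along the lines of "'pin' cannot appear in a constant-expression".
// void broken(uint8_t pin) {
//   PinTag<pin> tag;
// }

void demoConstantPin() {
  assert(ledPinNumber() == 13);
}
```

This is why a constructor such as PinGroup(1, 3, 5, ...) cannot forward its arguments to DigitalPin-style templates.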

I'm overcoming this with a few macros that combine the pin numbers into one data block, so each template parameter contains a number of pins. I still have to test whether large data types ( 64-bit integers ) are compatible; they should be, as they are implemented by the compiler rather than by the Arduino core.
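
One way such packing could look, purely as a guess at the idea and not the poster's actual macros: PACK_PINS4 and PinAt are hypothetical names, each pin number is assumed to fit in 8 bits, and four pins share one 32-bit template parameter. A 64-bit version for eight pins would follow the same pattern.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack four pin numbers (0..255) into one 32-bit constant.
#define PACK_PINS4(a, b, c, d)                                      \
  ((static_cast<uint32_t>(a)) | (static_cast<uint32_t>(b) << 8) |   \
   (static_cast<uint32_t>(c) << 16) | (static_cast<uint32_t>(d) << 24))

// Recover pin number Index (0..3) from the packed constant at compile time.
template <uint32_t Packed, unsigned Index>
struct PinAt {
  static const uint8_t value =
      static_cast<uint8_t>((Packed >> (8 * Index)) & 0xFF);
};

// Third entry (index 2) of the packed list 3, 9, 7, 5.
typedef PinAt<PACK_PINS4(3, 9, 7, 5), 2> ThirdPin;

void demoPackedPins() {
  assert(ThirdPin::value == 7);
  assert((PinAt<PACK_PINS4(3, 9, 7, 5), 0>::value == 3));
}
```

Because both the packing and the unpacking are constant expressions, nothing is stored in SRAM.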

A pin grouping system is definitely something I would like to use, and to help create if needed.

EDIT: this task seems to be hindered by the Arduino IDE itself. If it supported C++0x ( or whatever the new standard is ), variadic templates would be perfect for this situation.
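
For reference, a sketch of what a variadic-template pin group might look like on a compiler that does support C++11. PinGroup and LoggedPin are illustrative names only, and LoggedPin records levels in an array instead of touching hardware so the example runs on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Records the last level written to each pin (stand-in for real I/O).
static bool pinLevel[32] = {};

template <uint8_t Pin>
struct LoggedPin {
  static_assert(Pin < 32, "sketch tracks pins 0..31 only");
  static void write(bool level) { pinLevel[Pin] = level; }
};

// Primary template: recursion ends when no pins remain.
template <uint8_t... Pins>
struct PinGroup {
  static void set(uint32_t) {}
};

// Peel off the first pin: it takes bit 0, the rest take the shifted value.
template <uint8_t First, uint8_t... Rest>
struct PinGroup<First, Rest...> {
  static void set(uint32_t value) {
    LoggedPin<First>::write((value & 1u) != 0);
    PinGroup<Rest...>::set(value >> 1);
  }
};

void demoVariadicGroup() {
  typedef PinGroup<3, 9, 7, 5, 13> Group;
  Group::set(0x15);  // binary 10101: pins 3, 7 and 13 high, pins 9 and 5 low
  assert(pinLevel[3] && !pinLevel[9] && pinLevel[7] && !pinLevel[5] && pinLevel[13]);
}
```

The recursion is resolved at compile time, so a group of N pins expands to N individual pin writes with no loop and no pin table.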

« Last Edit: January 14, 2012, 09:21:30 am by pYro_65 »


0
Edison Member
Karma: 63
Posts: 1603
Arduino rocks

I have played with multiple pins for timing tests.  Scope tests with the following five pin example show that a single call to writeGroup() takes 2.5 microseconds.  That is faster than a call to digitalWrite() for a single pin.

It takes 80 microseconds for the loop to go through all 32 possible values for five pins.

Code:
#include <DigitalPin.h>
DigitalPin<3>  pin0; // bit 0X01
DigitalPin<9>  pin1; // bit 0X02
DigitalPin<7>  pin2; // bit 0X04
DigitalPin<5>  pin3; // bit 0X08
DigitalPin<13> pin4; // bit 0X10

void initGroup () {
  pin0.outputMode();
  pin1.outputMode();
  pin2.outputMode();
  pin3.outputMode();
  pin4.outputMode(); 
}
void writeGroup(uint8_t val) {
  pin0.write(1  & val);
  pin1.write(2  & val);
  pin2.write(4  & val);
  pin3.write(8  & val);
  pin4.write(16 & val);
}

void setup() {
  initGroup();
}

void loop() {
  for (uint8_t i = 0; i < 32; i++) {
    writeGroup(i);
  }
}

The sketch takes 552 bytes of flash on Arduino 1.0.  The "empty" sketch

Code:
void setup() {}
void loop() {}

takes 466 bytes of flash, so that is only 86 additional bytes.

You could write a template for each number of pins.  Not so neat, but it works.

Code:
TwoPinGroup<Pin0, Pin1>
ThreePinGroup<Pin0, Pin1, Pin2>
...
EightPinGroup<Pin0, Pin1, Pin2, Pin3, Pin4, Pin5, Pin6, Pin7>

I thought of a DigitalPort class for multiple bits on one port.  Trying to combine bits that are on the same port in PinGroup has a very high overhead since you can't arrange for the compiler to optimize to efficient I/O instructions.

Here is an idea for a read group:
Code:
uint8_t readGroup() {
  uint8_t value = 0;
  if (pin0.read()) value |= 1;
  if (pin1.read()) value |= 2;
  if (pin2.read()) value |= 4;
  if (pin3.read()) value |= 8;
  if (pin4.read()) value |= 16;
  return value;
}

This function is small, less than 40 bytes of flash.  I haven't timed it.

Global Moderator
Netherlands
Shannon Member
Karma: 212
Posts: 13531
In theory there is no difference between theory and practice, however in practice there are many...

imho a pingroup would have an internal collection to which runtime pins can be added and removed (don't know the purpose for remove yet)
The collection is not sorted, so the adding order applies.

Code:
pinGroup <4> PG;  // size = 4

PG.add(2);    // or PG.add(pin2);
PG.add(13);
PG.add(6);
PG.add(8);  // internal array { 2,13,6,8 }; ActualSize = 4; maxSize = 4;

int x = PG.readGroup();  // auto outputMode
PG.writeGroup(5);         // and inputMode?   pin2 = 0 pin13 = 1 pin 6 = 0 pin 8 = 1
PG.write(HIGH);            // all pins HIGH

PG.remove(6); // internal array { 2,13,8 }; ActualSize = 3; maxSize = 4;
PG.add(6);      // internal array { 2,13,8,6 }; ActualSize = 4; maxSize = 4;  ==> last 2 entries are now swapped!
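
A minimal sketch of the bookkeeping behind such a run-time collection, covering only add and remove with insertion order preserved. RuntimePinGroup and its members are hypothetical names; the read and write methods from the outline above are left out.

```cpp
#include <cassert>
#include <cstdint>

template <uint8_t MaxSize>
class RuntimePinGroup {
 public:
  // Append a pin; returns false when the group is already full.
  bool add(uint8_t pin) {
    if (count_ >= MaxSize) return false;
    pins_[count_++] = pin;
    return true;
  }

  // Remove the first occurrence of pin and close the gap, keeping order.
  bool remove(uint8_t pin) {
    for (uint8_t i = 0; i < count_; i++) {
      if (pins_[i] == pin) {
        for (uint8_t j = i; j + 1 < count_; j++) pins_[j] = pins_[j + 1];
        count_--;
        return true;
      }
    }
    return false;
  }

  uint8_t size() const { return count_; }
  uint8_t pinAt(uint8_t index) const { return pins_[index]; }

 private:
  uint8_t pins_[MaxSize] = {};  // one byte of RAM per slot
  uint8_t count_ = 0;
};

void demoRuntimeGroup() {
  RuntimePinGroup<4> group;
  group.add(2);
  group.add(13);
  group.add(6);
  group.add(8);
  group.remove(6);
  group.add(6);
  // Order is now 2, 13, 8, 6, matching the outline above.
  assert(group.pinAt(2) == 8 && group.pinAt(3) == 6);
}
```

Unlike the template approach, this version stores the pin numbers in RAM and has to look each one up at run time.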


Rob Tillaart


Guildford, UK
Full Member
Karma: 0
Posts: 217
Arduino rocks

Have you looked at what's been done in v3 GLCD (http://code.google.com/p/glcd-arduino/downloads/list)? There's some clever code in there to optimize port access.

Iain

0
Edison Member
Karma: 63
Posts: 1603
Arduino rocks

Yes, I have looked at v3 GLCD and it contains the kind of macros I am trying to avoid.

DigitalPin generates the fastest and smallest possible code for reading and writing I/O ports on 328 Arduinos and ports A-G on the Mega.

For high() this is a single sbi instruction and for low() a single cbi instruction.  These instructions execute in two cycles.

This sketch only uses 14 bytes more than an empty sketch:
Code:
// read pin 12 write value to pin 13
#include <DigitalPin.h>

DigitalPin<12> readPin;
DigitalPin<13> writePin;

void setup() {
  readPin.inputMode();
  writePin.outputMode();
}
void loop() {
  writePin.write(readPin.read());
}



Guildford, UK
Full Member
Karma: 0
Posts: 217
Arduino rocks

Quote
Yes, I have looked at v3 GLCD and it contains the kind of macros I am trying to avoid.

DigitalPin generates the fastest and smallest possible code for reading and writing I/O ports on 328 Arduinos and ports A-G on the Mega.

Sorry, I don't think I made myself clear. I was referring to the discussion on pin groups. v3 GLCD has some clever code for recognising when groups of pins are on the same port and generating faster code than accessing pins individually.

I think the DigitalPin template will be really useful but at present where speed is important I'm accessing the ports directly.

BTW why is access to ports H+ on the Mega slower?

Iain

0
Edison Member
Karma: 63
Posts: 1603
Arduino rocks

Even for pin groups, combining pin accesses is often slower and takes more code.

Here are some examples (C++ statement followed by generated code):

To write one bit for ports A-G sbi/cbi is the winner:
Code:
  PORTB |= 0X1;
   c:   28 9a           sbi     0x05, 0 ; 5

With two or more pins, combining bits requires more instructions.  You also need a cli/sei to make it atomic for general use.
Code:
  cli();
   c:   f8 94           cli
  PORTB |= 0X11;
   e:   85 b1           in      r24, 0x05       ; 5
  10:   81 61           ori     r24, 0x11       ; 17
  12:   85 b9           out     0x05, r24       ; 5
  sei();
  14:   78 94           sei

So it is hard to save time or code by combining bits.  You do get all bits changing state at the same time.
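
To put rough numbers on the two listings above, here is the arithmetic as a small sketch. It assumes a classic AVR core such as the ATmega328, where sbi and cbi take 2 cycles, cli, in, ori, out and sei take 1 cycle each, and every one of these opcodes is 2 bytes. The function names are made up for the example.

```cpp
#include <cassert>

constexpr int kSbiCycles = 2;    // sbi/cbi on classic AVR cores
constexpr int kOpcodeBytes = 2;  // all opcodes used here are 16 bits wide

// One sbi or cbi per pin.
constexpr int separateCycles(int pins) { return pins * kSbiCycles; }
constexpr int separateBytes(int pins) { return pins * kOpcodeBytes; }

// cli + in + ori + out + sei, regardless of how many bits are set.
constexpr int combinedCycles() { return 5; }
constexpr int combinedBytes() { return 5 * kOpcodeBytes; }

void checkTradeOff() {
  // Two pins: separate instructions win on both time and size.
  assert(separateCycles(2) < combinedCycles());
  assert(separateBytes(2) < combinedBytes());
  // Three pins: the combined write is one cycle faster but still larger.
  assert(separateCycles(3) == combinedCycles() + 1);
  assert(separateBytes(3) < combinedBytes());
}
```

At 16 MHz one cycle is 62.5 ns, so the gap in either direction is small until four or more pins share a port. Writing a mixed value that both sets and clears bits needs an extra andi, which moves the break-even point further out.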

The best plan for a pinGroup is to dedicate an entire port so you don't need to OR or AND bits and worry about atomic operations.  That's why I think a DigitalPort class is best.

For Mega ports H, J, K, and L, cbi/sbi can't be used since those instructions only reach I/O addresses 0x00-0x1F and these ports sit at higher addresses.  Setting a single bit in these ports is slow:
Code:
  cli();
   c:   f8 94           cli
  PORTH |= 0X1;
   e:   e2 e0           ldi     r30, 0x02       ; 2
  10:   f1 e0           ldi     r31, 0x01       ; 1
  12:   80 81           ld      r24, Z
  14:   81 60           ori     r24, 0x01       ; 1
  16:   80 83           st      Z, r24
  sei();
  18:   78 94           sei


nr Bundaberg, Australia
Tesla Member
Karma: 126
Posts: 8471
Scattered showers my arse -- Noah, 2348BC.

Quote
On a Mega a PinGroup could become quite large so a pingroup should have its max size as param:
I would suggest limiting to 8 pins anyway.

Quote
Furthermore must it set pins of the same register simultaneously? If pins are in different registers this is not possible ...
Not necessarily, if the fact that pins are on the same port can be detected great, but even if behind the scenes it degenerates to a stack of single pin writes (as you show) at least the application code will be simpler and more readable.

Quote
You could write a template for each number of pins.  Not so neat, but it works.
TwoPinGroup<Pin0, Pin1>
ThreePinGroup<Pin0, Pin1, Pin2>
I'm not strong on C++, but can't you have 8 constructors with different numbers of parameters? That way there is only a single pinGroup object and the syntax is the same for up to 8 pins.

As for simultaneous writes, it would be nice if the class auto detected pins on the same port, but I don't think that's really important. Maybe a second Port class that boils down to simple "PORTx =" code, with .bitSet() and .bitClear() methods that just do "PORTx |= val" etc. At least that will add to the current HAL and isolate beginners from such "complex" ideas.
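
A minimal sketch of such a Port class, with a plain variable standing in for the hardware register so it runs on a PC. The class name, the constructor taking a register reference, and the method names other than bitSet and bitClear are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a memory-mapped register such as PORTB (assumption).
static volatile uint8_t fakePort = 0;

class Port {
 public:
  explicit Port(volatile uint8_t& reg) : reg_(reg) {}

  // PORTx = value
  void write(uint8_t value) { reg_ = value; }
  // PORTx |= mask
  void bitSet(uint8_t mask) { reg_ = static_cast<uint8_t>(reg_ | mask); }
  // PORTx &= ~mask
  void bitClear(uint8_t mask) { reg_ = static_cast<uint8_t>(reg_ & ~mask); }

  uint8_t value() const { return reg_; }

 private:
  volatile uint8_t& reg_;
};

void demoPort() {
  Port port(fakePort);
  port.write(0xF0);
  port.bitSet(0x05);
  assert(port.value() == 0xF5);
  port.bitClear(0x10);
  assert(port.value() == 0xE5);
}
```

Note that bitSet and bitClear are read-modify-write operations, so on real hardware they would need the cli/sei protection discussed earlier in the thread whenever an interrupt handler touches the same port.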

OTOH if all this can be rolled into a single class even better.


______
Rob
« Last Edit: January 14, 2012, 04:59:51 pm by Graynomad »

Rob Gray aka the GRAYnomad www.robgray.com

North Queensland, Australia
Edison Member
Karma: 64
Posts: 2102

Quote
As for simultaneous writes, it would be nice if the class auto detected pins on the same port

The only purpose of my WriteMany class is to write pins that share a port together. Convenience is not really on my list at all.

Quote
imho a pingroup would have an internal collection to which runtime pins can be added and removed (don't know the purpose for remove yet)
The collection is not sorted, so the adding order applies.

Also, pins probably should not be assigned at run time. Assigning pins more than once doesn't really make sense unless you are physically re-wiring your hardware while the Arduino is on.

Run-time pin numbers are also not usable with the DigitalPin library, so the group would have to resort to some slower lookup-table version, making it more efficient to just write the pins individually.

Non-type template parameters also have no storage overhead: no SRAM is used to store the parameters past compilation, as the compiled code is completely customised to those parameters. The alternative is a generic read/write that must look up the contents with every operation.

My code, as tested for 3 and 4 pins, produces fewer instructions for pins on the same port than an individual write on each pin. When I finish the 4 & 5 pin writer I'll post it.

I'm not limiting this code to 8 pins though. The benefits my HAL will theoretically receive from writing any number of pins outweigh this limitation by far.


0
Edison Member
Karma: 63
Posts: 1603
Arduino rocks

Writing multiple pins seems like a good idea, at least in the abstract.  There are cases where dedicating an entire 8-bit port to a device makes sense, but that is not the same as writing multiple pins.

I have written a lot of bit-bang code for SPI, I2C, and various devices.  When I get to real hardware, my abstract write-multiple ideas never seem to help.

Does anyone have a situation with real hardware where an existing implementation would be improved by a write-multiple with three or four pins?  The pins must be restricted to a single port.

The best example I have is something like an LCD display.  In this case the restriction that all pins are on the same port is too severe.  The library LiquidCrystal allows any pins and that doesn't add much complication.  Here are the byte and nibble write functions.

Code:
void LiquidCrystal::write4bits(uint8_t value) {
  for (int i = 0; i < 4; i++) {
    pinMode(_data_pins[i], OUTPUT);
    digitalWrite(_data_pins[i], (value >> i) & 0x01);
  }
  pulseEnable();
}

void LiquidCrystal::write8bits(uint8_t value) {
  for (int i = 0; i < 8; i++) {
    pinMode(_data_pins[i], OUTPUT);
    digitalWrite(_data_pins[i], (value >> i) & 0x01);
  }
  pulseEnable();
}

Note this code has pinMode in the write function.  LCD displays can be written or read, so your write-multiple should also support read.

It's the details of real, complex devices that seem to kill the advantages of a library for accessing multiple pins.