Go Down

Topic: shiftOut not very efficient (Read 6058 times) previous topic - next topic

BeBoxer

The shiftOut function isn't much more than a loop around digitalWrite. Looking at the code for digitalWrite it's clear that most of the work done in that function is identical each time it's called from shiftOut. I wrote a new version of shiftOut which essentially inlines the work from digitalWrite, and only performs the needed lookups and safety checks once. This gave me a significant improvement in how fast I could update my LED strings. I'd be happy to contribute my version back to the project but I'm not sure who to send it to. I'm sure it needs another set of eyes on it as well since I'm pretty new to Arduino programming.

Coding Badly


Start by publishing the code here and asking for a review.

Have you searched the forum for other alternatives?

BeBoxer

Thanks for the info. I'll try to get it posted this evening. I didn't look to hard for alternatives to be honest. It wasn't all that hard to write what I did.

BeBoxer

Here's the code below. It's working OK for me shifting data out to a bunch of ws2801's. I haven't tested it any beyond that though. The speedup from avoiding all the extra function calls is significant in my case though.

Code: [Select]

void fastOut(uint8_t dataPin, uint8_t clockPin, uint8_t bitOrder, uint8_t val)
{
  uint8_t i;
  uint8_t oldSREG;
  uint8_t dtimer = digitalPinToTimer(dataPin);
  uint8_t dbit = digitalPinToBitMask(dataPin);
  uint8_t dport = digitalPinToPort(dataPin);
  uint8_t ctimer = digitalPinToTimer(clockPin);
  uint8_t cbit = digitalPinToBitMask(clockPin);
  uint8_t cport = digitalPinToPort(clockPin);
  volatile uint8_t *dout;
  volatile uint8_t *cout;
 
  if (dport == NOT_A_PIN) return;
  if (cport == NOT_A_PIN) return;

  // If the pin that support PWM output, we need to turn it off
  // before doing a digital write.
  if (dtimer != NOT_ON_TIMER) turnOffPWM(dtimer);
  if (ctimer != NOT_ON_TIMER) turnOffPWM(ctimer);
  dout = portOutputRegister(dport);
  cout = portOutputRegister(cport);
  oldSREG = SREG;
  cli();
  for (i = 0; i < 8; i++)  {
    if (bitOrder == LSBFIRST) {
      if (!!(val & (1 << i)) == LOW){
        *dout &= ~dbit;
      } else {
        *dout |= dbit;
      }
    } else {
      if ((!!(val & (1 << (7 - i)))) == LOW) {
        *dout &= ~dbit;
      } else {
        *dout |= dbit;
      }
    }
// Write HIGH
    *cout |= cbit;
// Write LOW
    *cout &= ~cbit;
    SREG = oldSREG;
  }
}

Coding Badly


BeBoxer

Correct, the goal was faster output. Yes I know that SPI can be faster still, but I'm writing data to four different clock/data pairs so SPI isn't an option without additional hardware.

Coding Badly


Do you have the means to accurately measure the output rate?  Or the patience to count processor clock cycles from an assembly listing?

BeBoxer

I did take some measurements while I worked on it. I'm pushing data out to a string of 350 ws2801 LED controllers. I timed the loop below. It took about 6.5 seconds per 25 iterations with shiftOut(). With my fastOut() it dropped to about 2.1 seconds. So a three-fold speedup. I should really test it again without calling clear() to change the color to get a more accurate measurement of how long the data writes take. The code below is missing a few pieces but it should give you an idea of what I was timing.

Code: [Select]

void clear(int left, int right, long color) {
  // Turn all LEDs to the given color.
  for(int Counter=left;Counter <= right; Counter++) {
    Display[Counter].rgb=color;
  }
}

void show_one(int s)
{
  int i;
  delay(1);
  for(i=Strands[s].Start; i != Strands[s].End; i += Strands[s].Direction) {
    fastOut(Strands[s].data,Strands[s].clock,MSBFIRST,Display[i].c.red);
    fastOut(Strands[s].data,Strands[s].clock,MSBFIRST,Display[i].c.green);
    fastOut(Strands[s].data,Strands[s].clock,MSBFIRST,Display[i].c.blue);     
  }
}
void loop() {
  for (int x=0; x < 25; x++) {
    clear(300,349,0x080000);
    show_one(3);
    clear(300,349,0x000800);
    show_one(3);
  };
}

Coding Badly


Do you have fastOut defined in the sketch or in a separate source (cpp) file?

BeBoxer



Do you have fastOut defined in the sketch or in a separate source (cpp) file?



It's a separate file.

westfw

It looks pretty good to me, though I think I would have serialized the derivation of dout/dbit cout/cbit to save some stack space (perhaps the compiler manages to do this anyway?)
The other possibility is to go the "object" route, where you set up some kind of shiftOut object, and only have to compute the port/pin/timerness when the object is created, with the shiftout operations themselves being little more than the loop.
And I'd worry a bit about an optimized "shiftout" being too fast for some devices...

Coding Badly

I'm pushing data out to a string of 350 ws2801 LED controllers ... With my fastOut() it dropped to about 2.1 seconds. So a three-fold speedup


Do you have a need for a refresh rate less than 2.1 seconds or an interest in trying to speed-up your code?

BeBoxer


I'm pushing data out to a string of 350 ws2801 LED controllers ... With my fastOut() it dropped to about 2.1 seconds. So a three-fold speedup


Do you have a need for a refresh rate less than 2.1 seconds or an interest in trying to speed-up your code?



Yes, obviously I had an interest in speeding up my code which is why I looked into it. That doesn't really have much to do with whether or not the version of shiftOut() in the Arduino distribution should be more efficient though. I wrote the code because I wanted a faster version of shiftOut(). I thought others might appreciate it so I'm happy to contribute it back to the project. Having shiftOut() in the standard library implies to me as a user that it works better than the simple for loop I would write on my own. As it stands, I don't think that's the case so making it be somewhat optimized seems like the right thing to do.

pYro_65

#13
Dec 30, 2011, 08:02 am Last Edit: Dec 30, 2011, 08:21 am by pYro_65 Reason: 1
Here is my overloaded version of shiftOut you might like to try.
I also have the shiftIn version too if you are interested.

It uses fat16lib's fast output library you can get from the thread below
http://arduino.cc/forum/index.php/topic,84044.0.html

Code: [Select]

template< const uint8_t _DataPin, const uint8_t _ClockPin, const uint8_t _BitOrder >
   void shiftOut( uint8_t val )
     {
       FastDigitalIO< _DataPin > f_DataPin;
       FastDigitalIO< _ClockPin > f_ClockPin;
       
       for( uint8_t u_Index = 0 ; u_Index < 8 ; ++u_Index ){
         
         f_DataPin.write( ( _BitOrder == LSBFIRST ) ? ( !!( val & ( 1 << u_Index ) ) ) : ( !!( val & ( 1 << ( 7 - u_Index ) ) ) ) );
         f_ClockPin.write( HIGH );
         f_ClockPin.write( LOW );
       }
       return;
     }


Use like:
Code: [Select]

 FastDigitalIO< 10 > f_Storage;

 f_Storage.write( LOW );
 shiftOut< 9, 8, LSBFIRST >( B10101010 );
 f_Storage.write( HIGH );
 


The actual version I use is for a class that supports multiple shift registers of any size.

Code: [Select]

 template< const uint8_t _DataPin, const uint8_t _ClockPin, const uint8_t _BitOrder >
   void shiftOut( uint8_t val, uint16_t &bitsRemaining )
     {
       unsigned char u_BitsThisRun = ( bitsRemaining > 0x8 ) ? 0x8 : bitsRemaining;
       
       FastDigitalIO< _DataPin > f_DataPin;
       FastDigitalIO< _ClockPin > f_ClockPin;
       
       for( uint8_t u_Index = 0 ; u_Index < u_BitsThisRun ; ++u_Index ){
         
         f_DataPin.write( ( _BitOrder == LSBFIRST ) ? ( !!( val & ( 1 << u_Index ) ) ) : ( !!( val & ( 1 << ( 7 - u_Index ) ) ) ) );
         f_ClockPin.write( HIGH );
         f_ClockPin.write( LOW );
       }
       bitsRemaining -= u_BitsThisRun;
       return;
     }


Used like.

Code: [Select]
       uint16_t u_BitsRemaining = 20;
        FastDigitalIO< 10 > f_Storage;
     
       f_Storage.write( LOW );
       while( u_BitsRemaining ) shiftOut< 9, 8, LSBFIRST >( OUTPUT_DATA, u_BitsRemaining );
       f_Storage.write( HIGH );




CrossRoads

Its amazing the hoops you guys will jump thru to avoid using 1 external logic chip!
Even when it can save you IO pins.
Designing & building electrical circuits for over 25 years.  Screw Shield for Mega/Due/Uno,  Bobuino with ATMega1284P, & other '328P & '1284P creations & offerings at  my website.

Go Up