Bit shifting loop slows down as the bits shift out

I'm writing yet another software SPI library and it's nearly ready for beta testing, but I notice the function gets slower as the bits are shifted out. The half clock period starts at 3 µs for the first bit and gets progressively slower, reaching about 5.2 µs as the 8th bit is clocked out. I think this is because the if (data & (1 << i)){ test has to shift more times as the loop progresses, but that's just a guess. Assuming this is the problem, can someone suggest a better way to maintain a fast and steady clock without resorting to machine code, which would make the library less portable?

uint8_t SPIsoft::highCPHA(uint8_t data){
  
  uint8_t i, tmpByte;
  
  for (i = 0; i < 8; i++){                      // 8 bits to transfer
    // Load data onto MOSI
    if (data & (1 << i)){
      *_mosiPort |= _mosiMask;                  // Set mosi pin
    }
    else{
      *_mosiPort &= ~_mosiMask;                 // Clear mosi pin
    }
    delayMicroseconds(_clkDelay);               // Delay to allow slave time to load bit
    // Toggle clock pin
    *_clkPort ^= _clkMask;                      // Toggle clock pin
    delayMicroseconds(_clkDelay);               // Delay
    // Read data from MISO
    if (*_misoPort & _misoMask){                // Read the miso port bit
      tmpByte |= (1 << i);
    }
    else{
      tmpByte &= ~(1 << i);
    }
    // Toggle clock pin
    *_clkPort ^= _clkMask;                      // Toggle clock pin
  }
  return tmpByte;
}

Replace the :-

if (data & (1 << i))

with:-

// start with a value of
mask = 1;
// then
if (data & mask)
// followed by
mask = mask << 1;

Thanks Mike,
That works a treat.

uint8_t SPIsoft::highCPHA(uint8_t data){
  
  uint8_t i, tmpByte;
  
  for (i = 1; i > 0; i = i << 1){               // 8 bits to transfer
    // Load data onto MOSI
    if (data & i){
      *_mosiPort |= _mosiMask;                  // Set mosi pin
    }
    else{
      *_mosiPort &= ~_mosiMask;                 // Clear mosi pin
    }
    delayMicroseconds(_clkDelay);               // Delay to allow slave time to load bit
    // Toggle clock pin
    *_clkPort ^= _clkMask;                      // Toggle clock pin
    delayMicroseconds(_clkDelay);               // Delay
    // Read data from MISO
    if (*_misoPort & _misoMask){                // Read the miso port bit
      tmpByte |= i;
    }
    else{
      tmpByte &= ~i;
    }
    // Toggle clock pin
    *_clkPort ^= _clkMask;                      // Toggle clock pin
  }
  return tmpByte;
}

Or shift a copy of the data:

uint8_t SPIsoft::highCPHA(uint8_t data){
  uint8_t i, tmpByte, copy;

  copy = data;
  for (i = 0; i < 8; i++)
  {                    
    if (copy & 1 ) 
    {
      *_mosiPort |= _mosiMask;  // 
    }
    else
    {
      *_mosiPort &= ~_mosiMask;
    }
    copy = copy >> 1; // shift the data 

   ///  delayMicroseconds(_clkDelay);  <<< not needed

    *_clkPort ^= _clkMask;                      // Toggle clock pin

    delayMicroseconds(_clkDelay);

    if (*_misoPort & _misoMask)          // Read the miso port bit
    {                
      tmpByte |= (1 << i);
    }
    else{
      tmpByte &= ~(1 << i);
    }
    // Toggle clock pin
    *_clkPort ^= _clkMask;                      // Toggle clock pin
  }
  return tmpByte;
}

Can you measure the timing?

The timing is pretty much the same as the last code I posted (2.8 µs clock high, 3.1 µs clock low). I need to keep the delay you remmed out, or else slower clock speeds give an uneven clock duty cycle.
Thanks for the suggestion though.
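
A further variation on the shift-a-copy idea, offered only as a sketch: the same byte can serve as both the outgoing and incoming shift register, so neither side of the loop needs (1 << i). The SoftSpiPins struct and the exchangeLsbFirst name are hypothetical stand-ins for the library's port accesses, used here so the logic can run off-target, and the delayMicroseconds calls are left out for brevity. It has not been timed on hardware.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-ins for the direct port accesses in the thread,
// so the bit handling can be exercised without a board attached.
struct SoftSpiPins {
    std::function<void(bool)> writeMosi;  // drive the MOSI pin high or low
    std::function<bool()> readMiso;       // sample the MISO pin
    std::function<void()> toggleClock;    // flip the clock pin
};

// Exchange one byte, LSB first. Each pass sends bit 0 of shiftReg, then
// shifts right by one and places the received bit in bit 7. After eight
// passes the first received bit has moved down to bit 0.
uint8_t exchangeLsbFirst(uint8_t data, const SoftSpiPins& pins) {
    uint8_t shiftReg = data;
    for (uint8_t n = 0; n < 8; n++) {
        pins.writeMosi((shiftReg & 0x01) != 0);
        pins.toggleClock();
        const bool in = pins.readMiso();
        shiftReg = static_cast<uint8_t>((shiftReg >> 1) | (in ? 0x80 : 0x00));
        pins.toggleClock();
    }
    return shiftReg;
}
```

Every pass does exactly one single-bit shift, so the work per bit should be constant regardless of bit position.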

Is this also true with the following shiftOut function?

void shiftOut(int myDataPin, int myClockPin, byte myDataOut) {
  // This shifts 8 bits out MSB first, 
  //on the rising edge of the clock,
  //clock idles low

  //internal function setup
  int i=0;
  int pinState;
  pinMode(myClockPin, OUTPUT);
  pinMode(myDataPin, OUTPUT);

  //clear everything out just in case to
  //prepare shift register for bit shifting
  digitalWrite(myDataPin, 0);
  digitalWrite(myClockPin, 0);

  //for each bit in the byte myDataOut
  //NOTICE THAT WE ARE COUNTING DOWN in our for loop
  //This means that %00000001 or "1" will go through such
  //that it will be pin Q0 that lights. 
  for (i=7; i>=0; i--)  {
    digitalWrite(myClockPin, 0);

    //if the value passed to myDataOut ANDed with a bitmask is
    // nonzero then... so if we are at i=6 and our value is
    // %11010100, the code compares it to %01000000
    // and proceeds to set pinState to 1.
    if ( myDataOut & (1<<i) ) {
      pinState= 1;
    }
    else {	
      pinState= 0;
    }

    //Sets the pin to HIGH or LOW depending on pinState
    digitalWrite(myDataPin, pinState);
    //register shifts bits on upstroke of clock pin  
    digitalWrite(myClockPin, 1);
    //zero the data pin after shift to prevent bleed through
    digitalWrite(myDataPin, 0);
  }

  //stop shifting
  digitalWrite(myClockPin, 0);
  // delay(0);
}

I mean, when I run my graphical waterfall for a long time, I see a delay in the valves opening over the long run...

How does your shiftOut procedure differ from the built in version?
For me digitalWrite would be too slow, and shiftOut is no use as I have to clock out and read back data at the same time.
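
The posted shiftOut function tests myDataOut & (1<<i) with i counting down from 7, so it uses the same variable shift discussed earlier in the thread. A minimal sketch of the MSB-first equivalent with a running mask follows. The msbFirstBits name is hypothetical, and the function only returns the data-pin levels in order so it can run off-target; on a board each level would be written to the data pin and followed by a clock pulse. Whether the saving is noticeable next to the cost of digitalWrite has not been measured here.

```cpp
#include <cstdint>
#include <vector>

// Returns the data-pin levels for one byte, most significant bit first.
// The mask starts at bit 7 and moves down one position per pass, so each
// pass costs a single one-bit shift instead of a shift by i.
std::vector<bool> msbFirstBits(uint8_t value) {
    std::vector<bool> bits;
    for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
        bits.push_back((value & mask) != 0);
    }
    return bits;
}
```

For the %11010100 value used in the function's comments, the levels come out in the order 1, 1, 0, 1, 0, 1, 0, 0.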

I am using the BUILT IN version (I am poor at programming). I am amazed by your programming skill and was interested to read that you are using your own customized shiftOut function.
Regards

PS:
How do you program such a difficult function? :frowning: How long did it take you to write the above function? :grin:

Some time ago I played with a FastShiftOut function. It is about twice as fast as the built-in one, though there are still things to improve in the experimental code. See http://forum.arduino.cc/index.php/topic,184002.0.html

Thanks for the link :) I shall check how I can use it in place of the built-in shiftOut function. However, any help with this is appreciated :)
Should I make a header file and use that function in my sketch in the same way the built-in function is used?
Regards

Khalid:
I am using the BUILT IN version (I am poor at programming). I am amazed by your programming skill and was interested to read that you are using your own customized shiftOut function.

Thanks :blush: I have only been programming in C++ for about 18 months now. For about 30 years prior to that I mostly wrote in Basic, machine code (various processors & PICs) & ARexx (Amiga). I still find some C++ difficult (pointer syntax especially) as I tend to visualize what I want in assembler or Basic and then try to convert it to C++.

Khalid:
PS:
How do you program such a difficult function? :frowning: How long did it take you to write the above function? :grin:

I originally wrote the basic idea using digitalWrite/digitalRead but found they were too slow, so I decided to use direct port manipulation, but in a way that should be portable across the Arduino range. Finding out how and where to get the information to do that has taken about 7 days on and off, and testing another 3 days (though I lack devices to properly test with).

Riva, thanks for the information; I am really impressed. I am sorry for taking your thread a little astray, or hijacking it, as you might call it.
You programmers are really talented. I can't imagine writing such difficult sketches. I program just to achieve the goal (you could say I am a bad practitioner of programming) and my sketches are poor and non-optimized. I wish I had a brain like yours.
Regards

It's practice & experience - the best way to learn is to read well-written code.

Incidentally, the reason the << and >> operators are slower with larger shifts is that
the AVR microcontroller is very simple - it lacks a (16 or 32 bit) "barrel shifter", the
hardware present in more capable processors to shift by any amount in a single cycle.

The AVR has to repeat a one-bit shift, taking more cycles, and it's sometimes, I believe,
faster to multiply by 2^N than to shift left by N (is this actually the case? I thought the
multiply instruction was 3 cycles, but I may be dreaming). Of course, being an 8-bit
CPU is the major disadvantage: many things take a lot of cycles when using int or long
(compared to byte or char).

For instance it is, I think, faster for short loops to use a byte or char loop variable:

  for (byte i = 0 ; i < 10 ; i++)
    ....

Perhaps I'll go and check this.

Mark, how then do you handle large bit shifts without compromising speed? For example, suppose you were in my place, dealing with continuous bit shifting to run 80 solenoid valves with 10 shift registers using an Arduino.
See the video of my program, written in Visual Basic:
www.youtube.com/watch?v=5fx199QESBk&feature=youtu.be
My work on this project can be seen here:

http://forum.arduino.cc/index.php?topic=60117.0

This program converts black and white pictures into a bit pattern (binary). This is then converted into hexadecimal and sent to the Arduino over the USB port. A special protocol for getting the data from the VB software has been implemented in the Arduino sketch. If you read my thread linked above you will see how difficult it is to write a protocol. This is because you can receive every type of extended ASCII character over serial, i.e. you do not have a good choice of delimiter characters.

All these characters are then processed in the Arduino sketch and sent to the 80 solenoid valves using the built-in shiftOut function. I can send the VB program and the Arduino sketch for optimization if anyone is interested, especially high-end programmers like you. I am successfully running this with thousands of bit shifting operations without losing any characters, thanks to a very refined protocol (which I made accidentally). However, I sometimes feel there is an increase in delay after a very long run, and the waterfall pattern shifts a little because of it. By shifting I mean a change in the ASPECT RATIO of the falling pattern due to a logarithmic delay (which I think is caused by the built-in shiftOut function, as described above).

If you want to shift out 80 bits, you shift out 10 bytes, and the amount of bit shift for each bit equals
0
1
2
3
4
5
6
7
0
1
2
3
4
5
6
7
0
1
2
3
... etc
You never have a bit shift of, say, 79 bits.
And using the trick in response #3 you have this pattern of shifts
0
1
1
1
1
1
1
1
0
1
1
1
1
1
1
1
0
1
1
1
... etc
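
The two patterns above can be totalled with a small sketch. It counts single-bit shift steps only, on the assumption (taken from the barrel shifter explanation earlier in the thread) that a shift by i costs i one-bit steps. It does not count real AVR cycles, and the function names are hypothetical.

```cpp
// Single-bit shift steps if every bit test computes (1 << i) from scratch:
// 0 + 1 + ... + 7 steps for each byte.
unsigned variableShiftSteps(unsigned byteCount) {
    unsigned total = 0;
    for (unsigned b = 0; b < byteCount; b++) {
        for (unsigned i = 0; i < 8; i++) {
            total += i;  // a shift by i costs i one-bit steps in this model
        }
    }
    return total;
}

// Single-bit shift steps with a running mask: none for the first bit of
// each byte, then one step for each of the remaining seven bits.
unsigned runningMaskSteps(unsigned byteCount) {
    unsigned total = 0;
    for (unsigned b = 0; b < byteCount; b++) {
        for (unsigned i = 0; i < 8; i++) {
            total += (i == 0) ? 0u : 1u;
        }
    }
    return total;
}
```

Under this model, 10 bytes cost 280 steps with the variable shift and 70 with the running mask, and neither figure grows with the total number of bits already sent.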

Looks like the longest post with the least text.