question on work around for bitClear(PORTB, cs_pin);

This is vaguely related to my previous thread, but deals with another matter,
http://forum.arduino.cc/index.php?topic=181325.0

I have this library code [3rd party, not mine], which I'm trying to generalize. It currently hardcodes the CS pin value via conditional compilation, and does not allow the Port-value to be changed by the program, since it uses bitClear() and bitSet() for efficiency.

original header file:
----------------------
  #define SS_PORT     PORTB
  #define SS_BIT        2     // for PORTB: 2 = d.10, 1 = d.9, 0 = d.8

original .cpp file:
---------------------
uint8_t RFM12B::cs_pin = SS_BIT;     // CS pin value - modifiable.

// here is the critical function:
void RFM12B::XFER(uint16_t cmd) 
{
  bitClear(SS_PORT, cs_pin);
//  digitalWrite(10, LOW);
//  enable_cs();
  Byte(cmd >> 8);         // Byte() sends a databyte out the SPI port.
  Byte(cmd & 0xFF);
  bitSet(SS_PORT, cs_pin);
//  digitalWrite(10, HIGH);
//  disable_cs();
}

In the above, digitalWrite() will work, but is very slow. So, I want to define new functions as follows, which will allow changing the CS pin to any Port.bit, but also be very efficient. I came up with this, after trying several things. Anyone got a slam-dunk better way?

volatile uint8_t *ptr = &PORTB;

void RFM12B::enable_cs() 
{
   bitClear(*ptr, cs_pin);
}

void RFM12B::disable_cs() 
{
   bitSet(*ptr, cs_pin);
}

void RFM12B::set_CSpin( volatile uint8_t* porto, uint8_t pin ) 
{
   ptr = porto;
   cs_pin = pin;
}

call:
----
  // eg, change CS to PORTD.5
  radio.set_CSpin( &PORTD, 5);

This also works, I guess,

#define enable_cs()    bitClear(*ptr, cs_pin)
#define disable_cs()   bitSet(*ptr, cs_pin)
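
A minimal host-side sketch of the pointer-plus-pin idea above, for trying it out without a board. It is not the RFM12B library's code: ChipSelect, fakePortB and fakePortD are made-up names, the two fake ports are ordinary variables standing in for PORTB and PORTD, and the two macros are written out the way Arduino.h defines them so the file builds with a desktop compiler.

```cpp
#include <cassert>
#include <cstdint>

// Same definitions as the Arduino core's bitSet()/bitClear() macros,
// repeated here so the sketch builds without the Arduino headers.
#define bitSet(value, bit)   ((value) |= (1UL << (bit)))
#define bitClear(value, bit) ((value) &= ~(1UL << (bit)))

// Assumption: plain variables standing in for the AVR PORTB/PORTD registers.
volatile uint8_t fakePortB = 0xFF;
volatile uint8_t fakePortD = 0xFF;

// Hypothetical stand-in for the CS handling inside the radio class.
class ChipSelect {
public:
  ChipSelect(volatile uint8_t* port, uint8_t pin) { set(port, pin); }

  // Re-target CS at run time, e.g. set(&fakePortD, 5).
  void set(volatile uint8_t* port, uint8_t pin) {
    assert(pin < 8);  // an 8-bit port only has bits 0..7
    port_ = port;
    pin_ = pin;
  }

  void enable()  { bitClear(*port_, pin_); }  // CS is active low
  void disable() { bitSet(*port_, pin_); }

private:
  volatile uint8_t* port_;
  uint8_t pin_;
};
```

On hardware the object would be given &PORTB or &PORTD instead of the fake variables. Whether this is fast enough is a separate question that only a measurement on the board can answer.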

Let's do some tests. First with digitalWrite:

#include <SPI.h>

void xfer (const unsigned int cmd, const byte cs_pin)  
  {
  PINB = 0b1;  // toggle D8
  digitalWrite (cs_pin, LOW);
  PINB = 0b1;  // toggle D8

  SPI.transfer (cmd >> 8);
  SPI.transfer (cmd & 0xFF);

  PINB = 0b10;  // toggle D9
  digitalWrite (cs_pin, HIGH);
  PINB = 0b10;  // toggle D9
  }
  
void setup( void )
  {
  pinMode(8, OUTPUT);    
  pinMode(9, OUTPUT);    
  SPI.begin (); 
  }

void loop( void )
  {
  xfer (0xABCD, 10);
  delay (100);  
  }

I used the "toggle pin" trick to try to get fast timings on D8 and D9. Results:

So, over 5 µs to enable or disable chip select.


Let's inline it with bit manipulation:

#include <SPI.h>
  
void xfer (unsigned int cmd )  
  {
  PINB = 0b1;  // toggle D8
  PORTB &= ~0b100;   // D10 low
  PINB = 0b1;  // toggle D8

  SPI.transfer (cmd >> 8);
  SPI.transfer (cmd & 0xFF);

  PINB = 0b10;  // toggle D9
  PORTB |= 0b100;    // D10 high
  PINB = 0b10;  // toggle D9
  }
  
void setup( void )
  {
  pinMode(8, OUTPUT);    
  pinMode(9, OUTPUT);    
  SPI.begin (); 
  }

void loop( void )
{
  xfer (0xABCD);
  delay (100);  
}

Results:

That's 250 ns (4 clock cycles) which is probably about as good as you are going to get. But, it isn't flexible.


How about supplying a function?

#include <SPI.h>

void myEnable_cs ()
  {
  PORTB &= ~0b100;   // D10 low
  }

void myDisable_cs ()
  {
  PORTB |= 0b100;    // D10 high
  }
  
void xfer (unsigned int cmd, void (*enable_cs) (), void (*disable_cs) () )  
  {
  PINB = 0b1;  // toggle D8
  enable_cs ();
  PINB = 0b1;  // toggle D8

  SPI.transfer (cmd >> 8);
  SPI.transfer (cmd & 0xFF);

  PINB = 0b10;  // toggle D9
  disable_cs ();
  PINB = 0b10;  // toggle D9
  }
  
void setup( void )
  {
  pinMode(8, OUTPUT);    
  pinMode(9, OUTPUT);    
  SPI.begin (); 
  }

void loop( void )
{
  xfer (0xABCD, myEnable_cs, myDisable_cs);
  delay (100);  
}

Now that's flexible because the function can be written to do whatever you want. You could even have an array of functions (one per pin) and select the one you want in setup.
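
The "array of functions" idea could look roughly like this. It is a host-side sketch that has not been run on a board: fakePortB and fakePortD are ordinary variables standing in for the real ports, and CsPair, csTable, selectCs and the d10.../d5... helpers are hypothetical names. The D10 = PORTB.2 and D5 = PORTD.5 mapping is the Uno's.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: ordinary variables standing in for PORTB and PORTD.
volatile uint8_t fakePortB = 0xFF;
volatile uint8_t fakePortD = 0xFF;

typedef void (*CsFunction)();

// One hard-coded pair per supported pin, each using a constant mask.
void d10Low()  { fakePortB = fakePortB & ~0b100; }
void d10High() { fakePortB = fakePortB | 0b100; }
void d5Low()   { fakePortD = fakePortD & ~0b100000; }
void d5High()  { fakePortD = fakePortD | 0b100000; }

struct CsPair {
  CsFunction enable;
  CsFunction disable;
};

// Hypothetical table: index 0 = D10 (PORTB.2), index 1 = D5 (PORTD.5).
const CsPair csTable[] = { { d10Low, d10High }, { d5Low, d5High } };

// Chosen once (in setup() on a real board), then used on every transfer.
CsPair activeCs = csTable[0];

void selectCs(uint8_t index) {
  assert(index < sizeof(csTable) / sizeof(csTable[0]));
  activeCs = csTable[index];
}

// Same shape as xfer() above, with the SPI transfer itself left out.
void xfer(unsigned int cmd) {
  activeCs.enable();
  (void)cmd;  // on hardware: SPI.transfer(cmd >> 8); SPI.transfer(cmd & 0xFF);
  activeCs.disable();
}
```

Calling selectCs(1) once and then xfer(0xABCD) would drive the D5 pair; the cost per transfer stays that of one indirect call for enable and one for disable.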

Results:

687.5 ns - 11 clock cycles. So that's 7 more clock cycles than the hard-coded one, but still a lot faster than 5 µs.
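
The cycle counts quoted above follow from the 16 MHz clock, where one cycle is 62.5 ns. A small helper that does the conversion both ways (the function names are made up for this sketch, and 16 MHz is assumed as the default):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Duration of `cycles` CPU clock cycles in nanoseconds.
// Assumption: a 16 MHz clock unless another frequency is passed in.
double cyclesToNs(uint32_t cycles, double cpuHz = 16e6) {
  assert(cpuHz > 0.0);
  return cycles * 1e9 / cpuHz;
}

// Nearest whole number of clock cycles in a duration given in nanoseconds.
long nsToCycles(double ns, double cpuHz = 16e6) {
  assert(cpuHz > 0.0);
  return std::lround(ns * cpuHz / 1e9);
}
```

cyclesToNs(4) gives 250 and cyclesToNs(11) gives 687.5, matching the two readings, and nsToCycles(5000) gives 80, so the digitalWrite version costs a bit over 80 cycles.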

Hi Nick, thanks for doing the timing measurements - good to know how much time these operations take. I'll take a good long look at how you did things. I figured digitalWrite() would be very slow. BTW, what do you use to make those measurements?

I notice you didn't time the bitSet(), bitClear() functions like I wrote

bitClear(*ptr, cs_pin)

BTW, do you know where these functions are defined? I looked through a bunch of files in the IDE directory but couldn't locate them.

oric_dan:
BTW, do you know where these functions are defined? I looked through a bunch of files in the IDE directory but couldn't locate them.

Arduino.h

oric_dan:
I notice you didn't time the bitSet(), bitClear() functions like I wrote

bitClear(*ptr, cs_pin)

#include <SPI.h>

volatile byte *ptr = &PORTB;
byte cs_pin = 2;

void enable_cs() 
{
   bitClear(*ptr, cs_pin);
}

void disable_cs() 
{
   bitSet(*ptr, cs_pin);
}


void xfer (const unsigned int cmd)  
  {
  PINB = 0b1;  // toggle D8
  enable_cs ();
  PINB = 0b1;  // toggle D8

  SPI.transfer (cmd >> 8);
  SPI.transfer (cmd & 0xFF);

  PINB = 0b10;  // toggle D9
  disable_cs ();
  PINB = 0b10;  // toggle D9
  }
  
void setup( void )
  {
  pinMode(8, OUTPUT);    
  pinMode(9, OUTPUT);    
  SPI.begin (); 
  }

void loop( void )
  {
  xfer (0xABCD);
  delay (100);  
  }

Results:

Faster than digitalWrite but not excitingly so.

Darn, I already looked in Arduino.h but I guess my head was spinning so much back then, I didn't see them.

#define bitSet(value, bit) ((value) |= (1UL << (bit)))
#define bitClear(value, bit) ((value) &= ~(1UL << (bit)))

They are basically the same operations you already measured in your test #3, where you got 687 ns. Why are they so much slower here? Oh well, I guess reading a variable (cs_pin), shifting by multiple positions, and dereferencing a pointer (*ptr), plus the subroutine call, all take a while.
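
If the variable shift is part of the cost, one possible workaround is to do the shift once, when the pin is chosen, and keep the resulting mask. The AVR has no multi-bit shift instruction, so a shift by a run-time amount is done one bit at a time. The sketch below is a host-side illustration only: fakePortB is a plain variable standing in for the port, and setCsPin, enableCs, disableCs, csPort and csMask are made-up names. It has not been timed on hardware.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: an ordinary variable standing in for an AVR port register.
volatile uint8_t fakePortB = 0xFF;

volatile uint8_t* csPort = &fakePortB;
uint8_t csMask = 1 << 2;  // precomputed (1 << pin); default is bit 2 (D10)

// Rarely called: the shift by a run-time amount happens here, once.
void setCsPin(volatile uint8_t* port, uint8_t pin) {
  assert(pin < 8);  // an 8-bit port only has bits 0..7
  csPort = port;
  csMask = static_cast<uint8_t>(1u << pin);
}

// Called on every transfer: read the port, AND/OR with the stored mask,
// write it back. No shift is needed at this point.
inline void enableCs()  { *csPort = *csPort & static_cast<uint8_t>(~csMask); }
inline void disableCs() { *csPort = *csPort | csMask; }
```

The pointer dereference and the read-modify-write are still there, so this would not be expected to match the hard-coded 4-cycle version.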

Thanks, I'll check the Saleae info.

Hey Nick. I wrote a program to time the various functions on a 16 MHz Uno, using various forms of the bit set/clear functions, with millis() for timing. The data is summarized below; the last line shows the original display format. Typically 100,000 executions. for()-loop times are not subtracted out.

digitalWrite() is very slow, but all the other forms are very fast, including using a subroutine call. The last test took the bitSet(),bitClear() code and embedded it directly. I don't know why the last result you showed was so slow, mine are all routinely quick.

digitalWrite - time/loop(usec): 5.03

bitSet() - time/loop(usec): 0.75

bitSet(),bitClear() - time/loop(usec): 0.88

bitSet() via subroutine call - time/loop(usec): 0.75

bitSet(*ptr, pin) - time/loop(usec): 0.75

both (value |= 1UL << bit) and (value &= ~(1UL << bit)) - time/loop(usec): 0.88

---- display format -------
digitalWrite - #loops: 100000, total time(msec): 503, time/loop(usec): 5.03
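
The per-loop figures are the millis() difference divided by the loop count, converted to microseconds. A sketch of that arithmetic (the function name is made up, and the for()-loop overhead is still included in the result):

```cpp
#include <cassert>
#include <cstdint>

// Average time per loop iteration in microseconds, given the elapsed
// time in milliseconds (End - Begin) and the number of iterations.
double usecPerLoop(uint32_t totalMsec, uint32_t loops) {
  assert(loops > 0);
  return totalMsec * 1000.0 / loops;
}
```

For the line above, usecPerLoop(503, 100000) comes out at 5.03. Since millis() only resolves to about a millisecond, a figure from 100,000 loops is good to roughly 0.01 usec.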

Typical code:

/***************************************/
void test6()
{
  unsigned long i;
  char* testmsg = "bitSet(*ptr, pin) - ";
  
  uint8_t pin = 5;
  uint8_t* ptr = (uint8_t*)&PORTB;
  
  LoopCnt = 1000000UL;
  Begin = millis();
  for( i=0; i<LoopCnt; i++) {
    bitSet(*ptr, pin); 
  }
  End = millis();
  bitClear(PORTB,5);
  display( testmsg );
}

  uint8_t* ptr = (uint8_t*)&PORTB;

You have cast away the "volatile" attribute which means the compiler can optimize, thus it isn't a true test. This, for example, is slower:

unsigned long LoopCnt, Begin, End;

volatile uint8_t* ptr = &PORTB;

/***************************************/
void test6()
{
  unsigned long i;
  const char* testmsg = "bitSet(*ptr, pin) - ";
  
  uint8_t pin = 5;
  
  LoopCnt = 1000000UL;
  Begin = millis();
  for( i=0; i<LoopCnt; i++) {
    bitSet(*ptr, pin); 
  }
  End = millis();
  bitClear(PORTB,5);
  Serial.print ( testmsg );
  Serial.println (End - Begin);
}

void setup ()
  {
  Serial.begin (115200);
  test6 ();
  }  // end of setup

void loop () { }

I got:

bitSet(*ptr, pin) - 1696

That is 1696 ms for 1,000,000 iterations, or about 1.7 µs per iteration, compared to 753 ns (your figure) if you don't have it volatile. That is still shorter than my measured time above, but my method would have included a bit of time taken to toggle the (timing) pin, whereas your method averages that out over many iterations.

Your other tests may need inspecting to see if they have the same issue.

Good to know, I'll go back and try the volatile bit.

I was a little worried about compiler optimization. Is there a way to shut it off so the tests don't get skewed? I did try to measure for-loop time, but even with trivial statements, like testcnt++ and testcnt=i, inside the loop, I couldn't get a measurement.

I just usually use volatile when I want to force the compiler to do something. For example, toggling a port will work because the port variables are volatile.
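
A sketch of how volatile could be used to get a baseline for the bare loop. A store to a volatile object has to be carried out each time, so the optimizer cannot delete the loop or collapse it. The names sink and baselineLoop are made up, and this is plain C++ so it can be checked off the board.

```cpp
#include <cassert>
#include <cstdint>

// Writes to a volatile object count as observable behaviour, so the
// compiler has to perform every one of them.
volatile uint8_t sink = 0;

// Hypothetical baseline: the same for() overhead as the tests, with one
// cheap volatile store as the body. Returns the number of iterations run.
uint32_t baselineLoop(uint32_t loops) {
  uint32_t done = 0;
  for (uint32_t i = 0; i < loops; i++) {
    sink = static_cast<uint8_t>(i);  // one real store per iteration
    done++;
  }
  return done;
}
```

On a board, the call would be wrapped in two millis() reads like the other tests, and the result subtracted from each test's total to estimate the cost of the operation alone. The store itself still costs something, so the baseline is slightly pessimistic.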

Thanks, I'll test some when I get home.

Nick, you're 100% correct. I took ptr out of the function, and tried it as both volatile uint8_t* and regular uint8_t*, and it slowed way down. It is mind-boggling how it could possibly be so slow. What on earth is going on that burns so many processor cycles? Something seems very wrong here.

Test 6
bitSet(*ptr, pin) - #loops: 100000, total time(msec): 440, time/loop(usec): 4.4

uint8_t pin = 5;
volatile uint8_t* ptr = &PORTB;
 
/***************************************/
void test6()
{
  unsigned long i, vara;
  char* testmsg = "bitSet(*ptr, pin) - ";
    
  LoopCnt = 100000UL;
  Begin = millis();
  for( i=0; i<LoopCnt; i++) {
    bitSet(*ptr, pin); 
  }
  bitClear(PORTB,5);
  End = millis();
  display( testmsg );
}

The following is also very slow,

Test 6A
(*ptr |= 1<<pin) - #loops: 100000, total time(msec): 371, time/loop(usec): 3.71

uint8_t pin = 5;
volatile uint8_t* ptr = &PORTB;
 
/***************************************/
void test6()
{
  unsigned long i, vara;
  char* testmsg = "(*ptr |= 1<<pin) - ";
//  char* testmsg = "bitSet(*ptr, pin) - ";
    
  LoopCnt = 100000UL;
  Begin = millis();
  for( i=0; i<LoopCnt; i++) {
    *ptr |= 1<<pin;
//    bitSet(*ptr, pin); 
  }
  bitClear(PORTB,5);
  End = millis();
  display( testmsg );
}