Optimize "Software-SPI-Slave-Mode" ISR

Hello all,

I have written an ISR to receive data from another Arduino. I am aware of the fact the atmega328 can perform this with hardware, but I am exploring the possibility to save the hardware pins to be used in master mode. Up till here the code works (on the receiving end) with SPI_CLOCK_DIV32 (on the sending end), if I switch to SPI_CLOCK_DIV16 I start to miss bits. I am open for any tips to optimize the code to see if I can get it to work with SPI_CLOCK_DIV16.

Cheers,

Jack

Receiving end:

volatile byte flag = 0;

void setup()
{
  Serial.begin(9600);
  attachInterrupt (0, _isrd2, FALLING);  // attach interrupt handler
  pinMode(2, INPUT); // SS
  pinMode(5, INPUT); // SCK
  pinMode(6, INPUT); // DI
}

void loop()
{
  if (flag)
  {
    Serial.println(flag,BIN);
    flag = 0;
  }
}

void _isrd2()
{
  byte rec = 0;
  byte bn = 1;
  byte pd = 1;
  byte p = 0;
  
  p = PIND;
    
  while (~p & B00000100)      // as long as Pin PD2 is LOW (SS LOW)
  {
    if ( p & B00100000 )      // if clock pin (PD5) is HIGH
    {
      if (pd)                 // if the high pulse on the clock has not been processed
      {
        pd = 0;               // flag as processed
        if (p & B01000000)    // if the Data In pin (PD6) is HIGH
        {
          rec += bn;          // add bit to received byte 
        }
        bn = bn << 1;         // shift to the next bit to receive
      }
    }
    else
    {
      pd = 1;                 // if clock is low again, we can process the next clock high pulse
    }
    p = PIND;                 // read value of D-register
  }
  
  flag = rec;                 // set the flag value to the received byte
}

Sending end:

#include <SPI.h>

/** nop to tune soft SPI timing */
#define nop asm volatile ("nop\n\t")

void setup()
{
  Serial.begin(9600);

  pinMode( 2, OUTPUT);   // still using this as SS
  digitalWrite(2, HIGH);

  pinMode(10, OUTPUT);
  digitalWrite(10, HIGH);
  SPI.begin();
  SPI.setClockDivider(SPI_CLOCK_DIV32);
}

void loop()
{
  byte snd = B10111101;

  PORTD = PORTD & B11111011; // pull PD2 LOW
  delay(1);                  // delay depends on spiClockDivider
  SPI.transfer(snd);
  PORTD = PORTD | B00000100;

  delay(500);  // repeat after 0.5 sec
}

for readability you should give B00000100 and others a meaningful name

The while (~p & B00000100) // as long as Pin PD2 is LOW (SS LOW) can keep on going creating a very long interrupt until the sender site sets SS HIGH.

if the system would get an interrupt for every bit you do not need that while loop at all (would be much better.

Try with the interrupt on the clock pin. With SPI_CLOCK_DIV16 you have one microsecond to get thru your loop and be ready to process the next bit, not sure if that is doable.

@Rob: the time SS is low would be the time the other party is sending data. Yes, they will have to set it HIGH after sending, otherwise it would ruin the system, granted. However, seeing that it takes about 5 us overhead to run an ISR (got this number from Nick Gammon's site) I wouldn't want to enter the interrupt for every bit, since the minimum I expect is 8 bits. I will try and use some more useful names :stuck_out_tongue:

@David: I don't know. Doable, yes, workable, maybe. I might give it a shot. The reason I did not choose the interrupt on the clock pin was because the clock pin will oscillate as well when other spi devices are accessed, so any ISR connected to SCK will have to start with asserting that SS is LOW. However, it does make sense.

Both thank you for your input.

@David: it does not work. It makes sense as well. From Nick Gammon's site (Gammon Forum : Electronics : Microprocessors : Interrupts):

However the external interrupts (where you use attachInterrupt) do a bit more, as follows:
...

I count 82 cycles there (5.125 µS in total at 16 MHz) as overhead plus whatever is actually done in the supplied interrupt routine. That is, 2.9375 µS before entering your interrupt handler, and another 2.1875 µS after it returns.

Even with clock divided by 128 the bits stream in to quickly. Too bad, because it would have made an elegant solution. I think that if you would use a very slow software SPI on the sending side this might work, but I'm looking for a bit of speed here.

I'm still wondering whether I can optimize the code a bit.

Cheers,

Jack

I'm not too familiar with AVR assembly but this looks like it could be improved...

        if (p & B01000000)    // if the Data In pin (PD6) is HIGH
        {
          rec += bn;          // add bit to received byte 
        }
        bn = bn << 1;         // shift to the next bit to receive

You could use rec += 1 and shift rec instead of bn. That way, you might pick up an immediate operand and save a cycle or two. Also try |= instead of +=.

See if you can get rid of that if statement:

rec += (p&B01000000) >> 6;

Not sure which is more expensive, shifting of branching, but it might be worth trying.

David,

I tried as such and (a) it does work (b) but alas - not faster, at least not to the level of clock divided by 16. Thanks for the suggestions. I'm afraid that if I want to implement sending data back through this system it will slow even more, I'll have to see how it works out then.

Cheers,

Jack

just a try to rewrite the ISR routine, it uses less statements
can you test it with SPI_CLOCK_DIV16

void _isrd2()
{
  byte rec = 0;
  byte bn = 1;
    
  while (~PIND & B00000100)           // while SS is LOW (PD2)
  {
    while ((PIND & B00100000) == 0) ; // wait for CLOCK HIGH

    if (PIND & B01000000)             // if Data HIGH  (PD6)
    {
      rec += bn;                      // add 1 bit  
    }
    bn = bn << 1;                     // shift to the next bit to receive

    while ((PIND & B00100000) == B00100000 ); // wait for CLOCK LOW
  }
  flag = rec;                         // set the flag value to the received byte
}

Rob,

it's a lovely rewrite. It does seem I get into an endless loop. After entering the ISR for the first time it doesn't get out, tested with div64. However, I can't see the error. It's hanging on the lines while ( (PIND & B00100000) == 0) ;, if I comment them the ISR runs. I just don't see why the ISR would stay in that while loop.

Jack

Once the clock stops, you are stuck in one of the while loops depending on clock polarity. The master stops the clock and then raises SS.

@David: right, that makes sense! Thanks !

CaptainJack:
Rob,

it's a lovely rewrite. It does seem I get into an endless loop. After entering the ISR for the first time it doesn't get out, tested with div64. However, I can't see the error. It's hanging on the lines while ( (PIND & B00100000) == 0) ;, if I comment them the ISR runs. I just don't see why the ISR would stay in that while loop.

Jack

the blocking endless loop can be fixed by extending the inner conditions. Please try

void _isrd2()
{
  flag = 0;
  byte bn = 1;
    
  while (~PIND & B00000100)           // while SS is LOW (PD2)
  {
    while ((PIND & B00100000) == 0 && ~PIND & B00000100) ; // wait for CLOCK HIGH
    if (PIND & B01000000)             // if Data HIGH  (PD6)
    {
      rec += bn;                      // add 1 bit  
    }
    bn = bn << 1;                     // shift to the next bit to receive
    while ((PIND & B00100000) == B00100000 && ~PIND & B00000100 ); // wait for CLOCK LOW
  }
  flag = rec;
}

I'll try it when I get back home and report !

Rob: defo works now @ clock div 32, and... nearly there @ div 16, say about 1 in 10 transfers has a bit wrong, so in 90% of the cases it works at that speed. I have no idea whether it's possible to trim of more, but I say this gets really close to the mark.

Cheers !

Jack

Possible options is to simplify the while loops

  • while ((PIND & B00100000) == 0 && ~PIND & B00000100) -
    as PIND is read twice. (once inverted:)

OK lets give it another try

void _isrd2()
{
  byte bn = 1;
  byte rec = 0;
 
  byte reg = PIND;
    
  while (~reg & B00000100) 
  {
    while ((reg & B00100000) == 0 && ~reg & B00000100)  { reg = PIND; };

    if (reg & B01000000)  
    {
      rec += bn;   
    }
    bn = bn << 1;
   
    while ((reg & B00100000) == B00100000 && ~reg & B00000100 ) { reg = PIND; };
  }
  flag = rec;
}

tried to minimize the boolean expressions - (

void _isrd2()
{
  byte bn = 1;
  byte rec = 0;
 
 byte reg = PIND;
    
  while (~reg & B00000100) 
  {
    while ((~reg & B00100100)  == B00100100)  { reg = PIND; };

    if (reg & B01000000)  
    {
      rec += bn;   
    }
    bn = bn << 1;
   
    while ((~reg & B00100100) == B00000100) { reg = PIND; };
  }
  flag = rec;
}

or even

void _isrd2()
{
  byte bn = 1;
  byte rec = 0;
 
 byte reg = PIND;
    
  while (~PIND & B00000100) 
  {
    while ((~PIND & B00100100) == B00100100) ;

    if (PIND & B01000000)  
    {
      rec += bn;   
    }
    bn = bn << 1;
   
    while ((~PIND & B00100100) == B00000100);
  }
  flag = rec;
}