
Topic: Arduino as SPI Slave - ATMEL Data Sheet clarification

kevin_radtke

I have not been able to find an answer to this question. Perhaps someone could help me understand the hardware SPI function as described in the ATmega328 datasheet.

Background

I am trying to use an Arduino UNO R3 as an SPI slave that only receives; no response is sent to the master. I do not have a logic analyzer available, but on my oscilloscope the incoming data looks like this:

SS: idles high, goes low for each frame
SCK: idles high, data on the leading (falling) edge; clock pulses are about 300ns wide and 4us apart, or 250kHz

A packet of 7 data frames is sent every 750 milliseconds.

The first 3 data frames contain 12 bits each, the remaining 4 contain 17 bits each.

Questions

It is my understanding that the SPIF interrupt flag is set each time a complete byte is received. If I use the SPI interrupt to store bytes received or poll the SPSR register, how do I save the bits received that do not make a complete byte and thus do not set SPIF?

According to the datasheet:
When configured as a Slave, the SPI interface will remain sleeping with MISO tri-stated as long as the SS pin is
driven high. In this state, software may update the contents of the SPI Data Register, SPDR, but the data will not
be shifted out by incoming clock pulses on the SCK pin until the SS pin is driven low. As one byte has been
completely shifted, the end of Transmission Flag, SPIF is set. If the SPI Interrupt Enable bit, SPIE, in the SPCR
Register is set, an interrupt is requested. The Slave may continue to place new data to be sent into SPDR
before reading the incoming data. The last incoming byte will be kept in the Buffer Register for later use.


and also

19.3 SS Pin Functionality
19.3.1 Slave Mode
When the SPI is configured as a Slave, the Slave Select (SS) pin is always input. When SS is held low, the SPI
is activated, and MISO becomes an output if configured so by the user. All other pins are inputs. When SS is
driven high, all pins are inputs, and the SPI is passive, which means that it will not receive incoming data. Note
that the SPI logic will be reset once the SS pin is driven high.
The SS pin is useful for packet/byte synchronization to keep the slave bit counter synchronous with the master
clock generator. When the SS pin is driven high, the SPI slave will immediately reset the send and receive logic,
and drop any partially received data in the Shift Register.


So for example, when SS goes low and I begin receiving data for the first frame, I assume that SPIF will be set after the first 8 bits are received. The remaining 4 bits will then be received, but SS goes high before SPIF is set again. How do I save those last 4 bits, given that the datasheet says any partially received data in the Shift Register is dropped immediately?

I have carefully studied the example code provided at http://www.gammon.com.au/spi and done many other Google searches, but haven't found an answer.

Thanks for any help!

pylon

Quote
The first 3 data frames contain 12 bits each, the remaining 4 contain 17 bits each.
That's 104 bits, or 13 bytes.

Quote
If I use the SPI interrupt to store bytes received or poll the SPSR register, how do I save the bits received that do not make a complete byte and thus do not set SPIF?
You cannot.

The ATmega328 isn't able to act as a slave with message sizes that are not a multiple of 8 bits. Such SPI frame formats are very uncommon. I have heard of 12-bit frames but I've never heard of 17-bit frames before.
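
To make the limitation concrete, here is a small host-side model (plain C++, not AVR register code) of the behaviour described in the datasheet text quoted above: a byte is only delivered after 8 clocks, and driving SS high throws away whatever is left in the shift register. The struct and function names are made up for illustration, and MSB-first bit order is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the slave-mode behaviour quoted from the datasheet.
struct SpiSlaveModel {
    uint8_t shift_reg = 0;          // bits shifted in since the last full byte
    uint8_t bit_count = 0;          // clocks seen since the last full byte
    std::vector<uint8_t> received;  // bytes that would have set SPIF

    // One SCK sampling edge while SS is low.
    void clock_in(bool mosi) {
        shift_reg = static_cast<uint8_t>((shift_reg << 1) | (mosi ? 1 : 0));
        if (++bit_count == 8) {     // full byte: this is where SPIF would be set
            received.push_back(shift_reg);
            shift_reg = 0;
            bit_count = 0;
        }
    }

    // SS driven high: partially received data is dropped.
    // Returns the number of bits that were lost.
    uint8_t release_ss() {
        uint8_t lost = bit_count;
        shift_reg = 0;
        bit_count = 0;
        return lost;
    }
};

// Clock one frame of `bits` bits into the model (MSB first, an assumption),
// then raise SS. Returns how many trailing bits were discarded.
inline uint8_t feed_frame(SpiSlaveModel& spi, uint32_t value, uint8_t bits) {
    assert(bits <= 32);
    for (int i = bits - 1; i >= 0; --i) {
        spi.clock_in(((value >> i) & 1u) != 0);
    }
    return spi.release_ss();
}
```

In this model a 12-bit frame delivers one byte and loses 4 bits, and a 17-bit frame delivers two bytes and loses 1 bit.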

Quote
SCK: idles high, data on the leading (falling) edge; clock pulses are about 300ns wide and 4us apart, or 250kHz
At that speed you can easily bit bang the interface. Just use a hardware interrupt connected to the SCLK signal to read a bit at every edge. You might use the other hardware interrupt or a pin change interrupt to activate this interface depending on the state of the SS pin. Just be sure to keep the ISR very short so you don't get timing problems.

kevin_radtke

#2
Nov 08, 2018, 08:41 pm Last Edit: Nov 08, 2018, 09:57 pm by kevin_radtke
Quote
The ATmega328 isn't able to act as a slave with message sizes that are not a multiple of 8 bits. Such SPI frame formats are very uncommon. I have heard of 12-bit frames but I've never heard of 17-bit frames before.

OK, thanks pylon, that's exactly what I wanted to know.

It's a proprietary protocol. I don't care about the first 3 header frames, but the last four frames contain an 8-bit address followed by 9-bits of data I need to process.

The reason I was looking into the hardware SPI interface was speed. I've been using interrupts but the interrupts are not responding fast enough and I am losing bits. Even a tight while loop polling the SCK line was too slow.

I was using port manipulation to very quickly set a pin high and then low to observe timing in my interrupts and I can see they are lagging behind the data.

I did however just stumble across two potential problems with my code that I will be investigating.

First, I was using a volatile uint32_t type variable rx_buffer to store the (up-to) 17 bits received on digital pin 4 using the following:

Code: [Select]

rx_buffer = (rx_buffer << 1) + bitRead(PIND, 4);


I believe the 4-byte uint32_t requires too many instruction cycles to manipulate and is too slow. This is further compounded by the volatile declaration, which requires RAM access.

I am planning to replace the uint32_t with an array which will allow me to store up to 24 bits received.

Code: [Select]

uint8_t rx_buffer[3];


I just need to keep track of the number of bits received and index the appropriate rx_buffer location. I'll update this post to report if successful.

pylon

Quote
The reason I was looking into the hardware SPI interface was speed. I've been using interrupts but the interrupts are not responding fast enough and I am losing bits. Even a tight while loop polling the SCK line was too slow.
At that speed the interrupts should be fast enough. Post that code and we'll try to improve its efficiency.

Quote
I believe the 4-byte uint32_t requires too many instruction cycles to manipulate and is too slow. This is further compounded by the volatile declaration, which requires RAM access.
The shift operations are quite fast and the volatile declaration only affects the optimizer (to not optimize your complete ISR away).

Quote
I am planning to replace the uint32_t with an array which will allow me to store up to 24 bits received.
That's probably not faster.


kevin_radtke

OK, here is the latest version of the code. I have replaced the uint32_t version of rx_buffer with a byte array and the interrupts process MUCH faster despite the additional index management required. However, this is still not fast enough: I'm missing clock edges and therefore bits.

The INT0 interrupt does not exit quickly enough to catch the first clock pulse, so I always miss the first bit. I have also tried disabling INT0 and using a while((PIND & (1<<LCDCS))==LOW) loop to fast poll the SCK line and read the data pin. This is much faster but I still can't catch all the bits.

I am looking at making a very simple test program which will just trigger interrupts and set the debug LED pin HIGH/LOW without doing anything else. If the Arduino ISRs can't keep up, then either the Arduino just isn't fast enough or there's something else going on, like a noisy signal.

Code: [Select]



const byte LCDCS = PD2; // PORTD pin 2 - Slave Select
const byte LCDCL = PD3; // PORTD pin 3 - SCK
const byte LCDDI = PD4; // PORTD pin 4 - MOSI data
const byte LED = PB5; // PORTB digital pin 13
const byte NUM_FRAMES = 7; // Number of frames expected per packet
const byte NUM_BYTES = 3; // rx_buffer bytes


volatile uint8_t frame_count = 0; // frame received counter
volatile uint8_t rx_count = 0; // bits received counter
volatile uint8_t rx_byte = 0; // rx_buffer index
volatile uint8_t rx_buffer[NUM_FRAMES][NUM_BYTES]; // Buffer to hold maximum of 3 bytes received per frame
volatile bool process_data = false; // data ready flag

void setup() {
  Serial.begin(115200);
  Serial.println("Initializing, waiting to synchronize serial data RX..."); // debug

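  pinMode(LCDCS, INPUT); // on the UNO, digital pins 2-4 are PD2-PD4, so the PDx values double as pin numbers
  pinMode(LCDCL, INPUT);
  pinMode(LCDDI, INPUT);
  pinMode(LED_BUILTIN, OUTPUT); // digital pin 13 (PB5); LED holds the port bit number 5, not the pin number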
 
  PORTB |= (1<<LED); // LED on 

  // Disable and configure external interrupts as INT0:CHANGE & INT1:FALLING
  EIMSK &= ~((1<<INT1)|(1<<INT0)); // disable interrupts
  EICRA = ((1<<ISC11)|(0<<ISC10)|(0<<ISC01)|(1<<ISC00)); // ISCxx: 01=logical change, 10=falling edge, 11=rising edge
 
  // initialize rx_buffer
  for(int i = 0; i < NUM_FRAMES; i++){
    for(int j = 0; j < NUM_BYTES; j++){
      rx_buffer[i][j] = 0;
    }
  }
 
  while((PIND & (1<<LCDCS)) == LOW); // if a frame is in progress (LCDCS LOW), wait for it to end. Packet transmission time <1ms every 750ms.
  delay(200); // delay 200ms to make sure current packet transmission is complete and we catch frame 1 of next packet
  TIMSK0 &= ~(1<<TOIE0); // Disable timer0 overflow interrupt which may interfere with GPIO interrupts
 
  PORTB &= ~(1<<LED); // LED off

  Serial.print("EICRA="); // debug
  Serial.println(EICRA, BIN); //debug
  Serial.println("Initialization Complete..."); // debug

  EIMSK |= ((1<<INT0)|(1<<INT1)); //enable INT0 & INT1

}

void loop() {

uint32_t frames[NUM_FRAMES];

  if(process_data){
    //Data frames received and ready for processing
   
    //re-assemble bytes received in rx_buffer
    for(int i = 0; i < NUM_FRAMES; i++){
      frames[i] = ((uint32_t)(rx_buffer[i][2]) << 16);
      frames[i] |= ((uint32_t)(rx_buffer[i][1]) << 8);
      frames[i] |= ((uint32_t) (rx_buffer[i][0]));
    }//end for i
   
    //Debug
    for(int i = 0; i < NUM_FRAMES; i++){
      Serial.print("Frame(");
      Serial.print(i, DEC);
      Serial.print(") = ");
      Serial.println(frames[i],BIN);
    }//end for i
    Serial.println();
   
    process_data = false;
    for(int i = 0; i < NUM_FRAMES; i++){
      for(int j = 0; j < NUM_BYTES; j++){
        rx_buffer[i][j] = 0;
      }//end for j
    }//end for i
    frame_count = 0;
  }//end if(process_data)

}

// INT0 ISR triggers on logical change of LCDCS (slave select)
ISR(INT0_vect) {
  //DEBUG TIMING CODE - PIN TOGGLING: USE OSCILLOSCOPE TO DETERMINE TIMING
  //PORTB |= bit(LED); // set pin D13 HIGH, overhead is 2 cycles or 125ns @ 16MHz
 
  //Approx 2.0 - 2.5us elapse from the time LCDCS goes LOW to the first LCDCL pulse (INT1), or 32-40 clock cycles
 
  if((PIND & (1<<LCDCS)) == LOW){
    //LCDCS LOW signalling start of frame
    rx_byte = 0; // reset the rx_buffer index
    rx_count = 0; // reset the bit counter
  }
  else {
    // LCDCS HIGH signalling end of frame
    frame_count++; // increment the frame counter
   
    if(frame_count == NUM_FRAMES){
      process_data = true;
    }//end if(frame_count == NUM_FRAMES)
  }//end if((PIND & (1<<LCDCS)) == LOW)
 
  //PORTB &= ~bit(LED); // set pin D13 LOW, overhead is 2 cycles or 125ns @ 16MHz
}// end ISR(INT0_vect)

// INT1 ISR triggers on falling edge of LCDCL which indicates data should be read
ISR(INT1_vect) {
  //DEBUG TIMING CODE - PIN TOGGLING: USE OSCILLOSCOPE TO DETERMINE TIMING
  PORTB |= bit(LED); // set pin D13 HIGH, overhead is 2 cycles or 125ns @ 16MHz
 
  // Approx 3-4us between clock pulses, therefore 48-64 clock cycles between interrupts on INT1
 
  rx_buffer[frame_count][rx_byte] = (rx_buffer[frame_count][rx_byte] << 1) + bitRead(PIND, LCDDI); // shift in the new bit
  rx_count++;
  if(rx_count == 8){
    rx_byte++; // increment the byte index
    rx_count = 0; // reset the bit counter
  }//end if(rx_count == 8)
 
  PORTB &= ~bit(LED); // set pin D13 LOW, overhead is 2 cycles or 125ns @ 16MHz
}


kevin_radtke

I don't think the interrupts are fast enough to deal with this incoming serial data.

I did a totally stripped-down test. There is nothing in this sketch except these two interrupt service routines, set up as follows:

INT0 (LCDCS) triggers on logical change
INT1 (LCDCL) triggers on falling edge

Here's the code I'm using. Just toggling the LED pin HIGH/LOW to time the interrupts.

Code: [Select]

// INT0 ISR triggers on LCDCS which is the Slave Select line
ISR(INT0_vect) {
  //DEBUG TIMING CODE - PIN TOGGLING: USE OSCILLOSCOPE TO DETERMINE TIMING
  PORTB |= bit(LED); // set pin D13 HIGH, overhead is 2 cycles or 125ns @ 16MHz
  PORTB &= ~bit(LED); // set pin D13 LOW, overhead is 2 cycles or 125ns @ 16MHz
}// end ISR(INT0_vect)

// INT1 ISR triggers on falling edge of LCDCL which indicates data should be read
ISR(INT1_vect) {
  //DEBUG TIMING CODE - PIN TOGGLING: USE OSCILLOSCOPE TO DETERMINE TIMING
  PORTB |= bit(LED); // set pin D13 HIGH, overhead is 2 cycles or 125ns @ 16MHz
  PORTB &= ~bit(LED); // set pin D13 LOW, overhead is 2 cycles or 125ns @ 16MHz
}// end ISR(INT1_vect)


Attached is a screenshot from my scope. It seems obvious that the LCDCL interrupts are lagging behind and would result in incorrect readings of the data.


GolamMostafa

#6
Nov 10, 2018, 04:58 pm Last Edit: Nov 10, 2018, 04:59 pm by GolamMostafa
@OP

Can you please post a typical example of your data, which is composed of:
'The first 3 data frames contain 12 bits each, the remaining 4 contain 17 bits each'.

In total there are 3x12 + 4x17 = 36 + 68 = 104 bits (8x13), i.e. 13 bytes.

As SPI is a byte-oriented interface, the 104 bits of data could be organized into bytes. Send the data using the SPI interface; on the receiver side, the original data frames (3x12 and 4x17) can be reconstructed from the received bytes.
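
The reconstruction step could look roughly like the sketch below, written as plain host-side C++ rather than Arduino code. It assumes the 13 bytes arrive intact and in order and that bits are packed MSB first; neither is confirmed in this thread, and the function and constant names are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumFrames = 7;     // 3 frames of 12 bits + 4 frames of 17 bits
constexpr std::size_t kPacketBytes = 13;  // 104 bits in total
constexpr std::array<uint8_t, kNumFrames> kFrameBits = {12, 12, 12, 17, 17, 17, 17};

// Read `count` bits starting at absolute bit offset `pos`, MSB first (assumed order).
inline uint32_t read_bits(const uint8_t* bytes, std::size_t pos, uint8_t count) {
    assert(count <= 32);
    uint32_t value = 0;
    for (uint8_t i = 0; i < count; ++i, ++pos) {
        uint32_t bit = (bytes[pos / 8] >> (7 - (pos % 8))) & 1u;
        value = (value << 1) | bit;
    }
    return value;
}

// Split one 13-byte packet back into its seven variable-width frames.
inline std::array<uint32_t, kNumFrames> unpack_frames(
        const std::array<uint8_t, kPacketBytes>& packet) {
    std::array<uint32_t, kNumFrames> frames{};
    std::size_t pos = 0;
    for (std::size_t f = 0; f < kNumFrames; ++f) {
        frames[f] = read_bits(packet.data(), pos, kFrameBits[f]);
        pos += kFrameBits[f];
    }
    return frames;
}
```

This only helps if the slave actually receives all 13 bytes, which requires SS to stay low for the whole packet.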


kevin_radtke

#7
Nov 10, 2018, 06:47 pm Last Edit: Nov 11, 2018, 12:24 am by kevin_radtke
I'm not home till later tonight, so I only have a picture of the data being cleaned up by a Schmitt trigger, which inverts all the signals, but you can see the format of the incoming data. I'll post a picture of the actual data later.

I agree that the 104 bits can be broken into 13 x 8 bit chunks, but I have no control over the sent data format or, more importantly, over the LCDCS (slave select) line, which will flush the partially received bits in the incoming SPI register when it goes HIGH.

kevin_radtke

As requested, here are the 7 frames of data as received. I only care about frames 4 through 7.

pylon

If I read the scope output correctly, the clock pulses actually aren't 4µs apart but about 2.5-3µs. If 4µs is the best case, an external interrupt may be too slow. I would try a faster platform (Arduino Due, Teensy 3.x, etc.).

The typical overhead of entering an ISR on an Arduino is slightly more than 1µs (about 18 clock cycles). It might be necessary to have a look at the assembler code generated by that sketch source to see where the rest of the clock cycles went.
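
As a rough cycle budget for the numbers mentioned in this thread (16 MHz clock, clock pulses 2.5 to 4µs apart, about 18 cycles to enter an ISR), here is a small sketch in plain C++. The function names are hypothetical, and the cost of leaving the ISR is not counted because no figure for it is given here.

```cpp
#include <cstdint>

constexpr uint32_t kCpuHz = 16000000UL;   // Arduino UNO clock
constexpr uint32_t kIsrEntryCycles = 18;  // ISR entry overhead quoted above

// Number of CPU cycles that elapse in `ns` nanoseconds.
constexpr uint32_t cycles_in_ns(uint32_t ns) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(ns) * kCpuHz) / 1000000000ULL);
}

// Cycles left between two clock edges once the ISR entry overhead is paid.
// The ISR body and the return from interrupt must both fit in this budget.
constexpr uint32_t budget_after_entry(uint32_t clock_period_ns) {
    return cycles_in_ns(clock_period_ns) - kIsrEntryCycles;
}

static_assert(cycles_in_ns(4000) == 64, "4 us is 64 cycles at 16 MHz");
static_assert(cycles_in_ns(2500) == 40, "2.5 us is 40 cycles at 16 MHz");
```

With 2.5µs between clock edges, that leaves about 22 cycles for the ISR body and the return; with 4µs it is about 46.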

kevin_radtke

Thanks Pylon, I think you're right, the clock pulses are a little inconsistent and many of them are indeed shorter than 4us apart. I'll have to do this another way without using interrupts.

pylon

Quote
I'll have to do this another way without using interrupts.
I'm quite sure you won't be happy with your UNO in this case. Use a faster model and the interrupts will work without a problem.

kevin_radtke

Quote
I'm quite sure you won't be happy with your UNO in this case. Use a faster model and the interrupts will work without a problem.
Thanks for your help and advice pylon.

I actually did get it working. I'll summarize for anyone reading this so there's some conclusion to this topic. Using a faster MCU would allow for a more elegant solution and easier to understand code. That is good advice.

1) As to my original question: No, the built-in SPI hardware cannot be used when the incoming data is not formatted in complete bytes. In this case frames of 12 and 17 bits were being sent, and the SPI interrupt is not triggered until a complete byte is received. When the slave select line is deasserted, the partially received byte is discarded immediately according to the datasheet.

2) The solution that worked for me was to continually poll the Slave Select pin in my main loop() and allow the SCK line to trigger an external interrupt. The ISR had to be kept to two lines in order to exit fast enough to catch the next clock edge. All data processing was done during the >600ms between data packets.

Here's the relevant code:

Code: [Select]

void loop() {

  if(!(PIND & (1<<LCDCS))){ // Check if slave select (LCDCS) is LOW signaling data transmission

    while(!(PIND & (1<<LCDCS))){ // loop while LCDCS is LOW and receive data on SCK (LCDCL) interrupts
    }

    frame_count++; // slave select (LCDCS) is now HIGH signalling end of frame. Increment frame counter
  }
  else{
    if(frame_count == NUM_FRAMES){ // Complete data packet received, time to process data
      
      process_data(); // do stuff here

      frame_count = 0; // reset the frame counter
      
      for(rx_pos = 0; rx_pos < NUM_BITS; rx_pos++){ // clear the rx_buffer
        rx_buffer[rx_pos] = 0;
      }
      rx_pos = 0; // reset the rx_buffer index
    }
  }
}


and the ISR to read the data line on clock edges

Code: [Select]

// External interrupt INT1 service routine. Triggers on falling edge of SCK (LCDCL)
ISR(INT1_vect){

  rx_buffer[rx_pos] = PIND; // Capture data on LCDDI by storing the PIND register (PORTD input pins) for later processing
  rx_pos++;

}


As was pointed out, the SCK clock pulses are 3-4us apart, which works out to roughly 250 to 333 kHz. I believe this is pushing the limits of the Arduino UNO @ 16MHz to capture incoming serial data, but it does work. I had to store the entire PIND register (the PORTD input pins), as any code I wrote to read and store just the data bit was too slow.
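
For anyone wondering what the later processing of those raw samples might look like, here is a minimal host-side sketch in plain C++. It is not the code from my sketch. It assumes one stored PIND sample per clock edge (104 per packet), MSB-first bit order, and the frame layout described earlier in this topic (3 header frames of 12 bits, then 4 frames of 8-bit address plus 9-bit data). All names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr uint8_t kDataBit = 4;              // LCDDI is PORTD pin 4
constexpr std::size_t kHeaderBits = 3 * 12;  // three 12-bit header frames, ignored
constexpr std::size_t kPayloadFrames = 4;
constexpr std::size_t kAddrBits = 8;
constexpr std::size_t kValueBits = 9;
constexpr std::size_t kPacketBits =
    kHeaderBits + kPayloadFrames * (kAddrBits + kValueBits);  // 104 samples

struct Payload {
    uint8_t address;  // 8-bit address
    uint16_t value;   // 9-bit data
};

// Collect `count` data-line bits from consecutive PIND samples, MSB first (assumed).
inline uint16_t gather_bits(const uint8_t* samples, std::size_t count) {
    assert(count <= 16);
    uint16_t out = 0;
    for (std::size_t i = 0; i < count; ++i) {
        out = static_cast<uint16_t>((out << 1) | ((samples[i] >> kDataBit) & 1u));
    }
    return out;
}

// Decode the four payload frames from one packet of raw PIND samples.
inline void decode_packet(const uint8_t (&samples)[kPacketBits],
                          Payload (&out)[kPayloadFrames]) {
    const uint8_t* p = samples + kHeaderBits;  // skip the header frames
    for (std::size_t f = 0; f < kPayloadFrames; ++f) {
        out[f].address = static_cast<uint8_t>(gather_bits(p, kAddrBits));
        out[f].value = gather_bits(p + kAddrBits, kValueBits);
        p += kAddrBits + kValueBits;
    }
}
```

Masking out the data bit after the packet has arrived keeps that work out of the ISR, so the ISR stays at the two lines shown above.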

Thanks for the help!

