Arduino as SPI slave?

Look here

http://www.arduino.cc/playground/Code/RGBBackpack

Go down to this link
http://www.sparkfun.com/datasheets/Components/RGB_Backpack_v4.zip

Look at the .c file for the code that sets up to receive data via SPI. You have to take pieces from here & there, but you can connect all the pieces to receive the data & put it into an array for printing afterwards (vs putting it in EEPROM).

torgil:
I'm trying to use an Arduino to eavesdrop SD card writes from a heater system controller (AVR based). I've soldered wires on the SD card holder and converted the signal from 3v3 to 5v0.

I've tried to change clock phase, clock polarity and bit order without any progress. Arduino source code:

...

void setup (void)
{
  Serial.begin (9600);   // debugging

// setup pins
  pinMode(MISO, OUTPUT);
  pinMode(MOSI, INPUT);
  pinMode(SS, INPUT);
  pinMode(SCK, INPUT);
 
  // turn on SPI in slave mode
  SPCR |= _BV(SPE);

...
 
}  // end of setup

This looks like my code for an SPI slave. That's nice that someone is reading it!

I wouldn't do this:

  pinMode(MISO, OUTPUT);

You are eavesdropping, right? So no need to configure output pins.

I would stick with the ISR - SPI is pretty fast, and not using an ISR is not going to help you very much.

When testing the SPI master/slave setup I had to drop down the clock rate that the master sent at. Otherwise the slave can't keep up. With the master clocking out at maximum rate, there aren't many spare clock cycles for the slave to do stuff in, like storing what it got.

From your scope output the clock polarity is clearly normal. It is usually low, and goes high to pulse (like on my web page). What pins are you hooking up to - ie. what is your wiring?

Can you zoom in on the first character from the AVR? It is hard to see what it is. And report what the timebase is, that is, the time difference between clock pulses.

Now that I look at the graphic again, it looks like it is 1 uS per division. So a byte is clocked out in 2 uS, is that right? That's too fast to monitor; I had to slow down my master somewhat when testing that code. And it isn't the code's fault. :)

Unless you can slow down the AVR (perhaps you can) you might need faster hardware. For example, an FPGA board might be able to capture it fast enough, in bursts.

"you might need faster hardware. For example, an FPGA board" No no no!
You've got faster hardware in the '328 doing the monitoring - just need to figure out how to use it.

Nick, go take a look here. This unit is described as being a '328 that uses SPI to receive data from a master to display on a screen. Should be able to clean this up to accept a variable number of bytes & load into an array for spitting out via the serial port later. Ditch the EEPROM stuff, ditch the screen display stuff.

I am very interested in this for the dual ATmega board I've got in the works. I'm thinking it should be similar to code up as reading from the Serial port when data comes in.

http://www.arduino.cc/playground/Code/RGBBackpack
Go down to this link
http://www.sparkfun.com/datasheets/Components/RGB_Backpack_v4.zip

See Also section 18 of the data sheet:
18. SPI – Serial Peripheral Interface
18.1 Features
• Full-duplex, Three-wire Synchronous Data Transfer
• Master or Slave Operation
• LSB First or MSB First Data Transfer
• Seven Programmable Bit Rates
• End of Transmission Interrupt Flag
• Write Collision Flag Protection
• Wake-up from Idle Mode
• Double Speed (CK/2) Master SPI Mode

The Serial Peripheral Interface (SPI) allows high-speed synchronous data transfer between the
ATmega48PA/88PA/168PA/328P and peripheral devices or between several AVR devices.

The interconnection between Master and Slave CPUs with SPI is shown in Figure 18-2 on page
167. The system consists of two shift Registers, and a Master clock generator. The SPI Master
initiates the communication cycle when pulling low the Slave Select SS pin of the desired Slave.
Master and Slave prepare the data to be sent in their respective shift Registers, and the Master
generates the required clock pulses on the SCK line to interchange data. Data is always shifted
from Master to Slave on the Master Out – Slave In, MOSI, line, and from Slave to Master on the
Master In – Slave Out, MISO, line. After each data packet, the Master will synchronize the Slave
by pulling high the Slave Select, SS, line.
When configured as a Master, the SPI interface has no automatic control of the SS line. This
must be handled by user software before communication can start. When this is done, writing a
byte to the SPI Data Register starts the SPI clock generator, and the hardware shifts the eight
bits into the Slave. After shifting one byte, the SPI clock generator stops, setting the end of
Transmission Flag (SPIF). If the SPI Interrupt Enable bit (SPIE) in the SPCR Register is set, an
interrupt is requested. The Master may continue to shift the next byte by writing it into SPDR, or
signal the end of packet by pulling high the Slave Select, SS line. The last incoming byte will be
kept in the Buffer Register for later use.
When configured as a Slave, the SPI interface will remain sleeping with MISO tri-stated as long
as the SS pin is driven high. In this state, software may update the contents of the SPI Data
Register, SPDR, but the data will not be shifted out by incoming clock pulses on the SCK pin
until the SS pin is driven low. As one byte has been completely shifted, the end of Transmission
Flag, SPIF is set. If the SPI Interrupt Enable bit, SPIE, in the SPCR Register is set, an interrupt
is requested. The Slave may continue to place new data to be sent into SPDR before reading
the incoming data. The last incoming byte will be kept in the Buffer Register for later use.

The system is single buffered in the transmit direction and double buffered in the receive direction.
This means that bytes to be transmitted cannot be written to the SPI Data Register before
the entire shift cycle is completed. When receiving data, however, a received character must be
read from the SPI Data Register before the next character has been completely shifted in. Otherwise,
the first byte is lost.

In SPI Slave mode, the control logic will sample the incoming signal of the SCK pin. To ensure
correct sampling of the clock signal, the minimum low and high periods should be:
Low periods: Longer than 2 CPU clock cycles.
High periods: Longer than 2 CPU clock cycles.

When the SPI is enabled, the data direction of the MOSI, MISO, SCK, and SS pins is overridden
according to Table 18-1 on page 168.

and it goes on a few pages more.

CrossRoads:
No no no!
You've got faster hardware in the '328 doing the monitoring - just need to figure out how to use it.

I'll look at the links in a minute, but I note from the datasheet pages you quoted:

   ; Enable SPI, Master, set clock rate fck/16 
   ldi r17,(1<<SPE)|(1<<MSTR)|(1<<SPR0)
   out SPCR,r17

It's interesting, isn't it, that in their own example code they slow down the clock to fck/16?

In my code example I had this:

 // Slow down the master a bit
  SPI.setClockDivider(SPI_CLOCK_DIV8);

I got away with slowing the master down to 1/8 clock, they showed 1/16 in the example code.
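To put numbers on the divider choice: the SCK rate and the time per byte follow directly from the CPU clock. This is a minimal host-side sketch in plain C++ (not Arduino code), assuming master and slave both run at 16 MHz; the helper names are made up for illustration and are not part of any Arduino library.

```cpp
#include <cstdint>

// Illustrative helpers (hypothetical names, not an Arduino API).

// SCK frequency an AVR SPI master produces for a given clock divider.
constexpr std::uint32_t sckHz(std::uint32_t cpuHz, std::uint32_t divider) {
    return cpuHz / divider;
}

// CPU cycles that elapse while one 8-bit byte is shifted in,
// assuming the slave runs at the same CPU clock as the master.
constexpr std::uint32_t cpuCyclesPerByte(std::uint32_t divider) {
    return 8 * divider;
}

// Time to shift one byte, in nanoseconds.
constexpr std::uint32_t byteTimeNs(std::uint32_t cpuHz, std::uint32_t divider) {
    return static_cast<std::uint32_t>(8ULL * divider * 1000000000ULL / cpuHz);
}

// The three dividers mentioned in this thread, at 16 MHz:
static_assert(sckHz(16000000, 16) == 1000000, "fck/16 is a 1 MHz SCK");
static_assert(cpuCyclesPerByte(8) == 64, "fck/8 gives 64 CPU cycles per byte");
static_assert(byteTimeNs(16000000, 4) == 2000, "fck/4 is 2 us per byte");
```

So going from fck/8 to fck/4 halves the slave's per-byte budget from 64 to 32 CPU cycles, and fck/4 matches the 2 uS per byte read off the scope earlier.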

I will say upfront I only glanced thru to see what kind of stuff was there.
Programming at the lower level is not my current forte - but I bet if the EEPROM & display stuff is taken out, that code could be run faster.
I'm thinking something like this. Going by the datasheet section quoted above, the SPIF flag in SPSR looks like the way to recognize that a data byte has finished being clocked in, but I haven't tested it:
const int size_limit = 100;   // array size (placeholder value)
byte array_name[size_limit];
int array_pointer = 0;        // start at beginning of array
while (array_pointer < size_limit) {   // stop after the last valid index
  if (SPSR & _BV(SPIF)) {              // SPIF set: a complete byte is waiting
    array_name[array_pointer] = SPDR;  // reading SPDR after SPSR clears SPIF
    array_pointer = array_pointer + 1;
  }
}

And somehow wrap a check of the SS line around that to know when to start.
What's it take to make this work?

Doing the maths:

The clock on the Uno is 16 MHz, so a clock cycle is 62.5 nS (or 16 cycles per uS).

As you quoted above:

In SPI Slave mode, the control logic will sample the incoming signal of the SCK pin. To ensure
correct sampling of the clock signal, the minimum low and high periods should be:
Low periods: Longer than 2 CPU clock cycles.
High periods: Longer than 2 CPU clock cycles.

Now each bit is sampled on a low/high transition of SCK, so we need both a low and a high period per bit (and then back to low for the next bit, and so on). Since the data sheet says "longer than 2 CPU clock cycles" let's say: 3 cycles (each). So, per bit, that is 6 cycles. And for the 8 bits it is 48 cycles.

48 * 62.5 nS is 3 uS. But the byte arrived in 2 uS!

So we physically can't clock data in that fast.

If we decide to live dangerously and hope that 2 cycles per high/low is enough, then that is 32 cycles per byte. That takes 2 uS. Well, that should squeeze it in, although then we have to do something with that byte before the next one comes (I know the port is buffered but that doesn't help if, in the long run, we can't empty the buffer at the rate at which data arrives).

According to page 15 the minimum time to respond to interrupts is 4 cycles, and it takes another 4 cycles to leave the ISR. So that leaves 24 cycles inside the ISR for the data to be read from the SPI port, and saved to memory. I haven't added up the time for each instruction, possibly it could be done. But my test code, which I don't think is particularly inefficient, didn't work at full SPI clock speed.
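The budget above can be written out as a few compile-time checks. This is only a sketch of the arithmetic in this post, in host-side C++; the function names are hypothetical, and the 4-cycle entry and exit figures are the ones quoted from the datasheet above.

```cpp
#include <cstdint>

// Hypothetical helper names; the numbers come from the reasoning above.

// CPU cycles per SPI byte if every SCK low phase and every SCK high phase
// lasts `phaseCycles` CPU cycles (8 bits, 2 phases per bit).
constexpr std::uint32_t minCyclesPerByte(std::uint32_t phaseCycles) {
    return 8 * 2 * phaseCycles;
}

// Convert CPU cycles to nanoseconds for a given CPU clock.
constexpr std::uint32_t cyclesToNs(std::uint32_t cycles, std::uint32_t cpuHz) {
    return static_cast<std::uint32_t>(cycles * 1000000000ULL / cpuHz);
}

// Cycles left for the ISR body after interrupt entry and exit overhead.
constexpr std::int32_t isrBodyBudget(std::int32_t cyclesPerByte,
                                     std::int32_t entryCycles,
                                     std::int32_t exitCycles) {
    return cyclesPerByte - entryCycles - exitCycles;
}

// "Longer than 2 cycles" taken as 3: 48 cycles, 3 us per byte at 16 MHz.
static_assert(minCyclesPerByte(3) == 48, "3 cycles per phase");
static_assert(cyclesToNs(48, 16000000) == 3000, "48 cycles is 3 us");

// Living dangerously with exactly 2: 32 cycles, 2 us, 24 left for the ISR body.
static_assert(minCyclesPerByte(2) == 32, "2 cycles per phase");
static_assert(cyclesToNs(32, 16000000) == 2000, "32 cycles is 2 us");
static_assert(isrBodyBudget(32, 4, 4) == 24, "body budget");
```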

Don't get me wrong, I'd love to do it. But if you can find a flaw in my logic, by all means let me know.

I re-ran the test on my web page.

This is what happens when you change the master's clock divider from 1/8 to 1/4 (that is, double the SPI clock rate):

Helo,word!

Helo,word!

Helo,worl!

Helo,worl!

Helo,word!

Helo,word!

Helo,word!

It looks like it is getting the first 3 characters, so that sort-of works. But the time taken to store them isn't keeping up with more than 3. Since the SPI hardware is double-buffered for receiving that sounds about right. It got the first one, the double-buffering handled the next 2, and then it ran out of capacity.

Looking at the generated assembler for my ISR:

// SPI interrupt routine
ISR (SPI_STC_vect)
 118:	1f 92       	push	r1
 11a:	0f 92       	push	r0
 11c:	0f b6       	in	r0, 0x3f	; 63
 11e:	0f 92       	push	r0
 120:	11 24       	eor	r1, r1
 122:	8f 93       	push	r24
 124:	9f 93       	push	r25
 126:	ef 93       	push	r30
 128:	ff 93       	push	r31
{
  byte c = SPDR;  // grab byte from SPI Data Register
 12a:	9e b5       	in	r25, 0x2e	; 46
    
    // add to buffer if room
    if (pos < sizeof buf)
 12c:	80 91 76 01 	lds	r24, 0x0176
 130:	84 36       	cpi	r24, 0x64	; 100
 132:	78 f4       	brcc	.+30     	; 0x152 <__vector_17+0x3a>
      {
      buf [pos++] = c;
 134:	80 91 76 01 	lds	r24, 0x0176
 138:	e8 2f       	mov	r30, r24
 13a:	f0 e0       	ldi	r31, 0x00	; 0
 13c:	ee 5e       	subi	r30, 0xEE	; 238
 13e:	fe 4f       	sbci	r31, 0xFE	; 254
 140:	90 83       	st	Z, r25
 142:	8f 5f       	subi	r24, 0xFF	; 255
 144:	80 93 76 01 	sts	0x0176, r24
      
      // example: newline means time to process buffer
      if (c == '\n')
 148:	9a 30       	cpi	r25, 0x0A	; 10
 14a:	19 f4       	brne	.+6      	; 0x152 <__vector_17+0x3a>
        process_it = true;
 14c:	81 e0       	ldi	r24, 0x01	; 1
 14e:	80 93 77 01 	sts	0x0177, r24
        
      }  // end of room available
 
}  // end of interrupt routine SPI_STC_vect
 152:	ff 91       	pop	r31
 154:	ef 91       	pop	r30
 156:	9f 91       	pop	r25
 158:	8f 91       	pop	r24
 15a:	0f 90       	pop	r0
 15c:	0f be       	out	0x3f, r0	; 63
 15e:	0f 90       	pop	r0
 160:	1f 90       	pop	r1
 162:	18 95       	reti

I count about 32 clock cycles in there (plus the 4 to enter the interrupt and the 4 to leave) so although this doesn't do much more than store the data in an array, it is taking more clock cycles than we have to hand.

Hi!

This topic became very interesting. Thank you for your replies! Thank you Nick for letting me borrow your code. There are not many examples of an AVR running as a slave to be found. Sorry for not giving you credit.

I've tried the coding examples forwarded by CrossRoads. The result is still the same. Lots of $FF chars and less than 1% real data.

After some more research, my theory is that the clock pulse is not high long enough to be sampled. The specification says each high or low period must last more than two CPU clock cycles, which is more than 125 nS @ 16 MHz. Looking at the oscilloscope output, the clock pulse is high (>3V) for around 100 nS. Therefore the clock pulse is probably lost, which causes the loss of data. So it looks as if this is not possible to solve using a 16 MHz Arduino.
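The comparison in this post is a one-line calculation. A minimal host-side C++ sketch of it, with hypothetical helper names, assuming the datasheet rule quoted earlier (each SCK phase longer than 2 CPU clock cycles) and the roughly 100 nS high time read off the scope:

```cpp
#include <cstdint>

// Hypothetical helpers illustrating the datasheet rule quoted in this thread.

// Lower bound (exclusive) on each SCK high or low phase, in nanoseconds:
// the phase must last longer than 2 CPU clock cycles.
constexpr std::uint64_t minSckPhaseNs(std::uint64_t cpuHz) {
    return 2ULL * 1000000000ULL / cpuHz;
}

// True if a measured phase is strictly longer than the bound.
constexpr bool phaseIsSampleable(std::uint64_t measuredNs, std::uint64_t cpuHz) {
    return measuredNs > minSckPhaseNs(cpuHz);
}

static_assert(minSckPhaseNs(16000000) == 125, "2 cycles at 16 MHz is 125 ns");
static_assert(!phaseIsSampleable(100, 16000000), "a 100 ns pulse is too short");
```

By the same formula, a 100 nS phase would need a CPU clock above 20 MHz before it met the rule.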

Here is the clock line:

Will the onboard LED on the Arduino board cause any problems? The LED is glowing when the master is driving the SCLK line.

Another question regarding the SS pin on the AVR when running as slave: The only pin that can be used is pin 10 and it is not configurable?

Since about 150 bytes of data are written every 5 seconds, maybe adding some buffering logic could solve the task. Any ideas on how to do that?

Regards,
Torgil

Crap. Okay, I'm going to look into FIFOs then as a way to receive larger bursts of data.
Hard to believe we can send data out faster than we can receive it. Makes you wonder how all the slaves we talk to keep up, like the MAX7219/7221.

Since this seems to be a dead end without a solution, I'm thinking of another approach. I'll start a new topic "SD Card SPI with two masters".

CrossRoads:
Hard to believe we can send data out faster than we can receive it. Makes you wonder how all the slaves we talk to keep up, like the MAX7219/7221.

They probably have dedicated shift registers and buffers designed to handle the (probably small) amount of data they need to receive.

Makes you wonder how all the slaves we talk to keep up, like the MAX7219/7221.

Most SPI slave chips are 100% hardware, when you use an AVR half the job is done in software.

AVRs are crap at being an SPI slave because they are not double buffered. As such the ISR (if you use one) has to service the SPI data reg within 1 SPI clock cycle, ie after the 8th bit is clocked in but before the 9th. At high speed this is not an option.

You will never do high speed using interrupts.

You should be able to do it with some tight polling, otherwise AFAIK you have to add a small delay when transmitting bytes.

Although the code in the early post did poll the SPI it then spent half an hour printing and dicking around, meanwhile probably 29 bytes had been received.


Rob

I found some large FIFOs at Newark, I have a schematic mostly done to use 74AC299 Universal Shift Register (I bought 20 of them a while back) to clock in the bit stream, transfer to the FIFO, then parallel read or serially shift out using 2nd 74AC299.
Need to work in a clock edge counter next, to capture bytes when SS is held low for bursts of bytes.

Graynomad:
AVRs are crap at being an SPI slave because they are not double buffered.

From the Atmega328 datasheet, page 167:

The system is single buffered in the transmit direction and double buffered in the receive direction. This means that bytes to be transmitted cannot be written to the SPI Data Register before the entire shift cycle is completed. When receiving data, however, a received character must be read from the SPI Data Register before the next character has been completely shifted in. Otherwise, the first byte is lost.

So it's double buffered. But the rate (for this application) is still too high according to the specs.

Yep it is, for some reason I thought it didn't even have that buffer.

So in most apps it should be easy enough to respond in time if you're careful about reading the data and then dealing with it later. But in this case you may have a continuous data stream and never get a chance to display it. Even if it's not continuous you need to know when you have a break in the data to go off and do something else.
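One common way to "read the data and deal with it later" is a small ring buffer between the receive path and the slower processing code. The sketch below is plain host-side C++ under stated assumptions: the class name `ByteRing` and the function `demoBurst` are made up for illustration, and nothing here touches real SPI hardware. It does not solve the SCK sampling limit discussed above; it only decouples storing a byte from processing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer / single-consumer byte ring.
// N must be a power of two (at most 256) so the indices fit in one byte,
// which an 8-bit AVR can load and store in a single instruction.
template <std::size_t N>
class ByteRing {
    static_assert(N >= 2 && N <= 256 && (N & (N - 1)) == 0,
                  "N must be a power of two between 2 and 256");
public:
    // Producer side (the receive path). Returns false when the ring is full.
    bool push(std::uint8_t b) {
        std::uint8_t next = static_cast<std::uint8_t>((head_ + 1) & (N - 1));
        if (next == tail_) return false;   // full: caller decides what to drop
        data_[head_] = b;
        head_ = next;
        return true;
    }
    // Consumer side (the main loop). Returns false when the ring is empty.
    bool pop(std::uint8_t &out) {
        if (tail_ == head_) return false;
        out = data_[tail_];
        tail_ = static_cast<std::uint8_t>((tail_ + 1) & (N - 1));
        return true;
    }
    // Number of bytes currently stored (capacity is N - 1).
    std::size_t size() const { return (head_ - tail_) & (N - 1); }
private:
    volatile std::uint8_t head_ = 0;   // next slot to write
    volatile std::uint8_t tail_ = 0;   // next slot to read
    std::uint8_t data_[N];
};

// Usage sketch: a 150-byte burst (the size mentioned in this thread) fits in
// a 256-slot ring and comes back out in arrival order.
inline std::size_t demoBurst() {
    ByteRing<256> ring;
    for (int i = 0; i < 150; ++i) {
        bool stored = ring.push(static_cast<std::uint8_t>(i));
        assert(stored);
        (void)stored;
    }
    std::size_t drained = 0;
    std::uint8_t b = 0;
    while (ring.pop(b)) {
        assert(b == static_cast<std::uint8_t>(drained));
        ++drained;
    }
    return drained;
}
```

The design choice is to keep `push` as short as possible, since it sits on the time-critical side; anything slow, such as printing over Serial, belongs after `pop`.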

Bottom line is I think that this can't be done with a slow(ish) processor and fast data unless you can determine when there's a break in the data.

I'd spend $150 and get a Saleae logic analyzer, you'll be decoding the data in a few minutes with the right tool.


Rob

Graynomad:
You will never do high speed using interrupts.

You should be able to do it with some tight polling, otherwise AFAIK you have to add a small delay when transmitting bytes.

I was curious if that was true, so I did some careful measuring.


Test 1, using interrupts:

void isr () {
  PORTB = 0x20;      // LED on   
}

void setup() {
  pinMode(13, OUTPUT);      
  attachInterrupt(0, isr, FALLING);     
}

void loop() {
  PORTB = 0;     // LED off    
}

To reduce overhead, I used direct port manipulation to turn on the LED on pin 13. I then fed square waves of gradually increasing frequency into D2 to see what would happen. The bottom line was that it took around 3.5 uS for the LED to turn on after the falling edge on D2. Also, it couldn't handle a frequency much higher than 150 kHz before some edges were missed (ie. around 6.7 uS all-round response).

This was somewhat poorer response than I expected, as the datasheet says the processor responds to interrupts "in 4 clock cycles". However a bit of research shows that the internal interrupt routine is actually this:

SIGNAL(INT0_vect) {
 166:	1f 92       	push	r1
 168:	0f 92       	push	r0
 16a:	0f b6       	in	r0, 0x3f	; 63
 16c:	0f 92       	push	r0
 16e:	11 24       	eor	r1, r1
 170:	2f 93       	push	r18
 172:	3f 93       	push	r19
 174:	4f 93       	push	r20
 176:	5f 93       	push	r21
 178:	6f 93       	push	r22
 17a:	7f 93       	push	r23
 17c:	8f 93       	push	r24
 17e:	9f 93       	push	r25
 180:	af 93       	push	r26
 182:	bf 93       	push	r27
 184:	ef 93       	push	r30
 186:	ff 93       	push	r31
  if(intFunc[EXTERNAL_INT_0])
 188:	80 91 00 01 	lds	r24, 0x0100
 18c:	90 91 01 01 	lds	r25, 0x0101
 190:	89 2b       	or	r24, r25
 192:	29 f0       	breq	.+10     	; 0x19e <__vector_1+0x38>
    intFunc[EXTERNAL_INT_0]();
 194:	e0 91 00 01 	lds	r30, 0x0100
 198:	f0 91 01 01 	lds	r31, 0x0101
 19c:	09 95       	icall
}
 19e:	ff 91       	pop	r31
 1a0:	ef 91       	pop	r30
 1a2:	bf 91       	pop	r27
 1a4:	af 91       	pop	r26
 1a6:	9f 91       	pop	r25
 1a8:	8f 91       	pop	r24
 1aa:	7f 91       	pop	r23
 1ac:	6f 91       	pop	r22
 1ae:	5f 91       	pop	r21
 1b0:	4f 91       	pop	r20
 1b2:	3f 91       	pop	r19
 1b4:	2f 91       	pop	r18
 1b6:	0f 90       	pop	r0
 1b8:	0f be       	out	0x3f, r0	; 63
 1ba:	0f 90       	pop	r0
 1bc:	1f 90       	pop	r1
 1be:	18 95       	reti

Now adding up the clock cycles for those instructions (the push alone takes two cycles and there are 15 of them) it comes to 45, plus 4 to enter the interrupt, plus 3 for the JMP from the interrupt vector table. Then there is this in "my" interrupt routine:

void isr () {
  PORTB = 0x20;        
 100:	80 e2       	ldi	r24, 0x20	; 32
 102:	85 b9       	out	0x05, r24	; 5
}
 104:	08 95       	ret

That's another couple of cycles to turn on the LED. Total being 45 + 4 + 3 + 2 = 54 cycles. That accounts for 3.38 uS response time, which is pretty close to the measured one. Then there is all the stuff to exit interrupts (ret = 4 + 35 others) and that would account for another 2.4 uS.
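The tally above, written out as host-side C++ so the conversion to microseconds is explicit. The function names are hypothetical; the cycle counts are exactly the ones given in this post, and a 16 MHz clock is assumed.

```cpp
#include <cstdint>

// Hypothetical helpers; all cycle counts are taken from the post above.

// Cycles from the interrupt event to the user's code having run.
constexpr std::uint32_t interruptEntryCycles(std::uint32_t hwResponse,
                                             std::uint32_t vectorJmp,
                                             std::uint32_t wrapperCode,
                                             std::uint32_t userBody) {
    return hwResponse + vectorJmp + wrapperCode + userBody;
}

// Convert CPU cycles to microseconds for a CPU clock given in MHz.
constexpr double cyclesToMicros(std::uint32_t cycles, double cpuMHz) {
    return cycles / cpuMHz;
}

// Entry: 4 (hardware) + 3 (JMP from vector table) + 45 (wrapper) + 2 (LED on).
static_assert(interruptEntryCycles(4, 3, 45, 2) == 54, "entry cycles");
static_assert(cyclesToMicros(54, 16.0) == 3.375, "about 3.38 us to respond");

// Exit: ret (4) plus 35 cycles of other instructions.
static_assert(cyclesToMicros(4 + 35, 16.0) == 2.4375, "about 2.4 us to leave");
```

Entry plus exit comes to 93 cycles, or about 5.8 uS, which is in the same range as the roughly 6.7 uS all-round response measured above.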


Test 2, using a tight loop:

void setup() {
  pinMode(13, OUTPUT);      
  noInterrupts();
}

void loop() {
  while (true)
  {
    if (PIND & 0x04)
      PORTB = 0;
    else
      PORTB = 0x20;
  }
}

This time I measured something like 480 nS for the LED to light, which indicates it was noticed within about 7 clock cycles. The relevant code is:

  while (true)
  {
    if (PIND & 0x04)
 102:	4a 9b       	sbis	0x09, 2	; 9
 104:	02 c0       	rjmp	.+4      	; 0x10a <loop+0xa>
      PORTB = 0;
 106:	15 b8       	out	0x05, r1	; 5
 108:	fc cf       	rjmp	.-8      	; 0x102 <loop+0x2>
    else
      PORTB = 0x20;
 10a:	85 b9       	out	0x05, r24	; 5
 10c:	fa cf       	rjmp	.-12     	; 0x102 <loop+0x2>

Clearly there are fewer instructions there than in the interrupt routine, so taking something like 6 cycles to notice the pin change would be about right.


So I'll have to agree, whilst interrupts are pretty useful, they can't respond as fast as a tight loop. But on the other hand, if you are using a tight loop you aren't doing anything else useful, so it would depend on the application somewhat.

Yep, "responding to interrupts" and actually doing some useful work are different things.

I argued this exact scenario the other day on another thread but just guesstimated the times, thanks for quantifying it.

so it would depend on the application somewhat.

Always trade-offs to decide about, that's why we get the big bucks :)


Rob

Thank you for investigating the difference in overhead between interrupts and polling. Could it be the compiler causing some of the overhead? It seems like part of the ISR is connected to saving and restoring registers on the stack.

If anyone is still interested in where this thread started, I think that I now have an explanation for all the $FF's received by the Arduino when eavesdropping on the SPI line. After studying the SD card library I've noticed that there is a card initialization sequence run when the card is accessed. This sequence is run at a low clock speed, normally 1/128 or 125 kHz, and sends 10 $FF bytes. My guess is that this is done by the controller more than once during the logging sequence that occurs every 5 secs.

I've also been looking into FIFOs and shift registers, but to me it seems like a much simpler solution to add some logic to make it possible for two SPI masters to access the same SD card. Now I need an SD card holder to try this approach. Regarding shift register + FIFO, I guess the idea is to use parallel input to the Arduino? It would then also be necessary to add some logic to divide the clock by 8 and reset the shift register on the CS high-low transition?
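The divide-by-8 and reset-on-CS idea can be checked in software before building it. The following is a behavioural model in host-side C++, not HDL and not Arduino code; the struct `SpiByteFramer` and the function `demoFrame` are hypothetical names, and MSB-first bit order is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of the external capture logic:
// an 8-bit serial-in shift register plus a divide-by-8 bit counter.
struct SpiByteFramer {
    std::uint8_t shift = 0;      // shift register contents
    std::uint8_t bitCount = 0;   // bits clocked in since the last byte boundary

    // CS high-to-low transition: start of a frame, realign to a byte boundary.
    void chipSelectFalling() {
        shift = 0;
        bitCount = 0;
    }

    // Call once per sampling SCK edge with the MOSI level (MSB first assumed).
    // Returns true on every 8th bit; `out` then holds the completed byte.
    // In hardware this return value would be the FIFO write strobe.
    bool clockBit(bool mosi, std::uint8_t &out) {
        shift = static_cast<std::uint8_t>((shift << 1) | (mosi ? 1 : 0));
        if (++bitCount < 8) return false;
        out = shift;
        bitCount = 0;
        return true;
    }
};

// Usage sketch: clock in 0xA5 after a CS falling edge and get it back whole.
inline std::uint8_t demoFrame() {
    SpiByteFramer framer;
    framer.chipSelectFalling();
    std::uint8_t byte = 0;
    bool ready = false;
    for (int bit = 7; bit >= 0; --bit) {
        ready = framer.clockBit(((0xA5 >> bit) & 1) != 0, byte);
    }
    assert(ready);   // the strobe fires exactly on the 8th bit
    (void)ready;
    return byte;
}
```

Resetting the counter on the CS falling edge is what keeps the byte boundaries aligned if a previous frame ended after a partial byte.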