Topic: Support for the second SPI bus on the Zero

scswift

I'd expected that when the Zero was released, it would feature support for the SPI port on pins 10-13.  Variant.h seems to hint at this capability:

Code:

/*
 * SPI Interfaces
 */
#define SPI_INTERFACES_COUNT 1


I know the Due has extended SPI functions where you can initialize multiple devices with different chip select pins, and I assumed that this would be used with the Zero to access multiple SPI devices on individual SPI busses via their chip select pins, but it seems this functionality wasn't included in the final release.

Will this functionality be added soon?  I noticed the SPI bus speed on the Zero is limited to 12MHz, so the ability to use two busses simultaneously would double the available bandwidth if say, you wanted to read from an SD card and write to a display at the same time.

Obviously one doesn't need the SPI library to use these pins, but it would have been nice to have been able to use the helper functions to access them, and without that support one can't count on any libraries, like the FAT or a TFT library, supporting the use of those pins. 

ddurkee0

How do I use the SPI library on the Zero?
I added #include <SPI.h> to a new sketch.

and get a long list of errors:
\\dc-01\userhome\dave\My Documents\Arduino\libraries\SPI/SPI.h: In static member function 'static byte SPIClass::transfer(byte)':
\\dc-01\userhome\dave\My Documents\Arduino\libraries\SPI/SPI.h:56:3: error: 'SPDR' was not declared in this scope
  SPDR = _data;
  ^

scswift

I don't think the SPI library is supposed to be in your libraries folder; it lives in the Arduino folder or the hidden Arduino folder in your user directory on Windows.  You're probably having issues because the copy in your libraries folder isn't the newest one that came with the update for the Zero.

Paul Stoffregen

#3
Jul 09, 2015, 06:06 pm Last Edit: Jul 09, 2015, 06:10 pm by Paul Stoffregen
Quote
the SPI bus speed on the Zero is limited to 12MHz, so the ability to use two busses simultaneously would double the available bandwidth if say, you wanted to read from an SD card and write to a display at the same time.
I am curious how you imagine simultaneous 12 Mbit/sec SD card and display access will work?

Let's imagine you get your wish, so the SPI library provides both "SPI" and "SPI1".  Let's also imagine either your display library or the Arduino SD library has some option to use "SPI1" instead of only "SPI".

The problem is SPI.transfer(data), and SPI1.transfer(data), transmit the data you give them to the MOSI pin and return whatever data arrived on the MISO during that transfer.  You can't do anything else during that time.

For example, with the SD library, suppose you call myfile.write("a fairly lengthy string").  If your current position in the file is near the end of a 512 byte sector, it will write the first portion of the string to the sector buffer, and then write the sector to disk.

The write operation involves several SPI.transfer() calls to make the request.  Then a variable number of SPI.transfer() calls are used to wait for the card to be ready to accept the data.  Then 514 SPI.transfer() calls are used to write the data and a checksum.  Then the SD library waits for the card to complete the write, which can take quite a bit of time.  Any number of SPI.transfer() calls are made before the card returns a status code indicating completion.

After the write, the next sector from the card must be read.  But if this was the last sector in a FAT filesystem allocation cluster, then first the FAT must be read to learn the location of the next cluster.  Reading a sector is similar, involving a small fixed number of SPI.transfer() calls to send the command, a small but variable number to wait for the card to be ready to give data, then 514 to receive the sector and checksum.  Then the SD library knows the next cluster's location, so it can be read into the buffer and the remainder of the string can be stored in the first part of that buffer.
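To put a rough number on the 514-transfer data phase described above, here is a small host-side C++ sketch. It only does arithmetic on figures quoted in this thread (12 Mbit/sec SPI, 48 MHz CPU); the function names are made up for illustration, and it ignores the command, wait, and completion phases, whose lengths are variable.

```cpp
#include <cassert>
#include <cstdint>

// 512 data bytes plus 2 checksum bytes, as counted in the post above.
constexpr std::uint32_t kSectorTransfers = 514;

// Time on the wire for `bytes` back-to-back 8-bit transfers, in nanoseconds
// (rounded down). Hypothetical helper, not part of any Arduino library.
inline std::uint64_t wireTimeNs(std::uint32_t bytes, std::uint32_t bitsPerSecond) {
    assert(bitsPerSecond != 0);
    return static_cast<std::uint64_t>(bytes) * 8u * 1000000000ull / bitsPerSecond;
}

// CPU cycles that elapse while those bytes are being shifted.
inline std::uint64_t cpuCyclesDuring(std::uint32_t bytes,
                                     std::uint32_t bitsPerSecond,
                                     std::uint32_t cpuHz) {
    assert(bitsPerSecond != 0);
    return static_cast<std::uint64_t>(bytes) * 8u * cpuHz / bitsPerSecond;
}
```

With the thread's numbers, wireTimeNs(kSectorTransfers, 12000000) comes out to about 342 microseconds and cpuCyclesDuring(kSectorTransfers, 12000000, 48000000) to 16448 cycles. That is a lower bound for the data phase alone, during which a blocking SPI.transfer() loop cannot service the display.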

The point is those SPI.transfer() calls are very deep within the SD library, below the filesystem and FAT layers.

How will your display library manage to call SPI1.transfer() while the SD library is busy, to simultaneously refresh your screen?

Even if your display code could overcome this issue, how are you going to compose more data to put onto your display while the SD library is busy writing and reading sectors?

Paul Stoffregen

#4
Jul 09, 2015, 06:29 pm Last Edit: Jul 09, 2015, 06:36 pm by Paul Stoffregen
You might imagine somehow using interrupts, but that is also problematic.

Even if you somehow managed to massively restructure the SD library or a display library to use interrupts and allow the other library to run simultaneously, consider the timing.  At 12 Mbit/sec and the CPU at 48 MHz, each bit is exactly 4 clock cycles, or 32 cycles/byte.

There's no FIFO in Zero's SPI hardware, so you have to process 1 byte at a time.  Cortex-M0+ has 15 cycles interrupt latency with zero wait state memory, and at least 12 cycles return latency.  That's 27 of the 32 cycles right there, if your code runs from the zero-wait RAM.  Arduino Zero's flash memory has 1 wait state, so even an empty interrupt can't meet the timing requirements to keep continuous 12 Mbit/sec data flow on just one of the SPI ports.  You'd have to run a LOT faster to allow the main program enough time to also sustain data flow, and support the extra overhead & complexity to make the interrupt code state driven instead of simple code.  Cortex-M0+ simply doesn't have this level of performance.
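The cycle budget above can be written out as a short host-side C++ sketch. The constants are the figures quoted in this post (taken as given, not checked against a datasheet), and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the post above; assumptions for this sketch.
constexpr std::uint32_t kCpuHz            = 48000000u;  // Zero CPU clock
constexpr std::uint32_t kSpiBitsPerSecond = 12000000u;  // SPI bit rate
constexpr std::uint32_t kIrqEntryCycles   = 15u;        // zero-wait-state entry latency
constexpr std::uint32_t kIrqExitCycles    = 12u;        // minimum return latency

// CPU cycles available per 8-bit SPI transfer.
inline std::uint32_t cyclesPerByte(std::uint32_t cpuHz, std::uint32_t bitsPerSecond) {
    assert(bitsPerSecond != 0);
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(cpuHz) * 8u / bitsPerSecond);
}

// Cycles left for actual handler code after interrupt entry and exit.
// A negative result means the interrupt cannot keep up at all.
inline std::int32_t spareCyclesPerByte(std::uint32_t cpuHz, std::uint32_t bitsPerSecond) {
    return static_cast<std::int32_t>(cyclesPerByte(cpuHz, bitsPerSecond)) -
           static_cast<std::int32_t>(kIrqEntryCycles + kIrqExitCycles);
}
```

With these inputs, cyclesPerByte(kCpuHz, kSpiBitsPerSecond) is 32 and spareCyclesPerByte(kCpuHz, kSpiBitsPerSecond) is 5, and that is before counting the flash wait state mentioned above.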

DMA transfers might help, but they have high setup overhead, and they can't handle data-dependent transfers which the SD library requires while waiting for command acceptance and write completion.  Maybe a display library could benefit, to move the frame buffer out to the display.

DMA support in the SPI library, and use of such a feature in a display library, is asking for a LOT more than simply providing SPI library support for both ports.

Supporting both SPI & SPI1 of course would be nice for people who wish to use the other port, especially on your board that didn't bring the main SPI port out to easily accessible pins.  Likewise, libraries like SD would need to have a way to select which port they use.  Currently, they're almost all hard-coded for "SPI".  Those would at least allow easily using the other port.

Just don't fool yourself into thinking this will be useful for somehow simultaneously using 12 Mbit/sec on both ports.  None of the existing software is structured for simultaneous access, and the 48 MHz Cortex-M0+ CPU is far too slow to achieve that with either interrupts or software polling.

scswift

#5
Jul 09, 2015, 07:28 pm Last Edit: Jul 09, 2015, 07:43 pm by scswift
You may be right that it's not possible with how the SPI library functions. 

However, I'm pretty sure on the Atmega I could initiate a byte transfer and then go off to do other things and read the byte received on the next go-round before initiating the next byte transfer.

If that were possible on the Zero it would mean I could theoretically initiate a transfer on one bus, initiate a transfer on another bus, then go off and do something else, and then at my leisure read the returned bytes for each of the transfers.

So instead of initiating a transfer and having to wait 8 cycles or whatever for the transfer to complete and being able to do nothing during that time, I could use those 8 cycles for other things... including starting another transfer on the second SPI bus, effectively giving me twice the throughput.

I mean in reality it might be less than twice the throughput, but it would still probably be more than what could be achieved with a single bus.  And even if it were the same as what could be done with a single bus, I would imagine the CPU would at least be freed up to do more stuff.

I mean having the CPU just stop after you begin each byte transfer when it doesn't have to doesn't seem optimal.  Shouldn't the SPI transfer function return the byte received after the PREVIOUS transfer?

I could be mistaken about all this of course.  It's been a while since I wrote any SPI code, and I believe I used one of the USARTs in SPI mode when I did this stuff on the Atmega, so maybe there were some differences. 

Paul Stoffregen

#6
Jul 09, 2015, 11:45 pm Last Edit: Jul 09, 2015, 11:45 pm by Paul Stoffregen
Quote
So instead of initiating a transfer and having to wait 8 cycles or whatever for the transfer to complete and being able to do nothing during that time, I could use those 8 cycles for other things... including starting another transfer on the second SPI bus, effectively giving me twice the throughput.

I mean in reality it might be less than twice the throughput, but it would still probably be more than what could be achieved with a single bus.
I am confident if you actually try this on an Uno or Zero with the SD library and a display, you'll achieve much less than the speed of either library acting alone.


Quote
And even if it were the same as what could be done with a single bus, I would imagine the CPU would at least be freed up to do more stuff.
In theory with an extremely fast CPU, yes.

In practice with 8 bit AVR or 32 bit ARM Cortex-M0+, with 8 or 12 Mbit/sec SPI, the CPU speed is far too slow.  Especially for 2 general purpose libraries like SD and a display, just the function exit-entry-exit-entry to get between the 2 unrelated code bases will eat up nearly all the CPU time.

Of course, you can try to prove me wrong!  Just a small matter of programming, right?

scswift

Quote
In theory with an extremely fast CPU, yes.
How do you mean?

If it takes W cycles to enter an interrupt, X cycles to begin an SPI transfer, Y cycles to complete an SPI transfer and Z cycles to exit an interrupt, and I skip Y by starting the transfer but not waiting for it to finish, then at the very least, the cost to exit that interrupt should be reduced by however many cycles it takes to transmit 8 bits at 12 MHz on a 48 MHz processor.  Which is probably a lot.

Let's see, 48 MHz / 12 MHz = 4... And transmitting 8 bits at 12 MHz takes 16 cycles.  Times our factor of 4...  That's 64 CPU cycles wasted waiting for that byte to finish transmitting if we stick around waiting for the next byte to arrive!

Am I wrong?


Quote
In practice with 8 bit AVR or 32 bit ARM Cortex-M0+, with 8 or 12 Mbit/sec SPI, the CPU speed is far too slow.  Especially for 2 general purpose libraries like SD and a display, just the function exit-entry-exit-entry to get between the 2 unrelated code bases will eat up nearly all the CPU time.
Yes, if you transfer one byte at a time.  But the SDFat Library doesn't do that.

I mean you're right... If you transfer a single byte at a time with an interrupt, you're going to waste boatloads of time entering and exiting that interrupt whether you use one or two SPI busses to transfer the data. 

And I don't claim to know how to solve that, if you transfer one byte at a time.

But, if one were to write an SPI function that could transfer blocks of data at a time, then I think you could interleave the transfers, initiating the transfer of one byte for one bus, and then initiating the transfer for the second bus, and then repeating, and only exiting the interrupt when the transfer completes.
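The interleaving idea above can be sketched on a host PC with a mock port. Everything here is hypothetical: MockSpiPort, start(), finish() and interleavedTransfer() are invented for illustration and are not the Arduino SPI API. On real hardware finish() would presumably poll a status flag, and the sketch says nothing about whether the timing would actually work out on the Zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one SPI port (not the Arduino SPI API).
struct MockSpiPort {
    std::vector<std::uint8_t> sent;  // bytes "shifted out" so far
    bool busy = false;
    std::uint8_t inFlight = 0;

    // Begin shifting a byte and return immediately.
    void start(std::uint8_t b) {
        assert(!busy);
        inFlight = b;
        busy = true;
    }

    // Collect the byte received during the transfer begun by start().
    // The mock "receives" the inverted byte so results are easy to check.
    std::uint8_t finish() {
        assert(busy);
        busy = false;
        sent.push_back(inFlight);
        return static_cast<std::uint8_t>(~inFlight);
    }
};

// Move two blocks at once: start a byte on A, start a byte on B,
// then collect both results, and repeat until both blocks are done.
void interleavedTransfer(MockSpiPort& a, const std::vector<std::uint8_t>& txA,
                         std::vector<std::uint8_t>& rxA,
                         MockSpiPort& b, const std::vector<std::uint8_t>& txB,
                         std::vector<std::uint8_t>& rxB) {
    rxA.clear();
    rxB.clear();
    const std::size_t n = txA.size() > txB.size() ? txA.size() : txB.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (i < txA.size()) a.start(txA[i]);
        if (i < txB.size()) b.start(txB[i]);  // overlaps with A's byte
        if (i < txA.size()) rxA.push_back(a.finish());
        if (i < txB.size()) rxB.push_back(b.finish());
    }
}
```

The point of the structure is that B's byte is started while A's is still in flight, so the wait for A is not dead time. Whether a real loop on a 48 MHz part has enough spare cycles for that is exactly the open question in this thread.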

Maybe?

I know the WaveHC lib and the SDFat lib sped up their transfers a great deal by transferring 512 bytes at a time, and I know that when I was working with some others to make SPI transfers on the Atmega as fast as possible there were a whole lot of NOPs we had to insert in the loop when we cheated and didn't use the transfer complete bit to get that last bit of speed out.  So it stands to reason that there ought to be enough spare cycles in the loop to interleave two transfers.  But I have no idea how easy it would be to do this.


Quote
Of course, you can try to prove me wrong!  Just a small matter of programming, right?
Hey, you said you were curious how I thought it could work. I'm not saying you're wrong. :)


And normally I'd be up for the challenge, but I've got my hands full at the moment designing PCBs.  I'm just putting some ideas out there!

Paul Stoffregen

Quote
Let's see, 48 MHz / 12 MHz = 4... And transmitting 8 bits at 12 MHz takes 16 cycles.  Times our factor of 4...  That's 64 CPU cycles wasted waiting for that byte to finish transmitting if we stick around waiting for the next byte to arrive!

Am I wrong?
Well, the idea is right, but yeah, your analysis is off by a factor of 2.  Transmitting 8 bits at 12 Mbit/sec takes 32 CPU cycles, not 64.

Likewise, most of these software ideas you've mentioned would be good ideas if the CPU were much faster and the SPI peripherals had substantial FIFOs.

Maybe the concept of transferring a large block could work with the DMA engine.



Quote
And normally I'd be up for the challenge, but I've got my hands full at the moment designing PCBs.  I'm just putting some ideas out there!
Maybe after you start shipping your Neutrino boards you'll find some time to contribute to the Arduino software or libraries?


scswift

Quote
Maybe after you start shipping your Neutrino boards you'll find some time to contribute to the Arduino software or libraries?
I hope so.  I'm always happy to share code, and I've contributed to open source in the past; but it was mostly stuff for making games on the PC; sprite libraries, particle systems, shadow and terrain libraries, view frustum culling and dynamic LOD adjustment using octrees, etc.  I spent 20+ years doing that before I got into electronics.
