Using DIO for super high-speed multi-Arduino bus communication

dlloyd · March 23, 2015, 1:43am

Could test the decoder feature using CS pins 4, 10, and 52 plus one extra digital output all connected to a fast 1 of 16 decoder. This output could be connected to the upper select input of the decoder allowing you to choose between 2 groups of 8 devices.

system · March 23, 2015, 10:12am

Silly question but when you say you got DMA working at 42 MHz, I understand that is half the CPU Clock speed. What do you mean exactly ?
How did you determine that the DMA was executing at that speed ? I've never tried DMA and am just wondering how you measure that ?

MarkT · March 23, 2015, 10:45am

The Due can do 16 bit SPI transfers so the actual DMA rate would only need to be 42/16 = 2.625MHz
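To make that division concrete, here is a small sketch in plain C++. The helper name `wordsPerSecond` is made up for illustration, and it assumes one bit per SCLK cycle with words sent back to back.

```cpp
#include <cassert>

// Number of SPI words (and so DMA requests) per second, assuming one bit is
// shifted per SCLK cycle and there are no gaps between words.
// wordsPerSecond is a hypothetical helper, not part of any Arduino API.
double wordsPerSecond(double sclkHz, unsigned bitsPerWord) {
    assert(bitsPerWord > 0);
    return sclkHz / bitsPerWord;
}
```

With a 42 MHz clock this gives 2.625 million requests per second for 16-bit words and 5.25 million for 8-bit words.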

ShapeShifter · March 23, 2015, 12:28pm

MercuriThunder:
I haven't attempted to get more than one byte through. I'm assuming I'll have gaps unless I use DMA transfers, or some sort of really optimized receive routine.

DMA will give you the highest transfer rate. Or you could go with a small ISR that sends or stores the next piece of data, but that won't give as much throughput as DMA. A tight little polling loop can give better throughput than an ISR since there is no context switch overhead, but then you'll be doing nothing else while transferring data.

Sending 16 bits per DMA or ISR or poll cycle will give you better rates than sending 8 bits at a time, because it lowers the per bit overhead.

To me, the SPI clock rate is fairly meaningless: 42 MHz means nothing if there are huge gaps between each byte or word. All it does is make the wiring more complex and touchy. You need to figure out the actual sustained data rate you can manage, and then set the SPI clock rate to 8 or 16 times that rate. Running the clock faster than that is just a waste.

Having access to the hardware CS lines is nice, but not at all essential, especially for the master. Any GPIO bit will work just as well. In fact, sometimes it's easier to just control the CS yourself rather than work around the way the SPI hardware wants to control it. The exception is on the slave side: having access to a hardware CS input makes it much easier, but then you only need one.

system · March 23, 2015, 1:17pm

I got SPI master and slave communication working between 2 Dues at a balmy 42 MHz

To me, the SPI clock rate is fairly meaningless: 42 MHz means nothing if there are huge gaps between each byte or word. All it does is make the wiring more complex and touchy. You need to figure out the actual sustained data rate you can manage, and then set the SPI clock rate to 8 or 16 times that rate. Running the clock faster than that is just a waste.

I was wondering how you measured the data transfer rate.

system · March 23, 2015, 1:19pm

Yessir, that's why we plan to implement DMA... I don't want gaps.

42 MHz was measured via scope on the SCLK signal between Dues during successful communication. When it comes time for data transfer rates after DMA and interrupts are implemented, I will post Mbps.

ShapeShifter · March 23, 2015, 1:21pm

raschemmel:
I was wondering how you measured the data transfer rate.

Are you asking me or the OP?

system · March 23, 2015, 2:22pm

Are you asking me or the OP?

I was asking the OP, who replied in Reply #25 that he used the SCLK signal. I don't know what the correlation is between SCLK and a data transfer. Without knowing that, and since this is serial communication, I don't see how it could transfer one byte per clock pulse, so I would tend to think it is one bit per clock pulse. The OP didn't post any data numbers or calculations, so I still don't know how that data transfer rate determination was made. Maybe you can shed some light on that.
SPI requires sending more than just the data, as seen in this snippet:

    digitalWrite(SSelectPin, LOW);
    SPI.transfer(address);  // send command byte
    SPI.transfer(Value);    // send Value (0~255)
    digitalWrite(SSelectPin, HIGH);

After the chip select (SS) pin goes LOW, a command byte is sent (at least in this case of a digital pot) and then the data value is sent. With a DMA transfer I would think there would be a command byte for a Write operation, but that's just a guess. How does the SCLK correlate to data bytes transferred? That seems to be the missing piece of information.

ShapeShifter · March 23, 2015, 3:03pm

What gets sent, and what overhead is associated with it, is up to the protocol agreement between the master and slave. SPI does not define any sort of protocol, it is simply a synchronous way of sending a bitstream between two devices. The actual protocol (meaning of the bytes) is completely open and generally defined by the particular slave being accessed (it's generally assumed that the master will bow to the needs of the slave device.)

At its most basic level, SPI is a pair of shift registers, one in the master, and one in the slave. The serial data lines are hooked up so that as the master is shifting bits out to the slave, the slave is also shifting bits into the master. The master provides a clock to keep everything in sync. One data bit is transmitted in each direction with each complete clock cycle.

The transfer starts out by the master writing a data byte into its shift register, while the slave writes a data byte into its shift register. The master then cycles the clock 8 times so that the master's data goes out the master and in the slave (MOSI - Master Out Slave In) and the slave's data goes out the slave and into the master (MISO - Master In Slave Out.) After 8 clock cycles, the master now has a copy of the slave's data, and the slave has a copy of the master's data. Each side reads out their data byte, and then puts a new byte into the shift register so that another transfer can happen.
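The exchange described above can be modelled in a few lines of plain C++. This is only a toy model of the two shift registers, MSB first, with invented names (`SpiPair`, `clockOnce`, `exchange`); it ignores clock polarity, phase and chip select.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the two shift registers in one 8-bit SPI transfer.
struct SpiPair {
    uint8_t masterShift;  // master's shift register
    uint8_t slaveShift;   // slave's shift register
};

// One full clock cycle: each side drives its top bit onto its output line,
// then both shift left and take the other side's bit into bit 0.
void clockOnce(SpiPair& p) {
    const uint8_t mosi = (p.masterShift >> 7) & 1u;  // master out, slave in
    const uint8_t miso = (p.slaveShift >> 7) & 1u;   // slave out, master in
    p.masterShift = static_cast<uint8_t>((p.masterShift << 1) | miso);
    p.slaveShift  = static_cast<uint8_t>((p.slaveShift << 1) | mosi);
}

// Load both registers, cycle the clock 8 times, and return the result.
SpiPair exchange(uint8_t masterOut, uint8_t slaveOut) {
    SpiPair p{masterOut, slaveOut};
    for (int i = 0; i < 8; ++i) clockOnce(p);
    // After 8 cycles the two bytes have swapped places.
    assert(p.masterShift == slaveOut && p.slaveShift == masterOut);
    return p;
}
```

Calling `exchange(0xA5, 0x3C)` leaves 0x3C in the master's register and 0xA5 in the slave's.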

There is also a chip select (CS) line that is controlled by the master and read by the slave. Generally, when the CS line is not active, the slave ignores any SCK and MOSI data, and sets its MISO to high impedance (tri-state.) This way, the SPI bus can be shared by multiple slaves and only one will use it at a time (the one which has an active CS line.)

The diagram shows a very simple case. Usually, the hardware is double buffered so that each shift register has an input data register and output data register. Data can be written to the output data register while a byte is currently being transmitted. When the transmission is complete, the incoming data is copied to the input data register, and if something is in the output data register it is copied to the shift register and another transfer is started. Then, the input data register can be read to get the last received value, even while a transfer is in process.

More sophisticated devices have hardware FIFOs that are several bytes deep instead of just single double buffer registers. Others add DMA ability. Some devices can transfer 16 bits at a time, others are limited to only 8.

But basically, the clock rate is the rate that bits are shifted between the two devices. But if the devices can't get data into and out of the SPI hardware fast enough, having a high clock rate doesn't buy anything. So you get the individual bytes shifted between devices really fast, but what's the benefit if the hardware then spends a lot of time waiting for the next byte? It's like racing at 100 mph between timed traffic lights, only to spend most of your time waiting at the next red light, when driving at a steady 30 mph will let you sail through nothing but green lights.
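A rough way to put numbers on the traffic-light picture, as a sketch in plain C++. The function name and the gap figure are hypothetical; nobody in this thread has measured a gap yet.

```cpp
#include <cassert>

// Effective payload rate in bits per second when every word is followed by an
// idle gap while software (or DMA) fetches the next one.
// All names and example figures here are assumptions for illustration.
double effectiveBitsPerSecond(double sclkHz, unsigned bitsPerWord, double gapSeconds) {
    assert(sclkHz > 0 && bitsPerWord > 0 && gapSeconds >= 0);
    const double shiftTime = bitsPerWord / sclkHz;   // time spent clocking bits
    return bitsPerWord / (shiftTime + gapSeconds);   // bits per total word time
}
```

With these made-up numbers, a 42 MHz clock with a 1 µs gap after every byte comes out at about 6.7 Mbit/s, which is less than a gap-free 8 MHz link would deliver.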

system · March 23, 2015, 3:49pm

It's like racing at 100 mph between timed traffic lights, only to spend most of your time waiting at the next red light, when driving at a steady 30 mph will let you sail through nothing but green lights.

Where's the fun in that ? ;D

How do you read the Slave's data after the Master has received a copy of it and has it in its register? Do you know any links for learning the finer points of SPI well enough to play around with it? I guess I'm wondering where the data goes. In the example you gave, is the Slave a DMA memory chip?
In the OP's original post he says he wants two Dues to "communicate" (whatever that means) and makes no mention of DMA memory chips, so if the data is being transferred at 42 MHz, where is it going? How can the slave receive more data (another byte) if it has not stored the byte it just received, and if it did, where did it store it and how (assuming it is busy doing nothing but receiving bytes from the Master)? Are these fair questions? I didn't see anything in the OP's post that suggests he tried to transfer more than one byte, so I guess the question becomes: what happens to that byte? Am I asking too many questions? I found this tutorial:
SPI TUTORIAL

system · March 23, 2015, 4:35pm

Shapeshifter - You know your stuff, man. No wonder you're a God Member. You are absolutely right on all counts, and your explanation is succinct. Kudos.

raschemmel - Using a DMA buffer, you can essentially point the master's SPI hardware to a chunk of the master's memory (say a 64kB block), and then point the slave's SPI hardware to a chunk of the slave's memory (to receive said 64kB block), then tell it to go, then BAM... Approximately 512k clock cycles later (8 bits/cycles per byte x 64kB of data), the slave's memory should now have a copy of the master's data. That is what I mean by "communicate". That block can contain a message with any protocol I choose, a report, ANYTHING. It doesn't matter.

What's great is the slave is also communicating with the master simultaneously, so that data exchange is full-duplex.
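For scale, the 64 kB figure can be worked through in plain C++. This is a best-case sketch with hypothetical helper names; it assumes 8 clock cycles per byte and no gaps at all between bytes.

```cpp
#include <cassert>
#include <cstdint>

// SCLK cycles needed to shift a block of bytes, at 8 cycles per byte.
uint64_t clockCyclesForBlock(uint64_t bytes) {
    return bytes * 8u;
}

// Best-case transfer time in seconds, assuming a continuous clock.
double secondsForBlock(uint64_t bytes, double sclkHz) {
    assert(sclkHz > 0);
    return clockCyclesForBlock(bytes) / sclkHz;
}
```

A 64 kB block (65,536 bytes) works out to 524,288 clock cycles, or roughly 12.5 ms at 42 MHz if the clock never pauses.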

ShapeShifter · March 23, 2015, 4:37pm

raschemmel:
Where's the fun in that ? ;D

True. The point is that you don't need a Ferrari when a Yugo will get the job done. However, want is an entirely different matter.

How do you read the Slave's data after the Master has received a copy of it and has it in its register?

It's all a matter of the way the code is set up to react to the new data. Generally, there is a bit in a SPI device register that indicates when a new byte is available:

Code can loop around and check this bit. When it's set, it can read the data register and do whatever you want with the data byte.
Usually, that register bit can also trigger an interrupt when it gets set. In that case, you could set up an interrupt service routine (ISR) that gets triggered each time the bit is set. That way, every time a byte comes in, the ISR is triggered, the code stops what it's doing and transfers to the ISR, the ISR reads the byte and stores it away, then the ISR returns so the code can continue where it left off. The same idea can apply to sending the data by writing the next byte to the output data register. Usually, one can combine the reading and writing operations into one ISR because by definition, each write transfer is also a read transfer.
Sometimes, that register bit can also trigger a DMA operation, so that the DMA hardware pauses the code, reads the byte and stores it directly into memory, and then resumes the code.
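The first two options can be sketched against a stand-in for the hardware. Everything here (`FakeSpi`, `pollByte`, `onRxInterrupt`) is invented for illustration; a real peripheral has memory-mapped registers with device-specific names, and a real ISR is hooked up through the vector table rather than called directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for a SPI peripheral's receive registers.
struct FakeSpi {
    volatile bool rxReady = false;  // "new byte available" status bit
    volatile uint8_t rxData = 0;    // input data register
};

// Simulates the hardware finishing a transfer.
void hardwareReceives(FakeSpi& spi, uint8_t b) {
    spi.rxData = b;
    spi.rxReady = true;
}

// Polling: check the bit, and read the data register only when it is set.
bool pollByte(FakeSpi& spi, uint8_t& out) {
    if (!spi.rxReady) return false;
    out = spi.rxData;
    spi.rxReady = false;  // modelled here as cleared by the read
    return true;
}

struct Receiver {
    uint8_t buf[16];
    size_t count = 0;
};

// What an ISR body would do: read the byte, store it away, return.
void onRxInterrupt(FakeSpi& spi, Receiver& rx) {
    assert(spi.rxReady);  // the interrupt only fires when a byte is waiting
    if (rx.count < sizeof rx.buf) rx.buf[rx.count++] = spi.rxData;
    spi.rxReady = false;
}
```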

In the example you gave, is the Slave a DMA memory chip?

It's not really a DMA memory chip, it's a DMA controller device in the processor. DMA means Direct Memory Access, and it's a special controller in some chips that can access device registers and RAM directly. The code sets up the DMA controller, and then the DMA hardware takes over after that.

Typically, you set up the DMA controller with a trigger, a read address, a write address, and a number of bytes. The read address and write address can usually be set so that they automatically increment. In the case of a typical DMA controller being used with a SPI device, you would set it up so that it is triggered by the SPI input data register full signal, and also set it up to read from the SPI input data register without incrementing the read address, write to a buffer in memory while incrementing the write address, and generate an interrupt when the required number of bytes have been transferred. That way, as each byte comes in and the SPI device signals that there is something in the input data register, the DMA hardware takes over and reads the byte from the input data register, writes it to the data buffer, increments the data buffer address and decrements the remaining byte counter. This is all done in hardware with no further code intervention. Then, when the remaining byte counter reaches zero, the DMA controller triggers an interrupt, and the ISR that you wrote takes control and does whatever needs to be done to the buffer of data that the DMA controller copied into place.

The same idea can be set up with the output side of the SPI controller. In this case, the input part of the DMA controller would be set up to read from a buffer, incrementing the address on each read, and the output part of the controller would be set to write to the SPI output data register without incrementing. In this way, the data buffer is written to the output data register one byte at a time, every time that the SPI hardware indicates that the output data register is empty.
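A software model may help picture the receive and transmit setups described above. This is a sketch only: the `DmaChannel` fields and `dmaTrigger` are invented names that do not correspond to any particular controller's registers, and real hardware does all of this without running code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Software model of one DMA channel.
struct DmaChannel {
    const volatile uint8_t* src;  // read address
    uint8_t* dst;                 // write address
    bool incSrc;                  // increment the read address after each byte?
    bool incDst;                  // increment the write address after each byte?
    size_t remaining;             // bytes still to transfer
    void (*onComplete)();         // stands in for the "transfer done" interrupt
};

// Called once per trigger, e.g. "SPI input data register full".
void dmaTrigger(DmaChannel& ch) {
    assert(ch.src != nullptr && ch.dst != nullptr);
    if (ch.remaining == 0) return;  // finished: ignore further triggers
    *ch.dst = *ch.src;              // copy one byte
    if (ch.incSrc) ++ch.src;
    if (ch.incDst) ++ch.dst;
    if (--ch.remaining == 0 && ch.onComplete) ch.onComplete();
}
```

For receiving, `src` would point at the SPI input data register with `incSrc` false and `dst` at the buffer with `incDst` true. For sending, the roles are reversed: `src` walks through the buffer and `dst` stays fixed on the output data register.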

Polling the ready bits and reading/writing as necessary is the simplest way to do it. It's also the slowest because you aren't doing anything else at the same time. Using interrupts is trickier to set up, but can let you do other things while the transfer is taking place. Using DMA is the fastest, and lets you do the most while the transfer is happening, but is also the trickiest to set up properly (and can be a nightmare to debug if things go wrong.)

In the OP's original post he says he wants two Dues to "communicate" (whatever that means) and makes no mention of DMA memory chips, so if the data is being transferred at 42 MHz, where is it going? How can the slave receive more data (another byte) if it has not stored the byte it just received, and if it did, where did it store it and how (assuming it is busy doing nothing but receiving bytes from the Master)? Are these fair questions?

They are all very fair questions. By "communicate" the OP has only indicated that he wants to move data from one Due to another, but hasn't given any other details. And at this point, he's only sent a single data byte. I don't know that the slave has done anything with it yet, but whatever method he uses to retrieve that data from the SPI hardware and process it, it will take some time, and that will slow down the effective throughput of the communications channel. In addition, the master must gather the data and send it out, which also takes some time, which will also slow down the effective throughput.

So, in determining how fast data can be sent from one side to the other, the limiting factors are how fast can the master gather and send the data, how fast can the hardware send the data from one device to the other, and how fast can the slave accept and store/process the data. The slowest link in the chain will set the overall data throughput rate. Generally, with high speed SPI devices, the SPI clock rate is NOT the limiting factor in the transfer. It's all well and good that he can clock the SPI devices at 42 MHz, but there is no way that both the master and slave will be able to transfer the data in and out of the SPI devices at that rate.


ShapeShifter · March 23, 2015, 4:41pm

MercuriThunder:
Shapeshifter - You know your stuff, man.

Well, I've been doing it for a couple years, and have been making a pretty good living at it.

Ok, not really a couple years... more like a whole lotta years. And I guess it's not as good a living as I would like, as I'm ready to retire, but the bank doesn't agree with me...

"I'm retired. No, I'm still working, I'm just tired again!"

system · March 23, 2015, 4:49pm

Thanks to both of you for such a thorough explanation (especially Shapeshifter). I will need to read your replies several times to completely assimilate the information but now I have a much better understanding of what is going on and I will try my own experiments with ATmega328s (of which I have many). Mastering SPI sounds like something that should be at the top of my job skills home study To Do list. I'll have a go at doing my own little experiment (which will be only a shadow of the OP's ) to see if I can master it. Thanks again for taking the time to spell it out for me !

system · March 23, 2015, 5:03pm

raschemmel - Hook the dude up with some karma.

ShapeShifter · March 23, 2015, 5:20pm

raschemmel:
Mastering SPI sounds like something that should be at the top of my job skills home study To Do list.

Yes, it's a very valuable technique to have in your repertoire, even if you don't get into the interrupts and DMA side of it and just use the shiftIn() and shiftOut() functions (which internally do the polling loops I mentioned.)

So many devices use SPI interfaces, like the digital pot you mentioned earlier, that it's hard to get away from it at times. It's an especially valuable technique that can be used with shift registers to easily and cheaply give you more digital input or output pins -- in that case the input data register or output data register are actually pins on the shift register chip. There are lots of tutorials that talk about using shift registers, but many of them don't say that what you're actually doing is a special form of SPI.
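To illustrate the shift register case, here is a toy model in plain C++ of a serial-in, parallel-out chip with an output latch (the 74HC595 is a common example of the type). The names are invented, and the bit-to-pin mapping of a real part should be taken from its data sheet, not from this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of an 8-bit serial-in, parallel-out shift register with a latch.
struct ShiftRegisterChip {
    uint8_t shift = 0;    // internal shift stages
    uint8_t outputs = 0;  // what the output pins show after a latch pulse

    // One clock pulse: shift everything up and take the data line into bit 0.
    void clockIn(bool dataBit) {
        shift = static_cast<uint8_t>((shift << 1) | (dataBit ? 1u : 0u));
    }
    // Latch pulse: copy the shift stages to the output pins.
    void latch() { outputs = shift; }
};

// Clock one byte out MSB first, one bit per clock, then latch it.
void sendByteMsbFirst(ShiftRegisterChip& chip, uint8_t value) {
    for (int bit = 7; bit >= 0; --bit) {
        chip.clockIn(((value >> bit) & 1u) != 0);
    }
    chip.latch();
    assert(chip.outputs == value);  // after 8 clocks the pins mirror the byte
}
```

The data line plays the part of MOSI, the clock line is SCK, and the latch line behaves much like a chip select.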

Some people will mention I2C as being better than SPI because it uses fewer pins: only two pins for any number of devices, as opposed to SPI which needs three shared pins plus one dedicated CS for each slave device. But I don't like I2C nearly as much as SPI: communications are more complicated, the wiring is more finicky, and it is prone to errors that can lock up the bus, preventing further communications until special steps are taken to reset things. SPI, on the other hand, is pretty much bulletproof, which in my mind more than justifies using a few extra pins.

system · March 23, 2015, 6:07pm

raschemmel - Hook the dude up with some karma.

DONE.
Thanks for reminding me.

ShapeShifter · March 23, 2015, 6:37pm

Why thank you, thank you very much! :-*

system · March 24, 2015, 1:25am

@Shapeshifter,
I've read several SPI tutorials and I am stumped. I have no problem using a SPI based digital pot because it has a command register at address 0. Also, since it is a slave and not a uP, it is not running any code. Where I am stumped is figuring out how to send data from a single ATmega328 SPI Master to multiple ATmega328 Slaves, because the Slaves are uPs, not digital pots, and have to run code. I can't reconcile the Slave code with the Master code when both are processors. As a starting experiment I just want to send a 16-bit analogRead value from a Slave to a Master. I want the Master to be able to receive analogRead values from multiple Slaves. I found this code for sending a 16-bit value as two bytes in the tutorial I linked in an earlier post in this thread:

    #include <SPI.h>

    // Master side: read a 16-bit value from the slave selected by pin 10.
    int readStuff(void)
    {
      SPI.beginTransaction(SPISettings(12000000, MSBFIRST, SPI_MODE0));  // gain control of SPI bus
      digitalWrite(10, LOW);         // assert chip select
      SPI.transfer(0x74);            // send 16 bit command
      SPI.transfer(0xA2);
      byte b1 = SPI.transfer(0);     // read 16 bits of data, high byte first
      byte b2 = SPI.transfer(0);
      digitalWrite(10, HIGH);        // deassert chip select
      SPI.endTransaction();          // release the SPI bus
      return (int16_t)((b1 << 8) | b2);
    }

This code was intended to be run by a Master, not a Slave, so I am baffled as to how I could use something similar to send a 16-bit value from a Slave to a Master if both are ATmega328s.
I googled the subject and there seems to be a shortage of tutorials about using SPI to control multiple UNOs (or ATmega328s) for data acquisition, where you have multiple slaves, each with 6 analog inputs (and digital inputs), reading sensors or individual status bits and sending them to the one Master. I looked up "shiftOut" and I couldn't find any examples where it is used for SPI transfers between Arduinos.

Any suggestions ?

ShapeShifter · March 24, 2015, 11:19am

I'm not surprised that you are having trouble finding examples. A vast majority of the time that SPI is used, the microprocessor is the master, and it is talking to some hardware device(s). The Arduino functions shiftIn() and shiftOut() are strictly master functions, and are not applicable for slaves.

In my professional experience (36 years worth) I have implemented many SPI masters, but only a couple SPI slaves. (Same for I2C.) When I write my embedded code, I don't have the luxury of pre-defined functions or libraries (other than ones I have written in the past) so I get down and dirty with the hardware, setting up device hardware registers and fielding interrupts. (You need to spend a lot of time reading data sheets and reference manuals when doing this!)

The problem with setting up a SPI slave is that you have to be ready for data to come in at any time, just like with the serial port. The Serial library takes care of that for you, by setting up a received data ready interrupt, catching that interrupt to read the incoming data, and putting that data into a circular receive buffer. Then, your code can poll for data more or less at its leisure by calling Serial.available() to see if there is any data in the buffer waiting to be processed.
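The circular receive buffer idea can be sketched in plain C++ as follows. This is a minimal illustration with invented names (`RxRing`, `push`), not the actual Serial library code; in a real slave, `push()` would be called from the receive interrupt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal single-producer, single-consumer circular buffer.
class RxRing {
public:
    // Called from the receive interrupt: store the byte, or drop it if full.
    bool push(uint8_t b) {
        const size_t next = (head_ + 1) % kSize;
        if (next == tail_) return false;  // full (one slot is always kept empty)
        data_[head_] = b;
        head_ = next;
        return true;
    }
    // Number of bytes waiting, in the spirit of Serial.available().
    size_t available() const { return (head_ + kSize - tail_) % kSize; }
    // Remove and return the oldest byte; check available() first.
    uint8_t read() {
        assert(available() > 0);
        const uint8_t b = data_[tail_];
        tail_ = (tail_ + 1) % kSize;
        return b;
    }
private:
    static constexpr size_t kSize = 16;
    uint8_t data_[kSize];
    volatile size_t head_ = 0;  // written only by the interrupt side
    volatile size_t tail_ = 0;  // written only by the main code
};
```

The main loop can then call `available()` and `read()` at its leisure, as long as it drains the buffer faster on average than the master fills it. On an 8-bit part the indices would probably want to be single bytes so each can be read in one instruction.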

Something similar is necessary for a SPI slave implementation. Take a look at the HardwareSerial implementation in the Arduino library for an example of what is done. But beware, it's mostly a lot of cryptic hardware register manipulation (the part I love writing; I tend to get bored writing the high level stuff and much prefer the low level driver type stuff.) To find the code, starting at the Arduino program folder, it's in hardware/arduino/avr/cores/arduino.

Both you and MercuriThunder are facing an uphill battle implementing a SPI slave. The Arduino libraries don't support it. Maybe there is a library out there that does? If not, it will be necessary to write code similar to the HardwareSerial library code.
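One wrinkle with a processor as the slave is that it can only queue its answer after it has seen the request, so the reply always lags the command by one byte. The sketch below models that in plain C++ on a PC. It is not AVR code, and the one-byte command `kCmdRead` and the whole little protocol are assumptions made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

const uint8_t kCmdRead = 0x01;  // hypothetical "send me your reading" command

struct SlaveModel {
    uint16_t reading = 0;      // e.g. the latest analogRead() result
    uint8_t dataReg = 0;       // byte that will be shifted out on the NEXT transfer
    bool sendLowNext = false;

    // What the slave's transfer-complete interrupt would do with each byte.
    void onByteReceived(uint8_t fromMaster) {
        if (fromMaster == kCmdRead) {
            dataReg = static_cast<uint8_t>(reading >> 8);    // queue high byte
            sendLowNext = true;
        } else if (sendLowNext) {
            dataReg = static_cast<uint8_t>(reading & 0xFF);  // queue low byte
            sendLowNext = false;
        } else {
            dataReg = 0;
        }
    }
};

// One 8-bit exchange: the master receives whatever the slave had queued
// before this transfer began; the slave reacts only after the byte arrives.
uint8_t transfer(SlaveModel& slave, uint8_t fromMaster) {
    const uint8_t toMaster = slave.dataReg;
    slave.onByteReceived(fromMaster);
    return toMaster;
}

// Master side: one command byte, then two dummy bytes to clock the answer back.
uint16_t masterReadValue(SlaveModel& slave) {
    assert(!slave.sendLowNext);       // slave should be idle before a new request
    transfer(slave, kCmdRead);        // the byte returned here is not meaningful
    const uint8_t hi = transfer(slave, 0x00);
    const uint8_t lo = transfer(slave, 0x00);
    return static_cast<uint16_t>((hi << 8) | lo);
}
```

On real hardware the slave also has to load its data register before the master starts clocking the next byte, so the master may need to leave a short pause between bytes.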
