
Topic: Measuring Throughput, Data Transfer Rate, and Memory Bandwidth on SRAM

raygun3000

I'm currently using a 23K256 SRAM chip on an Uno and trying to figure out how to measure throughput, theoretical vs. actual tested data transfer rate, and bandwidth through some code tests. Examples would be nice!

For references:
http://ww1.microchip.com/downloads/en/devicedoc/22100f.pdf      <===== Sram data sheet

I'm guessing ideal throughput goes something like:

 Readings per second = SPI clock speed (Hz) / (8 * (1 instruction byte + 2 address bytes + X data bytes))


Would theoretical data transfer rate be something like:
         
 X bytes per second = Bytes sent to SRAM / Time it takes to complete the write and read

For memory bandwidth computations...not too sure.

Lucario448

In synchronous (clock signal involved) serial communications, "raw" data rate is always determined by the clock's frequency; so for an Arduino Uno, the theoretical maximum is 8 Mbit/s or roughly 1 MB/s. Due to lack of DMA, actual data rate might be slightly lower (interrupt-driven I/O still has considerable overhead).

For sequential access, this might be the expected data rate; however, for random (single byte) access I'll say a quarter of that (due to the initial parameters per transaction). How ironic, don't you think?


About "bandwidth"... it's hard to tell outside of modulated or multiplexed (shared) communications; in those cases, I think it's determined by the number of bits transferred in a given clock period. Thus, since it's serial, it's 1 bit wide.

In telecommunication services (like internet), connections are usually shared by the same "channel". A cellphone tower or border router of your ISP, has certain throughput; and the "slice" the provider gives you of that throughput (aka connection speeds of your plan), effectively can be called "bandwidth".

In summary, bandwidth is the partial amount of throughput allocated to a particular connection; but because the whole SPI port is dedicated only to the SRAM, it's difficult to think about this concept.

raygun3000

Thanks for the response Lucario!

In synchronous (clock signal involved) serial communications, "raw" data rate is always determined by the clock's frequency; so for an Arduino Uno, the theoretical maximum is 8 Mbit/s or roughly 1 MB/s. Due to lack of DMA, actual data rate might be slightly lower (interrupt-driven I/O still has considerable overhead).

I agree that since I'm not using the Uno's internal RAM, I'm losing some of the actual data rate but how could I test how much is lost during that process?

For sequential access, this might be the expected data rate; however, for random (single byte) access I'll say a quarter of that (due to the initial parameters per transaction). How ironic, don't you think?

I'm actually doing sequential access since I'm building essentially a data logger. :D


About "bandwidth"... it's hard to tell outside of modulated or multiplexed (shared) communications; in those cases, I think it's determined by the number of bits transferred in a given clock period. Thus, since it's serial, it's 1 bit wide.

In telecommunication services (like internet), connections are usually shared by the same "channel". A cellphone tower or border router of your ISP, has certain throughput; and the "slice" the provider gives you of that throughput (aka connection speeds of your plan), effectively can be called "bandwidth".

In summary, bandwidth is the partial amount of throughput allocated to a particular connection; but because the whole SPI port is dedicated only to the SRAM, it's difficult to think about this concept.

If I were to throw in another SPI device, would they share an equal amount of bandwidth or will one take more of a priority if its used more often? In short, how would bandwidth be calculated then?


Lucario448

I agree that since I'm not using the Uno's internal RAM, I'm losing some of the actual data rate but how could I test how much is lost during that process?
The best option is to capture the time before and after the transaction (use micros() and two unsigned long variables); but try to do a transfer with multiple lines of code, loops have their own implicit overhead as well. Don't forget to set the SPI's clock to 8 MHz, which is the fastest possible speed.

Unfortunately, the result of this test will most likely be far from accurate, because other things still bring the data rate down. For instance: the timer interrupt that makes millis() and micros() tick (it triggers at 976.6 Hz, or every 1.024 ms), and the fact that you have to call and return from a function for every byte you transfer (i.e. SPI.transfer()).
I guess it would yield, at best, a third of the theoretical rate.


The second-best way to test that (the best is coding in assembly) would be to forget about the library completely, set up and use the SPI port purely with registers (in Arduino/C, these behave like implicitly declared byte variables whose values directly affect the microcontroller's hardware), disable the previously mentioned timer interrupt, and use an oscilloscope with a time scale of 1 or 5 microseconds per division.
The idea is to toggle a digital pin (by port manipulation; digitalWrite() is way too slow in comparison) just before and just after the data transfer, capture the resulting pulse with the scope, and then measure its width to determine the duration of the process (and thus the average data rate). This way you'll get a more accurate result, almost overhead-free.



I'm actually doing sequential access since I'm building essentially a data logger. :D
Good, then it should be fast enough.

The type of memory is not the bottleneck; static RAM can work way faster than (what I presume is) the maximum clock frequency of the SPI interface. Actually, the bottleneck is the Arduino itself, so much so that even dynamic RAM outpaces the highest throughput the Arduino can offer. Not to mention the difference between the maximum clock rates of the SRAM and the Arduino.



If I were to throw in another SPI device, would they share an equal amount of bandwidth or will one take more of a priority if its used more often? In short, how would bandwidth be calculated then?
That's a tough question.

Ideally, this "bandwidth" should be divided equally between devices; however, this only happens in multitasking environments. Because Arduino is not primarily multitask (or at least not in a "balanced" or "smart" way, unlike a computer OS), it will depend on the code you end up uploading.

Therefore, there's no definitive or precise answer; it depends on the code.

raygun3000

Quote
The best option is to capture the time before and after the transaction (use micros() and two unsigned long variables); but try to do a transfer with multiple lines of code, loops have their own implicit overhead as well. Don't forget to set the SPI's clock to 8 MHz, which is the fastest possible speed.
I'm assuming 8MHz is for the 3.3V version with the Atmega328 processor but in any case, I've used SPI.begintransaction(....) to set the clock to its highest value (16MHz for a 5V Uno). :P

Do I have to capture the transaction of the entire process (writing then reading) or is it enough to just capture a write or a read? If it's one or the other, I have a little snippet for the write portion. The reading portion is relatively the same process, but I'd have to use a for loop to read out the data (issue the read command, send starting address/data/# of bytes). The # of bytes is just so that I could read the data in a for loop, since I don't think I can avoid using one.

Just testing the write portion assuming I've declared each variable and commanded the sram to start writing, I'm essentially passing in X bytes to the variable data as shown below. Would this suffice?

// Take pTime = micros() just before calling SequentialWrite()
void SequentialWrite(uint16_t address, byte *data, int amount)
{
  digitalWrite(ChipSelect, LOW);          // Select the SRAM
  SPI.transfer(WRITE);                    // Send the WRITE instruction
  SPI.transfer(address >> 8);             // Send high byte of address
  SPI.transfer(address & 0xFF);           // Send low byte of address
  SPI.transfer(data, amount);             // Send the array (the buffer is overwritten with the bytes received)
  digitalWrite(ChipSelect, HIGH);         // Deselect the SRAM to end the write
}
// Take cTime = micros() just after SequentialWrite() returns


Lucario448

I'm assuming 8MHz is for the 3.3V version with the Atmega328 processor
Nope, the hardware SPI port has a minimum prescaler of 2; so it is always, at most, half the CPU's clock frequency.

This means that for the 8 MHz version, 4 MHz is the maximum SPI speed. To reach the maximum of 20 MHz the SRAM's datasheet claims; you'll have to somehow overclock the poor microcontroller up to 40 MHz.
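To make the prescaler point concrete, here is a minimal sketch in plain C++ (it should compile on a desktop or inside an Arduino sketch). The name effectiveSpiClock is made up for illustration and is not part of the SPI library. It assumes the AVR hardware only offers the dividers 2, 4, 8, 16, 32, 64 and 128, and that the fastest clock not exceeding the request is picked, which is my understanding of how SPISettings behaves on AVR boards.

```cpp
#include <stdint.h>

// Hypothetical helper (not part of the SPI library): returns the SPI clock
// an AVR would actually run at for a given CPU clock and requested speed.
// Assumption: dividers are powers of two from 2 to 128, and the smallest
// divider whose resulting clock does not exceed the request is chosen.
uint32_t effectiveSpiClock(uint32_t cpuHz, uint32_t requestedHz) {
  for (uint32_t divider = 2; divider < 128; divider *= 2) {
    if (cpuHz / divider <= requestedHz) {
      return cpuHz / divider;
    }
  }
  return cpuHz / 128;  // slowest available setting
}
```

Under those assumptions, asking for 16 MHz on a 16 MHz Uno gives 8 MHz, and asking for 8 MHz on an 8 MHz board gives 4 MHz.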


Do I have to capture the transaction of the entire process (writing then reading) or is it enough to just capture a write or a read?
One of those is enough, because (unlike flash or EEPROM memories) SRAM is equally fast in both operations.


but I'd have to use a for loop to read out the data (issue the read command, send starting address/data/# of bytes). The # of bytes is just so that I could read the data in a for loop, since I don't think I can avoid using one.
For a more realistic (rather than idealistic) test, I guess you can use your functions as they are. Do the following:

  • Take the time before calling the function.
  • Call the function.
  • Take the time after calling the function.
  • Process the result.
  • Print it in whatever output stream you like (e.g. Serial).

In order to process (obtain) the result, you should follow this formula:

(nb * 1000000) / (cTime - pTime)
Where:
  • nb is the number of bytes transferred (the same amount passed to the function/command), aka the payload size of the transaction.
  • cTime is the timestamp (in microseconds) just after calling the function.
  • pTime is the timestamp (in microseconds) just before calling the function.
  • The result is given in bytes per second (B/s).
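The formula above can be sketched as a small helper. The name bytesPerSecond is hypothetical; SequentialWrite and data refer to the snippet posted earlier in the thread. The 64-bit intermediate is my addition: on the Uno, nb * 1000000 can overflow 32-bit arithmetic once nb goes past a few thousand.

```cpp
#include <stdint.h>

// Hypothetical helper: payload throughput in bytes per second, computed
// from two micros() timestamps taken just before and just after a transfer.
//
// Intended use on the Arduino (not compiled here):
//   uint32_t pTime = micros();
//   SequentialWrite(0, data, nb);
//   uint32_t cTime = micros();
//   Serial.println(bytesPerSecond(nb, pTime, cTime));
uint32_t bytesPerSecond(uint32_t nb, uint32_t pTime, uint32_t cTime) {
  // Unsigned subtraction still gives the right interval if micros()
  // rolled over between the two readings.
  uint32_t elapsedMicros = cTime - pTime;
  if (elapsedMicros == 0) {
    return 0;  // transfer too short to measure; avoid dividing by zero
  }
  // 64-bit intermediate so nb * 1000000 cannot overflow.
  return (uint32_t)(((uint64_t)nb * 1000000ULL) / elapsedMicros);
}
```

Keep in mind that micros() on a 16 MHz board advances in steps of 4 microseconds, so very short transfers will give coarse results.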

raygun3000

Could I use SPI.begin() instead of: SPI.begintransaction(16000000,MSBFIRST,SPIMODE0)/SPI.endtransaction(...)? I'm not sure if SPI.begin() is using the max clock speed like SPI.begintransaction().

I know SPI.begintransaction() stops the SPI bus but would it be okay to just keep it running since I'm building a datalogger?
I'd like to throw in a SDcard that uses SPI as well and I know they could both share the port as long as I have a different chipselect for it.

Lucario448

Could I use SPI.begin() instead of: SPI.begintransaction(16000000,MSBFIRST,SPIMODE0)/SPI.endtransaction(...)? I'm not sure if SPI.begin() is using the max clock speed like SPI.begintransaction().
Better call both. SPI.begin() sets up its corresponding pins, and SPI.beginTransaction() does the remaining settings.
If all SPI slaves are compatible with the exact same settings (i.e. clock's frequency, polarity and phase; being the combination of the last two also referred as the "SPI mode"), then beginTransaction() may be called only once; and all that will matter is making sure only a single CS line is pulled low at any given moment (otherwise data collision, and probably a clock-period long short circuit too, will occur in the MISO line).


I know SPI.begintransaction() stops the SPI bus
Wrong. It only sets some registers, but doesn't idle the bus.
As far as the hardware is concerned, the bus is busy only while SPI.transfer() or SPI.transfer16() is running; the rest of the time it stays idle (according to the currently set clock polarity) until SPI.end() is called or another transfer is initiated.

beginTransaction and endTransaction do not make the bus busy (in a hardware perspective, although the first one can suddenly change the SCK state while idling), they only flag it as such for the program's (software) logic.
I presume their purpose is to refuse changing the settings more than once before the end. When you include multiple libraries for multiple slaves, they will certainly share the base SPI library; so I suppose those functions are there to avoid conflicts between slaves that work with different settings from each other (remember: SPI is not 100% a standard).


but would it be okay to just keep it running since I'm building a datalogger?
I'd like to throw in a SDcard that uses SPI as well and I know they could both share the port as long as I have a different chipselect for it.
Yes; but again, only if both slaves are compatible with the same settings.

raygun3000

Thanks for all your input so far! :)

So far, I did a few runs and wanted to see if ~50Bytes/sec seems reasonable or not considering that ~888,888 bits/second is the max data transfer rate due to Arduino Uno capability.

Running 80 Bytes yields ~53 Bytes/sec
Running 1000 Bytes yields ~50 Bytes/sec

I'm trying to find a reasonable amount of Bytes to buffer before sending it off to the SRAM and then reading it off to the SD Card to maximize its potential considering the fact that an Uno only has 2kB internal SRAM. It seems like if I make a byte array buffer of size 1000 (byte data[1001] = {};), its already eating up most of my internal SRAM. In the end, I don't want to be calling the SRAM and SD card more than it needs to be.

Lucario448

I did a few runs and wanted to see if ~50Bytes/sec seems reasonable or not
Really? 50 B/s? Honestly, I expected way more, even for random (single byte) transactions.
Are you sure it is set to at least 8 MHz?


considering that ~888,888 bits/second is the max data transfer rate due to Arduino Uno capability.
888888 bit/s ≈ 111 kB/s (about 108.5 KiB/s). Now that seems more reasonable.



In the end, I don't want to be calling the SRAM and SD card more than it needs to be.
Is that why it is so slow? You never told me you were making a double transfer.

raygun3000

Quote
Really? 50 B/s? Honestly, I've expected way more, even for random (single byte) transactions.
Are you sure it is set to at least 8 MHz?
Whoops. Yea I'm using SPI.beginTransaction(SPISettings(8000000,MSBFIRST,SPI_MODE0)); since I'll eventually be using a 3.3V AtMega328P MCU that only has 8Mhz clock speed.
I accidentally put it after a couple other statements which happens to be loops...  :o
It turns out to be ~26700 Bytes/second. Only a fourth of the maximum transfer rate.
I do have somewhat long wires which may interfere with the transfer rate.


Quote
Is that why it is so slow? You never told me you were making a double transfer.
Sorry I only ran the test on the SRAM. The SD Card will come later. Essentially when the SRAM is done reading out the data, the SD card will take over the bus and transfer it all at once. :)

Lucario448

It turns out to be ~26700 Bytes/second. Only a fourth of the maximum transfer rate.
Now that sounds more realistic; as I said, you'd get a third of the maximum, at best.

Shifting an entire byte in and out takes 16 CPU clock cycles, but obtaining the next byte takes about as long or maybe even longer. Add the remaining overhead (condition checking in loops, call-returns, timer interrupts every millisecond, and any additional code in between), and it's no surprise that the actual performance falls well below the theoretical one.
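To put a number on that overhead, a measured rate can be turned into an average cycle cost per byte. This is only a rough sketch: the name cyclesPerByte is hypothetical, and the 16 MHz CPU clock used in the example is an assumption about the board (a standard Uno).

```cpp
#include <stdint.h>

// Hypothetical helper: average CPU cycles spent per payload byte,
// given the CPU clock and a measured throughput in bytes per second.
uint32_t cyclesPerByte(uint32_t cpuHz, uint32_t measuredBytesPerSecond) {
  if (measuredBytesPerSecond == 0) {
    return 0;  // nothing measured; avoid dividing by zero
  }
  return cpuHz / measuredBytesPerSecond;
}
```

With a 16 MHz clock, the ~26700 bytes/second figure works out to roughly 599 cycles per byte, against the 16 cycles the hardware needs for the shifting itself, so nearly all of the time goes to things other than clocking bits.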



I do have somewhat long wires which may interfere with the transfer rate.
I don't think so.

Since there are no error detection mechanisms in SPI, long wires (especially unshielded and untwisted ones) may cause data corruption but not data rate degradation (unless somehow the slave misses clock pulses).


Essentially when the SRAM is done reading out the data, the SD card will take over the bus and transfer it all at once. :)
Now I get it, like an enlarged buffer that shouldn't be dumped (or "flushed") too often to the SD card, due to the reduced performance of a double transfer.


PS: the double transfer can be faster if you put the SRAM readouts directly into the SD's cache, but it's easier said than done because that cache is also required for some filesystem stuff.

raygun3000

Quote
PS: the double transfer can be faster if you put the SRAM readouts directly into the SD's cache, but it's easier said than done because that cache is also required for some filesystem stuff.
I'll keep that in mind. :)

I wanted to refer back to throughput and how I'd account for this. I'm not sure if its something that's easily ideally calculated. So far, I have something like:

 Readings per second = SPI clock speed (Hz) / (8 * (1 instruction byte + 2 address bytes + X data bytes))

Note that this doesn't account for overhead.

Lucario448

Note that this doesn't account for overhead.
Without it, for a single byte, it's a quarter of the already discussed maximum, because each data byte comes with three bytes of instruction and address; and (on average) the more bytes you request at once, the closer it gets to that top. In real life, I think you've already figured out what the actual maximum (top) is.
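As a worked version of that formula, here is a minimal sketch. The name idealPayloadRate is hypothetical, and it assumes every transaction carries exactly 1 instruction byte and 2 address bytes, with no CPU overhead at all.

```cpp
#include <stdint.h>

// Hypothetical helper: ideal payload rate in bytes per second for a
// transaction of 1 instruction byte + 2 address bytes + dataBytes,
// ignoring every form of CPU overhead.
uint32_t idealPayloadRate(uint32_t spiClockHz, uint32_t dataBytes) {
  const uint32_t headerBytes = 3;  // instruction + 16-bit address
  uint64_t bitsPerTransaction = 8ULL * (headerBytes + dataBytes);
  // transactions per second * payload bytes per transaction
  return (uint32_t)(((uint64_t)spiClockHz * dataBytes) / bitsPerTransaction);
}
```

At 8 MHz this gives 250000 B/s for single-byte access (a quarter of 1 MB/s), about 914 kB/s for 32-byte transfers, and about 997 kB/s for 1000-byte transfers.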

raygun3000

Quote
It turns out to be ~26700 Bytes/second. Only a fourth of the maximum transfer rate.
Quote
considering that ~888,888 bits/second is the max data transfer rate due to Arduino Uno capability.
Just an update...turns out that if I move out the line of code that sets the mode of the SRAM to sequential before the micros() call, I achieve 92833 Bytes/second!! That is a drastic change.. The thing is...how can I test with larger amounts of data? At the moment, I'm only testing 1K Byte since I could only make a global buffer variable of that size due to the limited internal RAM.
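One possible approach, offered as a suggestion rather than something confirmed in this thread: reuse a small buffer and send it repeatedly to consecutive addresses, timing the whole run. Each chunk then costs one extra 3-byte header. The sketch below only does the bookkeeping; ChunkPlan and planChunks are hypothetical names, and the 3-byte header is the instruction plus 16-bit address used by the 23K256.

```cpp
#include <stdint.h>

// Hypothetical bookkeeping for a chunked transfer test.
struct ChunkPlan {
  uint32_t transactions;  // number of separate write (or read) commands
  uint32_t busBytes;      // payload plus 3 header bytes per transaction
};

// Splits totalBytes into chunks of at most chunkSize bytes and reports
// how many transactions and how many bytes on the bus that requires.
ChunkPlan planChunks(uint32_t totalBytes, uint32_t chunkSize) {
  ChunkPlan plan = {0, 0};
  if (chunkSize == 0) {
    return plan;  // no valid plan for an empty buffer
  }
  plan.transactions = (totalBytes + chunkSize - 1) / chunkSize;  // round up
  plan.busBytes = totalBytes + 3 * plan.transactions;
  return plan;
}
```

Filling the whole 32 KB of the 23K256 from a 64-byte buffer would take 512 transactions and 34304 bytes on the bus, so the header overhead stays under 5%.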

I've been trying to get a transfer rate on the microSD card on its own and I could only go up to ~3500Bytes/second. I guess this is good enough considering that I'm using the SD Library. I just wanted to confirm if this is a viable way of storing 1KB in file. Originally, I tried to concatenate the bytes into a string but I ran into a memory issue causing the file to not open so I figured I would do everything after opening the file.

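------------------------------------------------------------------------
// file and count are assumed to be declared earlier (File file; byte count = 0;)
file = SD.open("test5.csv", O_CREAT | O_APPEND | O_WRITE);

while (count < 5) {
  for (byte i = 0; i < 200; i++) {
    file.print(i);       // value as decimal text
    file.println(",");   // comma followed by a line break
  }
  count++;
}
file.close();            // flush buffered data to the card (if not already done later)
------------------------------------------------------------------------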
