extra RAM on Arduino Due

Yes, that looks OK, although I was thinking more of a 74LS93 or perhaps a synchronous counter. You can cascade two of them for extra bits and have any size of "page" you like.
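To illustrate the cascading idea, here is a hypothetical software model. The names and the 12-bit page width are assumptions for the example. The MCU pulses the first counter once per access, and the first counter's top output clocks the second counter when it wraps from 15 to 0. The MCU drives the upper address lines directly as the "page".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of two cascaded 4-bit binary counters (74LS93-style).
struct CascadedCounter {
  uint8_t low = 0;   // address bits 0-3, clocked by the MCU strobe
  uint8_t high = 0;  // address bits 4-7, clocked when the low counter wraps

  void reset() { low = 0; high = 0; }  // both counters have reset inputs
  void clock() {
    low = (low + 1) & 0x0F;
    if (low == 0) high = (high + 1) & 0x0F;  // ripple into the second counter
  }
  uint8_t offset() const { return static_cast<uint8_t>((high << 4) | low); }
};

// The page is driven directly by MCU pins; the counters supply the offset
// within the page (256 bytes with two 4-bit counters).
uint32_t sramAddress(uint32_t page, const CascadedCounter& c) {
  assert(page < (1u << 12));  // assumption: 12 page lines, 20-bit address
  return (page << 8) | c.offset();
}
```

The MCU only sets the page once and then strobes. It never drives the low address lines itself.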
Good luck

You might try the newer SdFat library (hosted on Google Code). The official Arduino SD library is based on a very old version of SdFat and is very slow on the Due.

SdFat uses fast DMA on the Due, and large reads that are a multiple of 512 bytes go straight into your buffer instead of being copied through SdFat's internal cache. Try allocating eight 8,192-byte buffers, two for each file. Open all four files before starting your reads.
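Here is a minimal host-side sketch of the "two buffers per file" idea. While the consumer plays one buffer, the producer refills the other. On the Due, the consumer would presumably be the audio interrupt and the producer the main loop calling file.read(). The class and its interface are hypothetical and not part of SdFat. On real hardware, state shared with an ISR would need to be `volatile` or otherwise protected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical ping-pong buffer for one streamed file (not an SdFat class).
// Producer: fill the buffer from bufferToFill(), then call markFilled().
// Consumer: call nextByte(). On hardware, ISR-shared state must be guarded.
template <std::size_t N>
class DoubleBuffer {
 public:
  // Returns a buffer needing a refill (the playing one first if it ran dry,
  // otherwise the idle one), or nullptr if both are full.
  uint8_t* bufferToFill(int* which) {
    assert(which != nullptr);
    const int order[2] = {playing_, 1 - playing_};
    for (int b : order) {
      if (empty_[b]) {
        *which = b;
        return buf_[b];
      }
    }
    return nullptr;
  }

  void markFilled(int which) { empty_[which] = false; }

  // Returns false on underrun (the playing buffer has not been refilled yet).
  bool nextByte(uint8_t* out) {
    if (empty_[playing_]) return false;
    *out = buf_[playing_][pos_++];
    if (pos_ == N) {  // buffer consumed: hand it back to the producer
      empty_[playing_] = true;
      playing_ = 1 - playing_;
      pos_ = 0;
    }
    return true;
  }

 private:
  uint8_t buf_[2][N] = {};
  bool empty_[2] = {true, true};
  int playing_ = 0;
  std::size_t pos_ = 0;
};
```

With `DoubleBuffer<8192>` per file, each file takes 16 KB, which is what the suggestion above implies.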

Here is an example benchmark for Due write/read with 8,192 byte buffers:

Free RAM: 87327
Type is FAT32
File size 10MB
Buffer size 8192 bytes
Starting write test. Please wait up to a minute
Write 2688.79 KB/sec
Maximum latency: 92689 usec, Minimum Latency: 2302 usec, Avg Latency: 3042 usec

Starting read test. Please wait up to a minute
Read 3852.83 KB/sec
Maximum latency: 4656 usec, Minimum Latency: 2087 usec, Avg Latency: 2124 usec

SD cards perform much better with large multi-block reads.
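As a sanity check on the numbers above (this is my arithmetic, not part of the benchmark output), divide the buffer size by the average read latency. 8192 bytes / 2124 usec gives about 3857 KB/sec, which is in line with the reported 3852.83 KB/sec. This suggests that KB here means 1000 bytes. Applied to the worst write latency (92689 usec), the same formula gives only about 88 KB/sec for that one buffer.

```cpp
#include <cassert>
#include <cstdint>

// Throughput implied by one transfer of `bytes` taking `usec` microseconds.
// Bytes per microsecond equals MB/s (10^6 bytes/s), so x1000 gives KB/s.
double kbPerSec(uint32_t bytes, double usec) {
  assert(usec > 0);
  return 1000.0 * bytes / usec;
}
```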

Throughput on SPI is great, but the latency is killing me...
I'm generating about 30-40 MB/min, and the occasional long latency really mucks up my data.
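Here is a rough sizing sketch for a RAM FIFO in front of the SD card. It assumes the 40 MB/min rate above, taken as 40,000,000 bytes per minute. It also assumes that the worst write latency from the earlier benchmark (92689 usec) is representative of your card. Data keeps arriving during a stall, so the FIFO must absorb at least rate × stall bytes, which here is about 62 KB.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Bytes that arrive while one SD write stalls for `stallUsec`, at a steady
// `bytesPerMinute`. A RAM FIFO must hold at least this much (plus the block
// being written) so that no data is dropped during the stall.
uint32_t bytesDuringStall(double bytesPerMinute, uint32_t stallUsec) {
  assert(bytesPerMinute >= 0);
  double bytesPerUsec = bytesPerMinute / 60.0 / 1e6;
  return static_cast<uint32_t>(std::ceil(bytesPerUsec * stallUsec));
}
```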

I've purchased an FPGA+RAM board (eBay) to make a 32 MB SPI buffer,
but my Verilog/VHDL coding skills are 10 years rusty...

I'm also looking at using the same FPGA SPI buffer to exchange data between the Due and a Raspberry Pi (two masters...) without dropping any data.

Hi,

@ fat16lib
I'm using the latest version of SdFatLib (downloaded about a month ago; I don't know if it has changed since then).
I open the wave files at the start of my sketch, then only read them to fill buffers, and use seek(0) to go back to the beginning.

I tried several buffer sizes, and the most efficient one I found was 512 samples (1024 bytes, because the wave is 16-bit).

8192 bytes is really too much. For now I can play 4 samples at a time, but if I want to play 5 or more, it will be too big.
And since I'm building a groovebox, I also have to store waveform tables (4 KB), waveshaping effect tables (8 KB), etc.
With the Due's 96 KB of RAM, that's not enough.
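Here is a rough RAM budget. It is a sketch only and ignores the stack, globals and library RAM. It assumes two buffers per voice, as suggested earlier, plus the 4 KB waveform and 8 KB waveshaping tables. The result is compared against the roughly 87 KB of free RAM reported by the benchmark.

```cpp
#include <cassert>
#include <cstdint>

// RAM for `voices` streamed files with two buffers each, plus fixed tables.
uint32_t ramNeeded(uint32_t voices, uint32_t bufBytes, uint32_t tableBytes) {
  assert(bufBytes > 0);
  return voices * 2 * bufBytes + tableBytes;
}
```

With 8 KB buffers, five voices need 94208 bytes, which does not fit. Halving the buffers to 4 KB brings five voices down to 53248 bytes.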

But I will experiment again with buffers of around 5-6 KB, and try to understand why they don't work better than a 1 KB buffer.
As you say, it should be faster. Maybe I have a problem other than just the read time.

Thanks for the reply.

@ralphnev
Yes, I'm worried about latency too. With real-time audio, I can't tolerate any latency from the RAM.
So Grumpy_Mike's solution seems good, even if it's harder to do.
I'll receive the components in a few days and will post my experiments here, in case you're interested. I think you can add a large amount of RAM using parallel SRAM and the Due's many pins.

Buffer sizes that are a power of two will likely work best. The cluster size is a power of two, and a read that crosses a cluster boundary defeats the benefit of the larger buffer. Try 4 KB buffers.
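To see why the buffer size matters, here is a small sketch that counts how many sequential reads straddle a cluster boundary. It assumes 32 KB clusters and that the file's data starts on a cluster boundary, as it does in FAT. With 4 KB or 8 KB buffers, no read crosses a boundary. With 6 KB buffers (in the 5-6 KB range mentioned above), two reads cross in every 96 KB.

```cpp
#include <cassert>
#include <cstdint>

// Count sequential reads of `bufSize` bytes, starting at file offset 0, that
// span two clusters within the first `fileSize` bytes.
uint32_t countClusterCrossings(uint32_t fileSize, uint32_t bufSize, uint32_t clusterSize) {
  assert(bufSize > 0 && clusterSize > 0);
  uint32_t crossings = 0;
  for (uint32_t off = 0; off + bufSize <= fileSize; off += bufSize) {
    uint32_t firstCluster = off / clusterSize;
    uint32_t lastCluster = (off + bufSize - 1) / clusterSize;
    if (firstCluster != lastCluster) crossings++;
  }
  return crossings;
}
```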

Free RAM: 91423
Type is FAT32
File size 10MB
Buffer size 4096 bytes

Starting read test. Please wait up to a minute
Read 2848.53 KB/sec
Maximum latency: 3510 usec, Minimum Latency: 1339 usec, Avg Latency: 1436 usec

Make sure your SD card is formatted correctly, with 32 KB clusters. The SD Association's SD Memory Card Formatter (for Windows/Mac) places file system structures for best performance.

I built an external SRAM device with an XC9536XL CPLD ($1) and a 4 MB SRAM chip. You need 11 wires to control the memory (8 data + 3 control). With a bigger CPLD (more pins) you can access an "unlimited" amount of SRAM. It has an auto-increment feature, so each read/write to the device automatically increments the SRAM address.

The read/write speed with a PIC32 at 80 MHz (bit-banging, no DMA) and larger blocks (e.g. 512 bytes) is about 6.5 MB/s. It can also be used in "16-bit mode" (16 data + 3 control wires required) at double the speed.

Actually, the data width is not tied to the CPLD, so you can run it in 32-bit or 64-bit data-width mode (4x or 8x faster), but unfortunately you need more data wires.
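Here is a hypothetical software model of the auto-increment idea, showing why so few wires are needed. The MCU never drives address lines; it only strobes /RD or /WR, and every access advances the address. How the start address is loaded is not described above, so `resetAddress()` is an assumption, as are the class and method names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of an auto-incrementing SRAM interface (not pito's code).
class AutoIncSram {
 public:
  explicit AutoIncSram(std::size_t size) : mem_(size, 0) { assert(size > 0); }

  void resetAddress() { addr_ = 0; }                       // assumed control op
  void write(uint8_t b) { mem_[addr_] = b; advance(); }    // one /WR strobe
  uint8_t read() { uint8_t b = mem_[addr_]; advance(); return b; }  // one /RD strobe
  std::size_t address() const { return addr_; }

 private:
  void advance() { addr_ = (addr_ + 1) % mem_.size(); }    // counter wraps at the end
  std::vector<uint8_t> mem_;
  std::size_t addr_ = 0;
};
```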


pito:
I built an external SRAM device with an XC9536XL CPLD ($1) and a 4 MB SRAM chip. You need 11 wires to control the memory (8 data + 3 control). With a bigger CPLD (more pins) you can access an "unlimited" amount of SRAM. It has an auto-increment feature, so each read/write to the device automatically increments the SRAM address.
[snip]

I looked at a similar method, but it requires too much processor intervention.

  • Using SPI/USART with PDC/DMA should get up to 4 MB/s with very little processor intervention.

My chunks are 4096 bytes, arriving once every 5 ms (and faster if I can make other improvements),
so low processor overhead is very important to me.
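The figures above allow a quick calculation of how busy the link is. 4096-byte chunks every 5 ms amount to 819.2 KB/s. At roughly 4 MB/s over SPI with PDC/DMA, each chunk takes about 1.02 ms, or about 20% of each period. With DMA the CPU is mostly free during that time, but the margin shrinks if the chunk rate goes up.

```cpp
#include <cassert>

// Fraction of each period that the link spends moving one chunk.
double linkUtilization(double chunkBytes, double periodUsec, double linkBytesPerSec) {
  assert(periodUsec > 0 && linkBytesPerSec > 0);
  double transferUsec = chunkBytes / linkBytesPerSec * 1e6;
  return transferUsec / periodUsec;
}
```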

Hi,
I don't understand everything you're talking about, but it confirms that it's possible to add parallel SRAM with high performance.

Pito, could you explain something I don't understand? (I'm already thinking about the library I'll have to write.)
In your post you say that doubling the data width doubles the transfer speed. But can the Arduino read, for example, the state of 32 input pins at the same time? If I call digitalRead(x) 32 times to read the data, I think it must take more CPU cycles than reading 8 inputs, and so slow down RAM access. Or is there a special function on Arduino that can read all 32 input bits at once?

Sorry for all the questions; I'm a total noob with low-level microcontroller programming.

@gaith: some MCUs can read/write a 16-bit port with a single instruction (i.e. they have 16-bit ports), and some have 32-bit ports as well. You have to investigate. I don't have a Due handy, but I would guess a 32-bit ARM or 32-bit AVR can read/write 16/32-bit data from/to its ports.

If I put 32 times the instruction DigitalRead(x)

With an 8-bit Arduino you can read/write an 8-bit port in a single instruction. Reading data with digitalRead() is, of course, something I would never consider. Again, every MCU I know can read/write 8-bit-wide data from a port with a single instruction.

This writes a byte to port B:

  DDRB = 0xFF; //sets port B to output
  PORTB = addr;

This reads a byte from port D:

  DDRD = 0x00; //sets port D to input
  data = PIND;

More reading: the Arduino Reference.

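The PIC32MX I referenced above has 16-bit ports, so you can read/write the 8-bit or 16-bit port with a single instruction. For a 32-bit MCU it basically does not matter whether you read/write 8, 16 or 32 bits: it is always (or mostly) a single instruction, because it works with 32-bit data internally.
Example (RD_LOW(), RD_HIGH(), WR_LOW() and WR_HIGH() stand for driving the /RD and /WR control lines; the data bus is on port A):
Reading 8 bits from the above disk device:

  for (i = 0; i < n; i++) {
    RD_LOW();                              // assert /RD
    data8[i] = (uint8_t)(PORTA & 0x00FF);  // sample the low 8 bits of port A
    RD_HIGH();                             // release /RD; the CPLD advances the address
  }

Reading 16 bits from the above disk device:

  for (i = 0; i < n; i++) {
    RD_LOW();
    data16[i] = (uint16_t)PORTA;           // all 16 bits of port A in one read
    RD_HIGH();
  }

Writing 8 bits to the above disk device:

  for (i = 0; i < n; i++) {
    WR_LOW();                              // assert /WR
    PORTA = data8[i];                      // drive the byte onto the low 8 bits
    WR_HIGH();                             // release /WR; the SRAM latches the data
  }

Writing 16 bits to the above disk device:

  for (i = 0; i < n; i++) {
    WR_LOW();
    PORTA = data16[i];                     // drive all 16 bits at once
    WR_HIGH();
  }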

Great !
So I have to find out how to access the port registers on the Due.
It seems that someone has already tried to adapt an Arduino Mega library that does this, but after reading the posts, I don't know whether it works.
http://forum.arduino.cc/index.php?PHPSESSID=5mukpmk3fcgd6712quj4fnop57&topic=129868.0

Maybe it would be easier to use ARM assembly code directly.
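Assembly should not be needed. On a 32-bit MCU, a single register read returns all pins of a port, and the data byte is extracted with a shift and a mask. Below is a host-side sketch of that bit manipulation with hypothetical helper names. On the Due, the 32-bit value would come from reading a PIO pin data status register such as `PIOC->PIO_PDSR`. If the eight data lines were wired to PC1-PC8 (Due pins 33-40, per my reading of the pinout; please check it), `firstBit` would be 1. The insert helper shows how to merge a byte into a 32-bit output word without disturbing the port's other pins.

```cpp
#include <cassert>
#include <cstdint>

// Extract the data byte from one 32-bit port read. Assumes the eight data
// lines are on consecutive bits of the same port, starting at `firstBit`.
inline uint8_t extractDataByte(uint32_t portValue, unsigned firstBit) {
  assert(firstBit <= 24);
  return static_cast<uint8_t>((portValue >> firstBit) & 0xFFu);
}

// Merge a data byte into a 32-bit port value, leaving the other pins unchanged.
inline uint32_t insertDataByte(uint32_t portValue, unsigned firstBit, uint8_t data) {
  assert(firstBit <= 24);
  uint32_t mask = 0xFFu << firstBit;
  return (portValue & ~mask) | (static_cast<uint32_t>(data) << firstBit);
}
```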

I'll keep investigating.

Thanks a lot for your explanations.

I understand you're going to add RAM, but I'm curious why reading from the SD card fails.

I looked at the Groovuino library and was astounded that only one file handle is used and that files are opened and closed while playing sound.

Opening a file is very slow so I would have used an array of file handles and opened all the files before playing sound. A file handle only requires about 32 bytes. Rewinding or seeking to the start of a file requires no SD access for an open file.

Yes, that's what I'm doing, and it works: I can play 4 files at the same time, but I want more!

In fact I have 2 classes: sampler.h and samplerl.h.
sampler.h opens the file each time it's played. It's for using different samples in each pattern, so I'm not limited in the number of wave files I can use, but it's not optimized: I only get 2-voice polyphony.
samplerl.h opens one file for each pattern, then uses the seek function to go back to the beginning. Access time is better, but I can only use one wave file per pattern, and I reach 4-voice polyphony.

What do you mean by "file handle"? Is it the SdFile object?

The SdFile object acts like a file handle in other systems. It contains information from the directory entry and cluster information for the current position. A number of blocks must be read from file structures to open a file and seek to a position.

If I wanted to optimize reads from a large number of files I would use raw SD reads.

When you copy files to a freshly formatted SD card, the files are contiguous. SdFat has a function to determine whether a file is contiguous and where its blocks are located.

bool SdBaseFile::contiguousRange(uint32_t* bgnBlock, uint32_t* endBlock)

Check for contiguous file and return its raw block range.

Parameters:
[out] bgnBlock the first block address for the file.
[out] endBlock the last block address for the file.

Returns:
The value one, true, is returned for success and the value zero, false, is returned for failure. Reasons for failure include file is not contiguous, file has zero length or an I/O error occurred.

I would open each file and find its location with the above function.

I would then use either the Sd2Card single block read function:

bool Sd2Card::readBlock(uint32_t block, uint8_t* dst);

Or the Sd2Card multi-block sequence:

bool Sd2Card::readStart(uint32_t blockNumber);  // Set start block for a multiple block read sequence.

bool Sd2Card::readData(uint8_t* dst);  // Read one data block in a multiple block read sequence.

bool Sd2Card::readStop();  // End a read multiple blocks sequence.

SD cards read ahead during multiple-block reads, so they are very efficient in this mode.
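Here is a sketch of the block arithmetic for a contiguous file. The helper names are my own, not SdFat API. Byte `offset` of the file is in raw block `bgnBlock + offset / 512`. Raw reads start at the very first block of the file, which for a WAV file begins with the header. Assuming a canonical 44-byte PCM header, the first 44 bytes of the first block are not audio. Files with extra chunks have longer headers.

```cpp
#include <cassert>
#include <cstdint>

// Location of a file byte in raw SD blocks, for a contiguous file.
struct BlockPos {
  uint32_t block;        // raw SD block number
  uint16_t byteInBlock;  // 0..511
};

BlockPos locate(uint32_t bgnBlock, uint32_t offset) {
  return {bgnBlock + offset / 512, static_cast<uint16_t>(offset % 512)};
}

// Byte offset of 16-bit sample `n`, assuming a fixed-size header
// (44 bytes for a canonical PCM WAV file; this is an assumption).
uint32_t sampleOffset(uint32_t n, uint32_t headerBytes = 44) {
  assert(headerBytes % 2 == 0);
  return headerBytes + 2 * n;
}
```

For example, sample 234 starts at byte 512, the first byte of the second block.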

Hi Fat16,
Thanks for the tip. I've spent 2 days trying to read the files as you describe, but I must be doing something wrong.

Here is the code I use:

#include <arduino.h>
#include <SdFat.h>

SdFat sd;
Sd2Card *card = sd.card();

const int chipSelect = 10;
const int bufsize = 512;

const char* samplefile[]= {"kick1.wav", "hithat1.wav", "snare1.wav", "snare2.wav"};

uint8_t buf[bufsize];

SdFile myFile;

uint32_t bgnBlock;
uint32_t posBlock;
uint32_t endBlock;

void setup() 
{ 
  Serial.begin(9600);   
  sd.begin(chipSelect, SPI_FULL_SPEED);
  
  myFile.open(samplefile[0], O_READ);
 
  posBlock = bgnBlock;

  //card->readBlock(posBlock,buf);
  card->readStart(posBlock);
  card->readData(buf);

  for(int i=0; i<10; i+=1)
  {
      Serial.print("block : ");
      Serial.println(posBlock);
      //card->readBlock(posBlock,buf);
      card->readData(buf);

      for(int j=0; j<255; j+=1)
      {
	   tes[i] = ((int16_t)buf[1+2*j]<<8) + (int16_t)buf[2*j];
	   Serial.println(tes[j]);
       }
       posBlock+=1;
       
     }
     card->readStop();
}

I've tried both readBlock and readStart/readData/readStop.
The first block is always OK, but after that, it seems that only the first byte of the buffer is filled... I don't understand it at all.
Can you see if I'm doing something wrong in my code ?

Thanks

@Pito and Grumpy_Mike: I received my RAM and the other components. It will be hard to solder the RAM as it's not in a DIP package, but I'll find a way. I'll keep you posted.

It will be hard to solder the RAM as it's not in a DIP package,

Sorry, I thought you knew that.
Look for an adapter board; the ones on eBay are often ten times cheaper than those from Farnell.

Here is a sketch that will read a file using raw reads.

I tested it with a file of about 5 MB on a 1 GB ATP industrial SD card.

The result was a read speed of about 4.5 MB/sec on Due.

blocks: 9765
micros: 1094559
MB/sec: 4.57

#include <SdFat.h>
SdFat sd;
SdFile file;
static const uint8_t SD_CS = SS;
uint32_t bgnBlock;
uint32_t endBlock;
uint8_t buf[512];

void setup() {
  Serial.begin(9600);
  if (!sd.begin(SD_CS) || !file.open("TEST.WAV", O_READ)) {
      Serial.println("begin/open");
      while(1);
  }
  if (!file.contiguousRange(&bgnBlock, &endBlock)) {
    Serial.println("not contiguous");
    while(1);
  }
  // count of blocks in file;
  uint32_t n = (file.fileSize() + 511)/512;
  // read start time
  uint32_t t0 = micros();
  
  // start a multiple-block read at the file's first block
  if (!sd.card()->readStart(bgnBlock)) {
    Serial.println("readStart");
    while(1);
  }

  for (uint32_t i = 0; i < n; i++) {
    if (!sd.card()->readData(buf)) {
      Serial.println("readData");
      while(1);
    }
  }
  sd.card()->readStop();
  uint32_t t = micros() - t0;
  Serial.print("blocks: ");
  Serial.println(n);
  Serial.print("micros: ");
  Serial.println(t);
  Serial.print("MB/sec: ");
  Serial.println(512.0*n/t);
}
void loop() {}

Edit: I did some tests with four- and eight-block reads to get the time to read a chunk of a file. These use the industrial ATP card, so they will be faster than some consumer cards.

blocks: 4
micros: 574
MB/sec: 3.57

blocks: 8
micros: 1022
MB/sec: 4.01

So you can read a 2048 byte chunk in 574 usec and a 4096 byte chunk in 1022 usec.
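To put those timings in perspective, here is a rough estimate. It assumes 16-bit mono audio at 44.1 kHz; this is an assumption, since the sample rate is not stated. A 2048-byte chunk holds 1024 samples, about 23.2 ms of sound, and takes 574 usec to read. Read time alone would therefore allow around 40 voices. Mixing, the audio interrupt and SPI sharing will lower that considerably.

```cpp
#include <cassert>
#include <cstdint>

// Voices that read time alone would allow: how many chunk reads fit into the
// playback time of one chunk. Ignores mixing and other CPU/SPI load.
uint32_t maxVoicesByReadTime(uint32_t samplesPerChunk, uint32_t sampleRate, uint32_t readUsec) {
  assert(sampleRate > 0 && readUsec > 0);
  uint64_t playbackUsec = uint64_t(samplesPerChunk) * 1000000u / sampleRate;
  return static_cast<uint32_t>(playbackUsec / readUsec);
}
```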

Thanks, it works.
With the myFile.read() function, I could load the wave data into any data type. With the readBlock function, I get the wave data in a uint8_t[512] buffer.
What I need is an int16_t[1024] buffer (the wave format is 16-bit signed integer, and I need 1024 samples).

I've tried loading a uint8_t[4][512] buffer (calling readBlock 4 times), then doing some computation to copy it into the int16_t[1024] buffer, but it takes too much CPU time, and I have to allocate two buffers instead of one. So the performance is lower than with the myFile.read() function.

My intuition tells me to use pointers: load directly into a uint8_t[2048] buffer and read it as if it were an int16_t[1024]. But I haven't found a way to do it.
It must have been done already (in SD wave players, for example), but I didn't find anything on the subject.

Edit: I found that SdFatLib uses the "reinterpret_cast" operator, but I don't know if it can be used on a whole array.

Any ideas?
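Here is a host-side sketch of the options, with hypothetical function names. 16-bit WAV data is little-endian, so the shift-and-add loop in the earlier sketch is the portable decode. On a little-endian CPU such as the Due's Cortex-M3, the bytes already have the in-memory layout of int16_t, so a plain byte copy gives the same values. Better still, the int16_t array itself can be handed to the block read as a byte pointer, which needs no second buffer. Accessing an object through an `unsigned char`/`uint8_t` pointer is permitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Portable decode of little-endian 16-bit PCM bytes into int16_t samples.
void decodeLE16(const uint8_t* bytes, int16_t* samples, std::size_t count) {
  assert(bytes != nullptr && samples != nullptr);
  for (std::size_t i = 0; i < count; i++) {
    uint16_t u = static_cast<uint16_t>(bytes[2 * i] | (bytes[2 * i + 1] << 8));
    samples[i] = static_cast<int16_t>(u);
  }
}

// On a little-endian CPU the file layout already matches int16_t in memory,
// so a plain byte copy is equivalent (not portable to big-endian machines).
void copyLE16Native(const uint8_t* bytes, int16_t* samples, std::size_t count) {
  std::memcpy(samples, bytes, count * sizeof(int16_t));
}
```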

Thanks

Here is a function that will read a file chunk into any type destination.

#include <SdFat.h>
SdFat sd;
SdFile file;
const uint8_t SD_CS = SS;
uint32_t bgnBlock;
uint32_t endBlock;

uint16_t wave[1024];
//--------------------------------------------------------
bool readChunk(void* buf, uint32_t startBlock, uint16_t blockCount) {
  uint8_t* dst = (uint8_t*)buf;
  if (!sd.card()->readStart(startBlock)) return false;
  for (uint16_t i = 0; i < blockCount; i++) {
    if (!sd.card()->readData(dst + i*512L)) return false;
  }
  return sd.card()->readStop();
}
//---------------------------------------------------------
void setup() {
  Serial.begin(9600);
  if (!sd.begin(SD_CS) || !file.open("TEST.WAV", O_READ)) {
      Serial.println("begin/open");
      while(1);
  }
  if (!file.contiguousRange(&bgnBlock, &endBlock)) {
    Serial.println("not contiguous");
    while(1);
  }
  uint16_t n = 4;  
  uint32_t t0 = micros();

  if (!readChunk(wave, bgnBlock, n)) {
    Serial.println("readChunk");
    while(1);
  }
  uint32_t t = micros() - t0;
  Serial.print("blocks: ");
  Serial.println(n);
  Serial.print("micros: ");
  Serial.println(t);
  Serial.print("MB/sec: ");
  Serial.println(512.0*n/t);
}
void loop() {}

Here is the timing for reading a 1024-element array of uint16_t.

blocks: 4
micros: 576
MB/sec: 3.56


Great, thanks a lot!
Now I can manage more than 6 voices of polyphony, and I haven't even reached the limit yet. The Due is very surprising!
Your function is much faster than the read() function.

I will update my library with this code.

I don't need the extra RAM anymore, but since I received it, I'll do the experiments anyway.

Thanks again to all.