extra RAM on Arduino Due

Hi all,

I want to store short WAV files in RAM after reading them from an SD card, because I think the SD card is slowing my sketch down.
For now, I have an Arduino Due and an SD card wired to the SPI bus.
I want to add at least 1 MByte of RAM; ideally 8 MByte.

I know that it's nearly impossible to add external RAM to the Arduino Due, because of how the pins are wired. So I think there are only 2 solutions left:
serial RAM on SPI, and I²C RAM.
In your experience, which is the better option, with the higher speed? Is there any problem with putting several SPI peripherals (SD card + SRAM) on the same bus?
The biggest module I found was 1 Mbit for SPI, and 512 kbit for I²C. That's not enough for me. Do you know of any modules with more memory that are usable with an Arduino?
SPI : http://fr.farnell.com/microchip/23lc1024-i-p/sram-serie-1mbit-2-5v-8pdip/dp/2212152

If none exists, I thought about flash memory. There are modules with 8 Mbit.

Is flash memory much slower than RAM?
Would it be faster than my SD card? (If not, there would be no point in using one.)

Thanks,

Gaétan

Well you can have several of those chips on the SPI bus to make up the total.
The problem with SD card on the same bus is that I don't think you can have a file open and do any other SPI transaction, but I could be wrong on this.

The other option is to use paged RAM: you read and write the data on a port's worth of I/O pins and use another bunch of I/O pins, or a binary counter, as the address lines. Then you can use parallel SRAM.

because I think SD card is slowing me down in my sketch.

Showing that sketch, and explaining WHY you think that the SD access is the culprit might be entertaining. Perhaps even useful.

What are you doing with the wav file data?

Hi,

@Grumpy_mike

Thanks for the idea.

What you describe with parallel SRAM is, for example, connecting 20 pins from the Arduino to the SRAM chip to address 1 M locations, and connecting 8 more pins for the 8-bit data?
So that would give 1 MByte of data?

Have you already done that? I found someone who did this with an ATtiny chip, but it was all written in assembly code...

Is it this kind of chip?

@ PaulS

Here is a piece of code from my blog, for a monophonic sampler, with explanations:

For polyphonic sample playback, I'm doing something like this (just an extract):

#include <samplerl.h>

const int samplernumber = 4;
Samplerl samp[samplernumber];
const char* samplefile[]= {"kick1.wav", "hithat1.wav", "snare1.wav", "snare2.wav"};


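void setup() {
  // Initialise each sampler and load its wave file
  for(int i=0; i<samplernumber; i++)
  {
    samp[i].init();
    samp[i].load(samplefile[i]);
  }
}

void loop() {
  // Stop at the first sampler whose buffill() returns true,
  // then start again from sampler 0 on the next pass
  for(int i=0; i<samplernumber; i++)
  {
    if(samp[i].buffill()) break;
  }
}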


void loop44kHz() { 
  int32_t ulOutput=2048;
  for(int i=0; i<samplernumber; i++)
  {    
    samp[i].next();
    ulOutput += samp[i].output();
  }
  if(ulOutput>4095) ulOutput=4095;
  if(ulOutput<0) ulOutput=0;
  dacc_write_conversion_data(DACC_INTERFACE, ulOutput); 
}

The library I've written is here, my problem is in the sample.h file:

you can see this post too :
http://forum.arduino.cc/index.php?topic=167778.0

Here is the concept :

. I have a buffer. There are 2 instances of it: one that is being played, and one that is being loaded from the SD card.
The best performance I obtained was with a 512-byte buffer.

. I have a timer running at 44 kHz. Here I play the wave buffer.

. In the main loop, I reload the instance of the buffer that has just been played.
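
The double-buffer scheme described above can be sketched roughly as follows. This is a host-side C++ model, not the code from sample.h: the class name `PingPong` and the `fill` callback (standing in for an SD card read) are made up for illustration, and a real sketch would also need to protect the shared flags between the timer interrupt and the main loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical ping-pong buffer: one half is played while the other is refilled.
template <std::size_t N>
class PingPong {
public:
    // `fill` stands in for an SD read: it must write `count` samples into dst.
    using Filler = std::function<void(int16_t* dst, std::size_t count)>;

    explicit PingPong(Filler fill) : fill_(std::move(fill)) {
        fill_(buf_[0], N);   // preload both halves before playback starts
        fill_(buf_[1], N);
    }

    // Would be called from the 44 kHz timer: returns the next sample to play.
    int16_t next() {
        int16_t s = buf_[playing_][pos_++];
        if (pos_ == N) {          // this half is exhausted
            pos_ = 0;
            stale_ = playing_;    // remember which half needs new data
            needsFill_ = true;
            playing_ ^= 1;        // switch to the other half
        }
        return s;
    }

    // Would be called from loop(): refills the half that was just played.
    // Returns true if a refill actually happened.
    bool refill() {
        if (!needsFill_) return false;
        fill_(buf_[stale_], N);
        needsFill_ = false;
        return true;
    }

private:
    Filler fill_;
    int16_t buf_[2][N];
    std::size_t pos_ = 0;
    int playing_ = 0;
    int stale_ = 0;
    bool needsFill_ = false;
};
```

If `refill()` is not called before the other half runs out, `next()` replays old data, which is the failure mode that appears when the SD card cannot keep up.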

So when I play 4 samples at a time, my code is doing something like this:

loop() :
iteration 1 : load buffer1 sample 1
iteration 2 : load buffer1 sample 2
iteration 3 : load buffer1 sample 3
iteration 4 : load buffer1 sample 4
iteration 5 : load buffer2 sample 1
iteration 6 : load buffer2 sample 2
iteration 7 : load buffer2 sample 3
iteration 8 : load buffer2 sample 4

Reading 512 bytes from the SD card takes around 1 ms.
So the time elapsed between loading buffer1 and buffer2 of sample 1 is around 4 ms.

With 44 kHz, 16-bit (2 bytes) audio, it takes 512/2/44000 seconds = 5.8 ms to play one buffer.

So as soon as I add a fifth sample, I can't read the SD card fast enough to fill the buffers.
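
The arithmetic above can be written out as a small calculation. This is only a simple model under the figures quoted in the post (512-byte buffers, 2 bytes per sample, 44 kHz, about 1 ms per SD read); the function names are made up for illustration. It ignores the time spent in the 44 kHz interrupt and any latency spikes from the card, so the real limit is lower than the bound it gives.

```cpp
#include <cassert>
#include <cmath>

// Time (in ms) that one buffer lasts at the output.
double bufferPlayMs(int bufferBytes, int bytesPerSample, double sampleRateHz) {
    return 1000.0 * (bufferBytes / bytesPerSample) / sampleRateHz;
}

// Upper bound on simultaneous voices: each voice needs one refill
// (taking readMs) within every buffer period (playMs).
int maxVoices(double playMs, double readMs) {
    return static_cast<int>(std::floor(playMs / readMs));
}
```

With the numbers from the post, one buffer lasts about 5.8 ms, so five 1 ms reads would leave only about 0.8 ms per period for everything else. That is consistent with the fifth sample being the point where it breaks down.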
I use SdFatLib, and maybe there are problems with reading several files at a time.

So if I can find a memory faster than the SPI SD card, I think I can play more samples at a time.

Have you already done that?

Not on an arduino but I have done it on other processors.
You don't need all 20 address lines for audio work either. You want to access a lot of consecutive memory locations, so your least significant n bits (say 8 or 10) can be replaced with a counter: you set the most significant bits and simply clock the counter between accesses. Wiring up just the clock and reset of a counter reduces the number of pins you need for the address.
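
As a rough illustration of the counter idea, here is a host-side C++ model (not Arduino code) of a memory whose low address bits come from an external binary counter and whose high bits come from MCU pins. The class `PagedSram` and the helper `readRun` are hypothetical names; real hardware would replace the method calls with pin toggles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of an SRAM addressed by (page bits from MCU pins) + (low bits from a counter).
class PagedSram {
public:
    PagedSram(unsigned totalAddressBits, unsigned counterBits)
        : counterBits_(counterBits),
          mem_(std::size_t{1} << totalAddressBits, uint8_t{0}) {}

    void setPage(uint32_t page) { page_ = page; }   // MCU drives the high address bits
    void resetCounter() { counter_ = 0; }           // pulse the counter's reset pin
    void clockCounter() {                           // pulse the counter's clock pin
        counter_ = (counter_ + 1) & ((1u << counterBits_) - 1u);
    }

    uint32_t address() const { return (page_ << counterBits_) | counter_; }
    void write(uint8_t v) { mem_[address()] = v; }
    uint8_t read() const { return mem_[address()]; }

private:
    unsigned counterBits_;
    std::vector<uint8_t> mem_;
    uint32_t page_ = 0;
    uint32_t counter_ = 0;
};

// Read n consecutive bytes from one page: one clock pulse per byte,
// instead of rewriting the full address on every access.
std::vector<uint8_t> readRun(PagedSram& ram, uint32_t page, std::size_t n) {
    ram.setPage(page);
    ram.resetCounter();
    std::vector<uint8_t> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(ram.read());
        ram.clockCounter();
    }
    return out;
}
```

With 20 address bits and an 8-bit counter, the MCU only has to drive 12 page pins plus the counter's clock and reset, and a 256-byte run costs one page write followed by 256 clock pulses.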

but all was written in assembly code

There is no need for that; using C is fine.

Ok thanks.
So if I'm considering the Renesas chip, which has 19 address pins and 16 data pins (512k * 16 bits), let's call them Ad0-Ad18 and W0-W15:
I connect Ad0-Ad10 to D0-D10 of my Arduino. I will read or write at least 512 bytes each time (the size of my buffer), corresponding to 256 * 16 bits, so I can keep Ad11-Ad18 for the timer.
I connect W0-W15 to D11-D26 of the Arduino.
I connect D28 to the clock of sram.
I connect D29 to send write/read instruction to sram.

Then I can send an 11-bit address with the Arduino, and send or receive data made up of 256 successive 16-bit words (8 address bits left for Ad11-Ad18).

But what I don't understand is where and how to connect Ad11-Ad18. Do I have to add a hardware timer? Or can the Arduino do that? (If they connect to the Arduino, I won't save any pins, so it's not useful.)

Another question: do you know of a library that can deal with this, or will I have to write one?

Thanks again

Gaétan

Have you got a specific chip in mind? If so can you provide a link?

I connect D28 to the clock of sram.

Most SRAM does not have a clock, but it does have a chip enable that you don't mention.

so I can keep Ad11-Ad18 for the timer.

What timer? Do you mean the counter I talked about? They are not the same thing.

Yes, sorry. The chip I had in mind is this one :

You're right. According to the specs, there is no clock. Only this :
CS# Chip select
WE# Write enable
OE# Output enable
LB# Lower byte enable
UB# Upper byte enable

I think I will have to use WE (when writing) and OE (when reading).

Indeed, I meant the counter, not a timer.
How can I generate this count to send it to the SRAM chip?

This is what I was thinking.

[Attachment: extra sram.png]

Thanks, nothing is better than a schematic to understand :smiley:

If I only count 8 bits, do you think that this binary counter is ok :
http://fr.farnell.com/texas-instruments/sn74hc590an/ic-counter-binary/dp/1470786

If yes, I will order the RAM and the counter and keep you informed if it works. I might have some questions when I write the library to use it...

Yes, that looks OK, although I was thinking more of a 74LS93 or maybe a synchronous counter. You can cascade two of them for extra bits and have any size of "page" you like.
Good luck

You might try the modern SdFat library (hosted on Google Code). The official Arduino SD library is based on a very old version of SdFat and is very slow on Due.

SdFat uses fast DMA on Due and large reads that are a multiple of 512 bytes are not copied. Try allocating eight 8,192 byte buffers, two for each file. Open all four files before starting your reads.

Here is an example benchmark for Due write/read with 8,192 byte buffers:

Free RAM: 87327
Type is FAT32
File size 10MB
Buffer size 8192 bytes
Starting write test. Please wait up to a minute
Write 2688.79 KB/sec
Maximum latency: 92689 usec, Minimum Latency: 2302 usec, Avg Latency: 3042 usec

Starting read test. Please wait up to a minute
Read 3852.83 KB/sec
Maximum latency: 4656 usec, Minimum Latency: 2087 usec, Avg Latency: 2124 usec

SD cards perform much better with large multi-block reads.

Throughput on SPI is great but the latency is killing me...
I'm generating ~30-40 MBytes/min, and the occasional long latency really mucks up my data.

I've purchased an FPGA+RAM board (eBay) to make a 32 MByte SPI buffer
-- but my Verilog/VHDL coding skills are 10 years rusty...

I'm also looking at the same FPGA SPI buffer to exchange data between the Due and a Raspberry Pi (two masters...) with no dropped data.

Hi,

@ fat16lib
I use the latest version of SdFatLib (as of about 1 month ago; I don't know if it has changed since then).
I open the wave files at the beginning of my sketch, then only read them to fill the buffers, and use seek(0) to go back to the beginning.

I tried several buffer sizes, and the most efficient I found was 512 samples (so 1024 bytes, because the wave is 16 bits).

8192 bytes is really too much. For now I can play 4 samples at a time, but if I want to play 5 or more, it will be too big.
And when I'm making a groovebox, I have to store waveform tables (4 kB), waveshaping effect tables (8 kB), etc...
With the 96 kB of RAM on the Due, that's not enough.

But I will experiment again with bigger buffers around 5-6 kBytes, and try to understand why they don't work better than a 1k buffer.
As you say, it should be faster. Maybe I have another problem besides the reading time.
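
The RAM concern above can be checked with a quick budget. This is only a sketch using the figures from the thread (two 8,192-byte buffers per file, 4 kB of waveform tables, 8 kB of waveshaping tables, 96 kB of SRAM on the Due); the function name is made up, and stack, globals and library buffers are not counted.

```cpp
#include <cassert>
#include <cstddef>

// Total SRAM on the Due, as quoted in the thread.
constexpr std::size_t kDueSramBytes = 96 * 1024;

// Two buffers per voice (double buffering) plus fixed lookup tables.
constexpr std::size_t ramNeeded(std::size_t voices,
                                std::size_t bufferBytes,
                                std::size_t tableBytes) {
    return voices * 2 * bufferBytes + tableBytes;
}
```

Under these assumptions, 4 voices need 77,824 bytes and 5 voices need 94,208 bytes, which leaves only 4,096 bytes of the 96 kB for everything else.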

Thanks for the reply.

@ralphnev
Yes, I'm afraid of the latency too. With real-time audio, I can't afford any latency from the RAM.
So the Grumpy_Mike solution seems good, even if it's harder to do.
I'll receive the components in a few days, and I'll post my experiments here, in case you're interested. I think you can fit a large amount of RAM with parallel SRAM and all the pins of the Due.

Buffer sizes that are a power of two will likely work best. The cluster size for files is a power of two, and reading across a cluster boundary will defeat the larger buffer size. Try 4 KB buffers.

Free RAM: 91423
Type is FAT32
File size 10MB
Buffer size 4096 bytes

Starting read test. Please wait up to a minute
Read 2848.53 KB/sec
Maximum latency: 3510 usec, Minimum Latency: 1339 usec, Avg Latency: 1436 usec
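
The point about cluster boundaries can be illustrated with a small count of how many sequential reads straddle a boundary. This is a sketch under stated assumptions: reads start at file offset 0 and follow each other back to back, and the function name is made up. A WAV header that shifts the first audio read away from offset 0 would change the picture.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Count sequential reads (starting at file offset 0) that span two clusters.
int countClusterCrossings(uint32_t fileSize, uint32_t bufSize, uint32_t clusterSize) {
    int crossings = 0;
    for (uint32_t pos = 0; pos < fileSize; pos += bufSize) {
        uint32_t len = std::min(bufSize, fileSize - pos);   // last read may be short
        uint32_t first = pos / clusterSize;                 // cluster of the first byte
        uint32_t last = (pos + len - 1) / clusterSize;      // cluster of the last byte
        if (first != last) ++crossings;
    }
    return crossings;
}
```

For a 64 KB file on 32 KB clusters, 4,096-byte reads never cross a boundary, while 6,000-byte reads cross one at the read that starts at offset 30,000.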

Make sure your SD is formatted correctly with 32 KB clusters. The SD Association's SD Memory Card Formatter for Windows/Mac places file system structures for best performance.

I built an external sram device with XC9536XL ($1) and a 4MBytes large SRAM chip. You need 11 wires to control the memory (8data+3bit control). With bigger cpld (more pins) you can access an "unlimited" size of sram. It has got an auto increment feature thus a rd/wr to the device automatically increments the sram's address.

The r/w speed with pic32@80MHz (bitbanging, no dma) and with larger blocks (ie. 512bytes) is ~6.5Mbytes/sec. It can be used in "16bit mode" (16data+3bit control wires required) with double speed.

Actually, the data width is not related to the cpld, so you may run it in 32b or 64b data width mode (4x or 8x faster) but you need more data wires, unfortunately..

More on:

pito:
I built an external sram device with XC9536XL ($1) and a 4MBytes large SRAM chip. You need 11 wires to control the memory (8data+3bit control). With bigger cpld (more pins) you can access an "unlimited" size of sram. It has got an auto increment feature thus a rd/wr to the device automatically increments the sram's address.
[snip]

I looked at a similar method, but it requires too much processor intervention.

  • Using SPI/USART with PDC/DMA should get up to 4 MBytes/sec with very little processor intervention...

My chunks are 4096 bytes, occurring once every 5 ms (and faster if I can make other improvements),
so low processor overhead is very important to me.

Hi,
I don't understand everything you're talking about, but it confirms that it's possible to add parallel SRAM with high performance.

Pito, could you explain something I don't understand (I'm already thinking about the library I'll have to write):
You say in your post that doubling the data width doubles the data transfer speed. But can the Arduino read, for example, the state of 32 input pins at the same time? If I put 32 times the instruction digitalRead(x) to read the data, I think it must take more CPU cycles than reading 8 inputs, and so lower the RAM access frequency? Or is there a special function on Arduino that can read all 32 input bits at the same time?

Sorry for my questions, I'm a total noob with low-level microcontrollers.

@gaith: some mcus can read/write a 16bit port with single instruction (ie. they have 16bit ports) or maybe 32bit ports as well. You have to investigate. I do not have DUE handy, but I would guess 32bit ARM or 32bit AVR can read/write 16/32bit data from/to its ports..

If I put 32 times the instruction digitalRead(x)

With an 8-bit Arduino you can read/write an 8-bit port in a single instruction. Reading data with digitalRead() is of course something I would never consider. Again, any MCU I know of can read/write 8-bit-wide data from a port with a single instruction.

This writes a byte to the port B:

  DDRB = 0xFF; //sets port B to output
  PORTB = addr;

This reads a byte from the port D:

  DDRD = 0x00; //sets port D to input
  data = PIND;

More reading: Arduino Reference

The pic32mx I referenced above has 16bit ports, so you can r/w the 8b or 16b port with single instruction. For a 32bit mcu it basically does not matter whether you read/write 8/16/32 bit - it is always (or mostly) a single instruction, because they always work with 32bit data internally..
Example:
Reading 8bits from the above disk device:

loop {
set /RD low
int8 data[i] = (PORTA & 0x00FF)
set /RD high
}

Reading 16bits from the above disk device:

loop {
set /RD low
int16 data[i] = PORTA
set /RD high
}

Writing 8bits to the above disk device:

loop {
set /WR low
PORTA = (int8 data[i]  & 0x00FF)
set /WR high
}

Writing 16bits to the above disk device:

loop {
set /WR low
PORTA = (int16 data[i] )
set /WR high
}
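
To make the "double the width, double the speed" point concrete, here is a host-side C++ model of the loops above. It is not PIC32 or Arduino code: `AutoIncDevice` is a made-up stand-in for the auto-incrementing memory, where each read strobe returns the next 1 or 2 bytes, and the model simply counts strobes. It assumes the block length is a multiple of the bus width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the auto-incrementing memory device: every read strobe
// returns the next `busBytes` bytes and advances the internal address.
struct AutoIncDevice {
    std::vector<uint8_t> mem;
    std::size_t addr = 0;
    std::size_t strobes = 0;

    uint32_t strobeRead(unsigned busBytes) {
        uint32_t word = 0;
        for (unsigned b = 0; b < busBytes; ++b)
            word |= static_cast<uint32_t>(mem[addr++]) << (8 * b);
        ++strobes;   // one /RD low-high pulse
        return word;
    }
};

// Read nBytes using a bus that is busBytes wide (1 = 8-bit mode, 2 = 16-bit mode).
std::vector<uint8_t> readBlock(AutoIncDevice& dev, std::size_t nBytes, unsigned busBytes) {
    std::vector<uint8_t> out;
    out.reserve(nBytes);
    for (std::size_t i = 0; i < nBytes; i += busBytes) {
        uint32_t word = dev.strobeRead(busBytes);
        for (unsigned b = 0; b < busBytes; ++b)
            out.push_back(static_cast<uint8_t>(word >> (8 * b)));   // unpack low byte first
    }
    return out;
}
```

A 512-byte block takes 512 strobes in 8-bit mode and 256 in 16-bit mode, so if the strobe rate is the bottleneck, the wider bus roughly halves the transfer time.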

Great !
So I have to find out how to access the port registers on the Due.
It seems that someone has already tried to adapt an Arduino Mega library that can do this, but after reading the posts... I don't know whether it works or not.
http://forum.arduino.cc/index.php?PHPSESSID=5mukpmk3fcgd6712quj4fnop57&topic=129868.0
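
Once a whole port has been read in one go, the data bus still has to be picked out of the 32-bit value with a shift and a mask. The helper below is a plain C++ sketch of that step, with a made-up function name and made-up bit positions. On the Due, the value would presumably come from the pin status register of a PIO controller (for example `PIOC->PIO_PDSR`), but that register name and the pin-to-bit mapping are assumptions to check against the SAM3X datasheet and the Due pinout.

```cpp
#include <cassert>
#include <cstdint>

// Pull a `width`-bit bus value out of one 32-bit port snapshot.
// `shift` is the bit position of the bus's least significant line (must be < 32).
uint32_t extractBus(uint32_t portSnapshot, unsigned shift, unsigned width) {
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (portSnapshot >> shift) & mask;
}
```

For example, with a hypothetical 16-bit bus wired to port bits 1 to 16, `extractBus(snapshot, 1, 16)` returns the 16 data bits from a single read, instead of 16 separate digitalRead() calls.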

Maybe it would be easier to use some ARM assembly code directly.

I'll keep investigating.

Thanks a lot for your explanations.