8MB Ramdisk (external RAM) for Arduino..

I have no immediate use for this but love the idea. Good work.


Rob

When you need to set the Address to 0 (for example when always writing/reading a buffer from ramdisk's zero Address) you may use this shortcut:

// reset the Address to 0x000000L
inline void resetadr (){
	NDATA = 1;  // address mode	
	NRD = 0;  NRD = 1;
	NDATA = 0;  // data mode
}

Moreover, you can stop writing the Address nibbles after any nibble :P, so you may write only the "required" number of Address nibbles (1, 2, 3, 4, 5 or 6).

That speeds up setting the Address. For example, if you work with 128 blocks of 65kB (65536 bytes) each, you need not set all 6 nibbles of the 8MB Address, only the top 2:

// Load the BLOCK Address into the RAMDISK
// a BLOCK is a 65kB large chunk of ram starting at 0x0000
inline void ldadr_block65k (unsigned char addr)
{
	NDATA = 1;  // address mode	
	// the trick with zeroing the 24bit address
	NRD = 0;  NRD = 1;
	// now load the 2 nibbles of the BLOCK address
	DDRA = 0xFF; // sets Data output
	PORTA = (addr >> 4); // 6th nibble of the address - the highest
	NWR = 0; NWR = 1;
	PORTA = (addr);      // 5th nibble of the address
	NWR = 0; NWR = 1;
	// 4th, 3rd, 2nd, 1st nibble are zero - here we start to read/write
	DDRA = 0x00; // sets Data input
	NDATA = 0;  // data mode
}
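
The arithmetic behind the block scheme above can be checked off-target. This is only a minimal sketch in plain C++, assuming a block is exactly 65536 bytes (the "65kB" above) and that the two nibbles written end up in addr[23..20] and addr[19..16]. The names kRamdiskBytes, kBlockBytes, block_start and block_nibbles are made up for this illustration:

```cpp
#include <cassert>
#include <cstdint>

// Assumed sizes: 8MB ramdisk, one block = 65536 bytes ("65kB" in the text).
const uint32_t kRamdiskBytes = 8u * 1024u * 1024u;
const uint32_t kBlockBytes   = 65536u;
const uint32_t kBlockCount   = kRamdiskBytes / kBlockBytes;  // 128 blocks

// First byte address of a block: the block number lands in addr[23..16],
// the four lower nibbles stay zero after the address reset.
inline uint32_t block_start(uint8_t block) {
    assert(block < kBlockCount);
    return static_cast<uint32_t>(block) << 16;
}

// The two nibbles that go out on the bus, highest first.
struct BlockNibbles {
    uint8_t high;  // addr[23..20]
    uint8_t low;   // addr[19..16]
};

inline BlockNibbles block_nibbles(uint8_t block) {
    assert(block < kBlockCount);
    BlockNibbles n;
    n.high = static_cast<uint8_t>((block >> 4) & 0x0F);
    n.low  = static_cast<uint8_t>(block & 0x0F);
    return n;
}
```

For block 6, for example, block_start(6) gives 0x060000 and the nibbles written are 0x0 and 0x6.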

With a standard external HW SRAM bus on DUE, PIC32 or others, you simply write the 24bit Address with A0=1 (when, for example, you use the A0 address line) and then read/write all the Data bytes with A0=0. You need not mess with bit-banging the /RD and /WR signals, or with setting the direction of the 8bit Data port - the MCU's external memory bus does that for you (EMB on DUE, PMP on PIC32, FSMC on STM32, etc.). You still have to set the timing of the /RD and /WR signals as required, however.

For example, in pseudo-code:

//write 24bit Address, A0=1
bus_write(1, 6th_nibble); //the highest
bus_write(1, 5th_nibble);
bus_write(1, 4th_nibble);
bus_write(1, 3rd_nibble);
bus_write(1, 2nd_nibble);
bus_write(1, 1st_nibble);
// write 100000 bytes of Data (w/ auto-increment)
for i=1 to 100000 
bus_write(0, data[i]);
next i

or

// read 100000 bytes of Data (w/ auto-increment)
for i=1 to 100000
data[i] = bus_read(0);
next i

When working with blocks (see above):

//write 8bit BLOCK Address (up to 128 BLOCKS each BLOCK 65kB large)
dummy = bus_read(1);  //resets Address to zero (0L)
bus_write(1, 6th_nibble); //the highest
bus_write(1, 5th_nibble);
// write 65kBytes of Data (w/ auto-increment)
for i=1 to 65536 
bus_write(0, data[i]);
next i

or

// read 65kBytes of Data (w/ auto-increment)
for i=1 to 65536 
data[i] = bus_read(0);
next i
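
To experiment with the sequences above without any hardware, the protocol can be mimicked on a PC. The following is only a rough software model, not the CPLD's real logic. It assumes that a write with A0=1 loads one nibble starting from the highest, that a read with A0=1 zeroes the Address, and that any Data access restarts the nibble sequence (that last point is a guess of mine). RamdiskModel and its members are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const uint32_t kModelBytes = 8u * 1024u * 1024u;  // 8MB, 23 address bits

// Hypothetical software model of the bus protocol described above.
struct RamdiskModel {
    std::vector<uint8_t> mem;
    uint32_t address;   // auto-incrementing Address
    int nextNibble;     // next Address nibble to load, 5 = highest

    RamdiskModel() : mem(kModelBytes, 0), address(0), nextNibble(5) {}

    // A0=1: load one Address nibble; A0=0: write one Data byte.
    void bus_write(int a0, uint8_t value) {
        assert(a0 == 0 || a0 == 1);
        if (a0 == 1) {
            const int shift = 4 * nextNibble;
            address &= ~(static_cast<uint32_t>(0x0F) << shift);
            address |= static_cast<uint32_t>(value & 0x0F) << shift;
            address &= kModelBytes - 1;
            nextNibble = (nextNibble == 0) ? 5 : nextNibble - 1;
        } else {
            mem[address] = value;
            advance();
        }
    }

    // A0=1: reset the Address to zero; A0=0: read one Data byte.
    uint8_t bus_read(int a0) {
        assert(a0 == 0 || a0 == 1);
        if (a0 == 1) {
            address = 0;
            nextNibble = 5;
            return 0;
        }
        const uint8_t value = mem[address];
        advance();
        return value;
    }

    // Auto-increment with roll-over to zero at 8MB (assumed to restart
    // the nibble sequence as well).
    void advance() {
        address = (address + 1) & (kModelBytes - 1);
        nextNibble = 5;
    }
};
```

With this model, a reset followed by the two nibbles 0x0 and 0x6 leaves the Address at 0x060000, which matches the block example above.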

You may connect the 8MB Ramdisk to your DUE via DUE's External Memory Bus easily:

With Ax = 1 you write in the 24bit starting "Address" 
With Ax = 0 you write/read the Data bytes sequentially from the "Address"
You may use any DUE address line.

DUE and 8MB RAMDISK.jpg

Hello Pito,

thx for this great work!

I have ordered two of those 8MB Ram devices... This is exactly what I need! :slight_smile:

Kind Regards,

Andreas

Here is an example of how to use the 8MB Ramdisk on the DUE's External Memory Bus (just code fragments, which need to be put into your own code):

// DUE timing settings and Examples for 8MB Ramdisk
// Pito 4/2014
// Examples only, not tested
// Library used: https://github.com/delsauce/ArduinoDueParallel
// DUE at 84MHz (12ns clock)
// No warranties of any kind

// Wiring:
RDisk	DUE (EMB signal names)
================
D0-D7	D0-D7
/WR	NWE
/RD	NRD
/DATA	A0

// Configure parallel bus for 8bits, no CS, A0, and NRD and NWE
Parallel.begin(PARALLEL_BUS_WIDTH_8, 0, 1, 1, 1);

// Configure bus timings.. EXPERIMENTAL (assumption: WR pulse could be shorter than RD pulse)
// Otherwise WR=RD timing
// We do not use any real addressing
Parallel.setAddressSetupTiming(0,0,0,0);
// NWE, NCSWE, NRD, NCSRD  - we do not use NCSs
Parallel.setPulseTiming(5,0,8,0); // or (4,0,7,0) or (6,0,9,0)
// better read datasheet for CycleTiming:
Parallel.setCycleTiming(6,9);  // or (5,8) or (7,10)


// Setting of the starting Address (8MB max)
void set_address(uint32_t address){
	// the sequence below could be shorter when setting only some upper nibbles
	// the nibbles are made with >> here (a union as above works too)
	Parallel.write(0x01, (address >> 20) & 0x0F);  // addr[23..20]
	Parallel.write(0x01, (address >> 16) & 0x0F);
	Parallel.write(0x01, (address >> 12) & 0x0F);
	Parallel.write(0x01, (address >> 8) & 0x0F);
	Parallel.write(0x01, (address >> 4) & 0x0F);
	Parallel.write(0x01, address & 0x0F);  // addr[3..0]
	//Parallel.write(0x01, CONTROL);  // 0x00 default, 0x01-autodecrement, 0x02-wrprotect, 0x03-both
}

// Write 1MB of data=0xAA from Address=0x2FFFFF
set_address(0x2FFFFF);
for(uint32_t i=0; i<1048576UL; i++){  // 1MB
	Parallel.write(0x00, 0xAA);
}

// Read and sum 1MB of data from Address=0x2FFFFF
uint32_t sum = 0;
set_address(0x2FFFFF);
for(uint32_t i=0; i<1048576UL; i++){  // 1MB
	sum = sum + Parallel.read(0x00);
}

// Fast reset of the Address to 0x000000
void reset_address(){
	Parallel.read(0x01);
}

// Example: addressing of a 65kB large Block
// Max 128 Blocks of 65kB each with 8MB Ramdisk, a pity
void set_block_address(uint8_t BlockNumber){
	Parallel.write(0x01, BlockNumber>>4);  // addr[23..20]
	Parallel.write(0x01, BlockNumber);  // addr[19..16]
}

// set the Address at the beginning of the Block N.6
reset_address();  // Address is 0x000000 now
set_block_address(0x06); // set 2 highest nibbles only - the Block number
// Write 65kB of data=0xAA into the block N.6
for(uint32_t i=0; i<65536UL; i++){  // 65kB
	Parallel.write(0x00, 0xAA);
}

// set the Address at the beginning of the Block N.6
reset_address();  // Address is 0x000000 now
set_block_address(0x06); // set 2 highest nibbles only - the Block number
// read and sum 65kB worth of data from the block N.6
uint32_t sum = 0;
for(uint32_t i=0; i<65536UL; i++){  // 65kB
	sum = sum + Parallel.read(0x00);
}

If you can change the EMB timing parameters on-the-fly on the DUE (most probably you can), you can speed up the setting of the Address significantly. The Address is written into the CPLD controller, so it need not comply with the psram timings.

A 15-20ns long /WR pulse works fine while writing the Address.

// Setting of the starting Address (8MB max)
void set_address(uint32_t Address){
	// below sequence could be shorter when setting up only some of the nibbles
	// MAKE NIBBLES OF Address SOMEHOW - ie. with union as above or with >>
	// SET WRITE PULSE TO 15-20ns WIDTH
	Parallel.write(0x01, nibb6);  // addr[23..20]
	Parallel.write(0x01, nibb5);
	Parallel.write(0x01, nibb4);
	Parallel.write(0x01, nibb3);
	Parallel.write(0x01, nibb2);
	Parallel.write(0x01, nibb1);  // addr[3..0]
	//Parallel.write(0x01, CONTROL);
	// SET WRITE PULSE BACK
}
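
The "make nibbles with >>" step mentioned in the comments above could look roughly like this. It is a sketch only: split_address is a made-up helper name, and it assumes the nibbles go out highest first, as in the fragments above. Shifts are used because they do not depend on how a compiler lays out bit-fields in a union:

```cpp
#include <cassert>
#include <cstdint>

// Split a ramdisk Address into six nibbles, highest first, i.e. in the
// order they are written with A0=1. nibbles[0] is addr[23..20],
// nibbles[5] is addr[3..0].
inline void split_address(uint32_t address, uint8_t nibbles[6]) {
    assert(address < 0x800000UL);  // 8MB = 23 address bits
    for (int i = 0; i < 6; i++) {
        const int shift = 20 - 4 * i;
        nibbles[i] = static_cast<uint8_t>((address >> shift) & 0x0F);
    }
}
```

On the DUE each nibbles[i] would then be sent with Parallel.write(0x01, nibbles[i]), and the loop could stop early when only the upper nibbles are needed.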

pito, very impressive work you've done!

Your hardware setup looks quite similar to a mini-board I've been reviewing. It's only 1"x2", has 32 MByte SDRAM, 8 Mbit Flash and a 600,000-gate FPGA, it's completely open source and costs $69. What I like is that the FPGA communicates with the SDRAM as if it's standard RAM. They explain how the code can be modified for dual-port, tri-port or quad-port memory access. The Flash card could be buffered at such a high level that even the slowest cards shouldn't cause any hiccups.

They have excellent documentation and training manuals published.
I've been "dreaming" about a mini-card like this with a SAM3X on it.

There's a current thread about one being made, I can't find it now though.

EDIT: Found it, Small-footprint Due - Arduino Due - Arduino Forum


Rob

Thanks Graynomad - yes, nice board ... all the pieces are coming together, but the one that still eludes me (for now) is "time".

Join the club, too many projects and too few years :slight_smile:


Rob

This would be great for very fast data loggers.
Cheers!

These are simplified schematics of a datalogger and a pattern/signal generator talented hackers may try out.. :wink:

1. Waveform/pattern generator
i. you write the required pattern into the ramdisk (EN=0) in a standard way
ii. you read the pattern out of the ramdisk with /RD used as the "clock" (EN=1)
iii. you may use an external clock generator, or you may generate the clock from the mcu (ie. via PRG_OSC pin)
iv. the address rolls over to zero after reaching 8MB, so the pattern will repeat itself
v. mind the 74273 (or another type) is a "register", not a "latch"
vi. you may generate 10 million patterns per second max.

2. Datalogger
i. you write the starting address (ADCEN=0) in a standard way
ii. by "clocking" the /WR signal you may write the ADC data (or "any" data) into the ramdisk (ADCEN=1)
iii. you have to organize the ADC conversion based on the ADC chip you use (CONV rising edge writes ADC to the latch)
iv. you may read the data in a standard way out of the ramdisk
v. you may sample 15 million samples per second max.

3. Below you may see how to connect two 8MB Ramdisk modules - 8bit and 16bit data bus versions.

PS: With 16bit data bus version you may enhance the datalogger/pattern/waveform examples to 16bit width easily..
PS1: The minimal number of data bits used is 4 (D0-D3), ie. you want to generate 4bit patterns only
PS2: Provided as-is, no warranties of any kind :slight_smile:
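
As a rough feel for the maximum rates quoted in the generator and datalogger lists above, the time for one full pass through the 8MB can be estimated like this. It is a back-of-the-envelope sketch, assuming one byte per sample on a single 8bit module; full_pass_ms and kPassBytes are made-up names:

```cpp
#include <cassert>
#include <cstdint>

const uint32_t kPassBytes = 8u * 1024u * 1024u;  // one 8MB module

// Milliseconds needed to clock through the whole ramdisk once,
// at one byte per sample (assumption: single 8bit module).
inline double full_pass_ms(double samples_per_second) {
    assert(samples_per_second > 0.0);
    return 1000.0 * static_cast<double>(kPassBytes) / samples_per_second;
}
```

At 15 million samples per second the datalogger would fill the module in about 559 ms, and at 10 million patterns per second the generator would repeat its pattern about every 839 ms.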

Andreas,
the wiring - read this topic first: Parallel library for Due External Memory Bus/Static Memory Controller - Arduino Due - Arduino Forum

I recommend connecting the ramdisk to the DUE's External Memory Bus and using the existing driver, ie.
https://github.com/delsauce/ArduinoDueParallel . Later on you may access the EMB registers directly to avoid the C++ overheads (in order to speed up the transfers).

See this link for wiring details (you have to find the proper header pins for the specific EMB function).
http://forum.arduino.cc/index.php?topic=152644.msg1146753#msg1146753

Ramdisk		DUE EMB (Function)
-------------------
D0-D7		D0-D7
/RD		NRD
/WR		NWE
/DATA		A0
/MS		GND
3.3V		3.3V
GND		GND

PS:
D0-D7 does not mean digital pins D0-D7 on DUE, but the data signals D0-D7 on the DUE's EMB bus.
A0 means Address line A0 on the DUE's EMB bus.
NRD, NWE means NRD and NWE on DUE's EMB bus.
See: 8MB Ramdisk (external RAM) for Arduino.. - #15 by pito - Other Hardware Development - Arduino Forum - the picture says "EMB signals D0-D7"

PS1:
Do not use the bitbanging example for arduino - that is just an example for people with 3.3V 8bitters.
You may use the DUE code snippets from my posts above. Mind that this code is not a sketch that will compile; it is an example of how to use the DUE's EMB library functions (see the topics above) with the ramdisk. You have to add all the necessary stuff around it, based on your actual requirements.

PS2:
Do it step by step. First learn how to use the DUE's EMB with the ramdisk. Write a small sketch where you write/read data into/from the ramdisk, do experiments with the timing settings.. (I can assist you in a separate topic or offline).

PS3: (!!!)
Pls mind the wire length is important while chasing the nanoseconds :slight_smile:
I've been running 2 modules in parallel on the same solderless breadboard with ~5cm long data lines and 10-15cm long NRD,NWE,A0,A1 (fubarino @120MHz) - see the first picture of this topic.
Use as short as possible wires when connecting a memory device.

PS4:
This is an example code which runs fine on DUE (UPDATED 22.5.2014):

// A DUE EMB Example for 8MB Ramdisk v.1.1
// Library used: https://github.com/delsauce/ArduinoDueParallel
// DUE at 84MHz (12ns clock)
// IDE 1.5.6r2
// Wiring: 11cm long jump wires and solderless breadboard
// Provided as-is, no warranties of any kind
// Pito 5/2014

/* Wiring:
==============================================
RDisk   DUE's header    DUE's EMB signal name
==============================================
D0-D7   PIN34-PIN41     D0-D7
/WR     PIN45           NWE
/RD     PWM4            NRD
/DATA   PWM9            A0 (A0=0 wr/rd Data, A0=1 wr Addresses)
/MS     GND             GND
3.3V    3.3V
GND     GND
==============================================

DUE's EMB signal name    MCU I/O line
======================================
D0                       PC2
D1                       PC3
D2                       PC4
D3                       PC5
D4                       PC6
D5                       PC7
D6                       PC8
D7                       PC9

NWE                      PC18
NRD                      PA29
A0                       PC21
======================================
*/

#include <Parallel.h>

typedef union {
	unsigned long value;
	struct {
		unsigned char nib1: 4;  // lowest nibble
		unsigned char nib2: 4;
		unsigned char nib3: 4;
		unsigned char nib4: 4;
		unsigned char nib5: 4;
		unsigned char nib6: 4;
		unsigned char nib7: 4;
		unsigned char nib8: 4;  // highest nibble
	};
} nibbles;

// Setting up the rd/wr starting Address (8388607 max)
void set_address(unsigned long address){
	nibbles temp;
	temp.value = address;
	Parallel.write(1, temp.nib6);  // addr[23..20]
	Parallel.write(1, temp.nib5);
	Parallel.write(1, temp.nib4);
	Parallel.write(1, temp.nib3);
	Parallel.write(1, temp.nib2);
	Parallel.write(1, temp.nib1);  // addr[3..0]
}

void setup() {
  
	// Configure parallel bus for 8bits, A0, NRD and NWE (no CS is wired; PARALLEL_CS_1 works around the PARALLEL_CS_NONE timing issue)
	Parallel.begin(PARALLEL_BUS_WIDTH_8, PARALLEL_CS_1, 1, 1, 1);

	// Configure bus timings.. EXPERIMENTAL
	Parallel.setAddressSetupTiming(1,1,1,1);

	// NWE, NCSWE, NRD, NCSRD  - we do not use NCSs
	Parallel.setPulseTiming(7,1,7,1);

	// CycleTiming = PulseTiming + AddressSetupTiming + 1
	Parallel.setCycleTiming(9,9);

	// make a dummy read
	Parallel.read(0);

	// set the Serial
	Serial.begin(115200);
}

void loop() {

	unsigned long i, elapsedw, elapsedr;

	Serial.println("START OF THE TEST");

	Serial.println("WRITING BYTES TO RAMDISK");
	// Write 1 million data = 234 from Address=4444444
	delay(500);
	elapsedw = millis();
	set_address(4444444L);
	for(i=0; i<1000000; i++){
		Parallel.write(0, 234);
	}
	elapsedw = millis() - elapsedw;

	Serial.println("READING BYTES FROM RAMDISK");
	// Read and sum 1 million of data from Address=4444444
	delay(500);
	elapsedr = millis();
	unsigned long sum = 0;
	set_address(4444444L);
	for(i=0; i<1000000; i++){
		sum = sum + Parallel.read(0);
	}
	elapsedr = millis() - elapsedr;

	Serial.print("SUM = ");
	Serial.println(sum);
	Serial.print("ELAPSED WRITE = ");
	Serial.print(elapsedw);
	Serial.println(" msec");
	Serial.print("ELAPSED READ = ");
	Serial.print(elapsedr);
	Serial.println(" msec");
	Serial.println("TEST STOP");

	while(1);
}

Output on the Serial monitor:

START OF THE TEST
WRITING BYTES TO RAMDISK
READING BYTES FROM RAMDISK
SUM = 234000000
ELAPSED WRITE = 680 msec
ELAPSED READ = 846 msec
TEST STOP

There is an undocumented feature in the Parallel driver: the PARALLEL_CS_NONE parameter sets the NRD/NWE timing to 12ns regardless of the timing settings. As a workaround use PARALLEL_CS_1, even though we do not use any CS.
The above sketch has been updated.

Pito, the updated sketch works perfectly! :smiley:

Thx so much for your patience and support! :slight_smile: :slight_smile: :slight_smile:

Andreas

Great!
I've been messing a little bit with the Parallel driver, as it seemed to me the driver is slow.
I did the following.
Comment out in Parallel.cpp:

//__attribute__((optimize("O0"))) void ParallelClass::write(uint32_t offset, uint8_t data)
//{
//	*((volatile uint8_t *)(_addr + (offset&0x00FFFFFF))) = data;
//}
 
//__attribute__((optimize("O0"))) uint8_t ParallelClass::read(uint32_t offset)
// {
//	return *((volatile uint8_t *)(_addr + (offset&0x00FFFFFF)));
// }

And replace in Parallel.h :

void write(uint32_t offset, uint8_t data) ;
uint8_t read(uint32_t offset);

with:

void write(uint32_t offset, uint8_t data)
{
	*((volatile uint8_t *)(_addr+offset)) = data;
}

uint8_t read(uint32_t offset)
{
	return *((volatile uint8_t *)(_addr+offset));
}

Then I get the following elapsed times for the above example sketch:

START OF THE TEST
WRITING BYTES TO RAMDISK
READING BYTES FROM RAMDISK
SUM = 234000000
ELAPSED WRITE = 321 msec
ELAPSED READ = 262 msec
TEST STOP

There is still an issue as the NWE timing does not react to timing parameter changes properly. The NRD timing works fine, it seems.

I've added a delay(500) before the "for loop" measurements (to eliminate the potential influence of print) and the result of the above example sketch is:

START OF THE TEST
WRITING BYTES TO RAMDISK
READING BYTES FROM RAMDISK
SUM = 234000000
ELAPSED WRITE = 226 msec
ELAPSED READ = 262 msec
TEST STOP

That means one write cycle (including the "for loop" overhead) takes 226ns (or 4.4MBytes/sec).
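
The conversion from the elapsed milliseconds to these figures is simple arithmetic. A small sketch, assuming 1 MByte = 1,000,000 bytes (which is what the 4.4 figure implies); ns_per_byte and mbytes_per_second are made-up helper names:

```cpp
#include <cassert>
#include <cstdint>

// Average time per transferred byte in nanoseconds.
inline double ns_per_byte(uint32_t bytes, uint32_t elapsed_ms) {
    assert(bytes > 0);
    return elapsed_ms * 1.0e6 / bytes;
}

// Throughput in MBytes/sec, with 1 MByte = 1,000,000 bytes (assumption).
inline double mbytes_per_second(uint32_t bytes, uint32_t elapsed_ms) {
    assert(elapsed_ms > 0);
    return (bytes / 1.0e6) / (elapsed_ms / 1000.0);
}
```

For the 1,000,000 bytes written in 226 msec this gives 226ns per byte and about 4.42 MBytes/sec; the 262 msec read works out to about 3.8 MBytes/sec.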

Moreover, I measured the actual NWE signal High/Low durations during the 1 million iteration "for loop" vs. the NWE timing parameter settings (the resolution of my logic analyzer is 5ns):

A, P, C     L[ns]   H[ns]   L+H[ns]   MBytes/sec
==================================================
1, 2, 4     25      205     230       4.35
1, 4, 6     45      180     225       4.44
1, 8,10     95      135     230       4.35
1,16,18     185     40      225       4.44
1,32,34     380     25      405       2.47

where
A - NWE setAddressSetupTiming
P - NWE setPulseTiming
C - NWE setCycleTiming

L[ns] - NWE Low - active low pulse (write pulse)
H[ns] - NWE High - the overhead of the for..loop

How to interpret these results? Any hints?
:~

PS: the "NWE Low" follows the setPulseTiming well (NWE Low = 12ns * P), but why is L+H constant, or "framed" or "limited"?
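
One possible reading of the table, offered only as a hypothesis and not as a confirmed explanation: if the software loop needs about 225ns per iteration anyway and the bus cycle can run in parallel with it, then L+H would stay near 225ns until the programmed cycle (C clocks) becomes longer than the loop itself. The sketch below just evaluates that guess. The 225ns constant is read off the L+H column, and predicted_period_ns is a made-up name:

```cpp
#include <algorithm>
#include <cassert>

const double kClockNs = 1000.0 / 84.0;   // one clock at 84MHz, about 11.9ns
const double kLoopOverheadNs = 225.0;    // assumed, read off the L+H column

// Hypothetical model: the period of one loop iteration is whichever is
// longer, the software loop or the programmed bus cycle.
inline double predicted_period_ns(int cycle_setting) {
    assert(cycle_setting > 0);
    return std::max(kLoopOverheadNs, cycle_setting * kClockNs);
}
```

Under this assumption the settings C=4, 6, 10 and 18 all give 225ns, and C=34 gives about 405ns, which is close to the measured L+H values. Whether this is what really happens inside the DUE would need to be checked against the datasheet.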

We found out that for experimenting with DUE's EMB bus timings it is better to fill the memory with random numbers. The sketch is below.

// DUE EMB Example for 8MB Ramdisk v.1.1
// 1M BYTES RANDOM WRITE/READ
// Library used: https://github.com/delsauce/ArduinoDueParallel
// DUE at 84MHz (12ns clock)
// Provided as-is, no warranties of any kind
// Pito 7/2014

/* Wiring:
 ==============================================
 RDisk   DUE's header    DUE's EMB signal name
 ==============================================
 D0-D7   PIN34-PIN41     D0-D7
 /WR     PIN45           NWE
 /RD     PWM4            NRD
 /DATA   PWM9            A0 (A0=0 wr/rd data, A0=1 wr addresses)
 /MS     GND             GND
 ==============================================
 */

#include <Parallel.h>

// Setting up the rd/wr starting Address (8388607 max)
void set_address(unsigned long address){
	Parallel.write(1, address >> 20);  // addr[23..20]
	Parallel.write(1, address >> 16);
	Parallel.write(1, address >> 12);
	Parallel.write(1, address >> 8);
	Parallel.write(1, address >> 4);
	Parallel.write(1, address);  // addr[3..0]
}

void setup() {

	// Configure parallel bus for 8bits, A0, NRD and NWE (no CS is wired; PARALLEL_CS_1 works around the PARALLEL_CS_NONE timing issue)
	Parallel.begin(PARALLEL_BUS_WIDTH_8, PARALLEL_CS_1, 1, 1, 1);

	// Configure bus timings.. EXPERIMENTAL
	// We do not use any real addressing
	Parallel.setAddressSetupTiming(1,1,1,1);

	// NWE, NCSWE, NRD, NCSRD  - we do not use NCSs
	Parallel.setPulseTiming(5,1,7,1);

	// better read datasheet for CycleTiming:
	Parallel.setCycleTiming(7,9);

	// make a dummy read
	Parallel.read(0);

	// set the Serial
	Serial.begin(115200);
}

void loop() {

	unsigned i;
	unsigned sumr, sumw, elapsedw, elapsedr;
	unsigned char data;

	randomSeed(micros());

	Serial.println("START OF THE TEST");
	Serial.println("WRITING 1Mil RANDOM BYTES TO RAMDISK");

	// Write 1 million random data
	sumw = 0;
	elapsedw = millis();
	set_address(0L);
	for(i=0; i<1000000; i++){
		data = random(256);
		Parallel.write(0, data);
		sumw = sumw + data;
	}
	elapsedw = millis() - elapsedw;

	Serial.println("READING AND SUM 1Mil RANDOM BYTES FROM RAMDISK");
	// Read and sum 1million of data from Address=0L
	sumr = 0;
	elapsedr = millis();
	set_address(0L);
	for(i=0; i<1000000; i++){
		sumr = sumr + Parallel.read(0);
	}
	elapsedr = millis() - elapsedr;

	Serial.print("SUM_W = ");
	Serial.println(sumw);
	Serial.print("SUM_R = ");
	Serial.println(sumr);
	Serial.print("ELAPSED WRITE = ");
	Serial.print(elapsedw);
	Serial.println(" msec");
	Serial.print("ELAPSED READ = ");
	Serial.print(elapsedr);
	Serial.println(" msec");
	Serial.println("TEST STOP");
	Serial.println(" ");

	//while(1);
}

I've got a similar one (but 16Mb serial). Might be of interest to someone. Products | Dimitech - look for DTX2-5055C

Sorry for necroposting, but I was hoping to buy one of these and they seem to be out of production... so what do you think about this + this?

Basically it is a retail SDR RAM TSOP-54 chip mounted on a DIP-54 adapter for breadboards. The speed is not a problem, because the datasheet states on page 20:

  1. For operating frequencies ≤ 45 MHz, tCKS = 3.0ns

so it seems we can go quite slow...
Refresh is not a problem either, because this module also features auto-refresh. They are also 3.3V ±0.3V!!!

Mounting can optionally be done by the adapter retailer, so even n00bs like me could afford prototyping. The cost seems reasonable too... If this solution works we can access infinite RAM on our DUE module, useful for example for big image manipulation (e.g. a 10MP camera sensor)! In this case I'm talking about 512Mb (64MB) in a 64Mx8 configuration.
To quench this beast's thirst for pins we could use shift registers or multiplexers for data and address.

I've seen ram modules even in commercial cheap-o stuff EEVblog #595 - World's Worst Shittiest Camcorder: Teardown - YouTube @19:37 in the upper part. and here the same module mounted over a ram bank http://old-pc-museum.narod.ru/olderfiles/1/64Mb_eliteMt_m12l64164a-8t_big.jpg.

The sdrams are quite cheap, but not easy to interface. If the DUE supports an sdram interface then it would be doable, otherwise, you have to write a bitbanging driver for it. We did it once with pic32mx and the result was moderate.

The 8MB ramdisk uses a PSRAM - that is a dram with a pure sram interface. Therefore you work with a static sram design while utilising a large dram capacity.

True static srams are expensive - 8MB of static sram would cost you a lot of money.

Also, when talking micro-controllers, you want to save the i/o pins for something else. An 8MB sram needs about 34 pins in an 8bit data width configuration.
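
The pin count above can be reproduced with a little arithmetic. This is a sketch under my own assumption that the "about 34" is made of the address lines, the 8 data lines and 3 control lines (for example chip select, output enable and write enable). address_lines and parallel_sram_pins are made-up names:

```cpp
#include <cassert>
#include <cstdint>

// Number of address lines needed for a byte-wide memory of the given
// size (size must be a power of two).
inline unsigned address_lines(uint32_t bytes) {
    assert(bytes > 0 && (bytes & (bytes - 1)) == 0);
    unsigned lines = 0;
    while ((1UL << lines) < bytes) {
        lines++;
    }
    return lines;
}

// Total signal pins of a plain parallel sram: address + data + control.
inline unsigned parallel_sram_pins(uint32_t bytes, unsigned data_bits,
                                   unsigned control_lines) {
    return address_lines(bytes) + data_bits + control_lines;
}
```

For 8MB this gives 23 address lines and 23 + 8 + 3 = 34 pins, compared with the 11 signals (D0-D7, /RD, /WR, /DATA) used by the ramdisk in the wiring tables above.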

So, you have to balance all possible requirements..