Parallel library for Due External Memory Bus/Static Memory Controller

This has been very helpful, thanks. I will not be this ambitious and will stick to directly pushing objects into external RAM instead.

I have one other related question. Can the Due be programmed directly from USB or does it need JTAG? I understand it cannot use ISP like the Mega, but I really dislike those large JTAG connectors and I don't really have the real estate either. Suggestions?

AFAIK there are 4 options for programming a SAM3X

USB bootloader (built in)
UART bootloader (built in and what the Due uses)
JTAG (10-way header but .05" spacing so pretty small)
SWD (4-way header, normal .1" spacing)

So you can load programs directly from USB but right now I can't find a description of exactly how you do that.


Rob

memcpy speed in MiB/s
int to int 76.59
int to ext 19.88
ext to ext 8.63
ext to int 13.40

@stimmer
From what I see in the datasheet, an external memory access takes 6 cycles (Figure 27-7), so 84 MHz / 6 gives a transfer rate of 14 MB/s, and yet you get nearly 20.

Can you explain how this happens? Maybe there are savings in setup and hold times for a block move or something.

EDIT: OK, I see that later diagrams show accesses down to 3 cycles with 0 setup and hold times. That would be 28 MB/s, so I guess anywhere between 14 and 28 MB/s is fair game depending on various factors.
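The arithmetic in this post can be written out as a small helper. This is only a sketch: the function name and the `bytes_per_access` parameter are made up for illustration, and it assumes every access takes the same whole number of 84 MHz clock cycles with no gaps between accesses.

```cpp
// Peak transfer rate in MB/s (10^6 bytes per second) when each bus access
// takes `cycles_per_access` clock cycles and moves `bytes_per_access` bytes.
// Hypothetical helper, not part of the Parallel library.
constexpr double peak_mbytes_per_s(double clock_mhz,
                                   unsigned cycles_per_access,
                                   unsigned bytes_per_access = 1) {
    return clock_mhz * bytes_per_access / cycles_per_access;
}
```

Calling `peak_mbytes_per_s(84.0, 6)` gives the 14 MB/s figure and `peak_mbytes_per_s(84.0, 3)` gives 28 MB/s. Note that the memcpy table above is in MiB/s, so the numbers are not in exactly the same unit.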


Rob

Hello world. :slight_smile:
I have a question about external SRAM. Can I connect two 8-bit SRAM ICs to the external memory bus, using the full 16-bit data bus and a 17-bit address bus for the two ICs? I have two CY7C1019DV33 (128K x 8-bit) http://www.cypress.com/?docID=31943.
I know that some pins are not connected on the board, but I can connect them directly from the SAM3X IC. But what about the NRD pin? Is it possible to do without it (connect OE to ground as @stimmer did)?

PS: Sorry if my English is bad.

interesting project & will experiment with it in the next few days

I have in mind a large fifo 32Meg'ish (fpga tied to a sdram)
using no addresses - just reads & writes (need the PWM lines for other aspects of my programs)
accounting can be taken care of by the Due
It would need a NWE/NRD/NBSx line

This is all to be rid of SD card latency - I have too much data throughput to deal with a 100-300 ms delay
hmmmmm...

Hi,

I am using this Parallel library to drive an LCD display. I am using one address line to toggle the RS line of the display.

Unfortunately I cannot get the first five address lines to work. The first address line that works as expected is A5. A0-A4 stay constantly high.

So my display works fine if I use line A5 (and 5 times faster than using ports), thanks for the work!

But I would like to use A0 (pin 9) instead of A5. Any idea how I can get A0 working? Any trick to enable A0/C.21/PWM9?

Did you look at the example provided in the library (S1D13700_LCD)? It is using one address bit to communicate with an external parallel LCD... I haven't tested that in a while, but it was working when I posted the code. If not, then perhaps something has changed in the Arduino config since then?

Yes, I used the LCD example as the foundation of my code. With that example code A0 wasn't working either. It seems something has changed...

About unusable pins: you can still use external memory if some address or data pins are not available. If an address pin is missing, you just can't use the whole memory space. With missing data pins it is similar.

So we'll have something like a 256k 14-bit RAM available, instead of a full 1M 16-bit device. But of course a CPU with a proper external bus would be nice.

If address pins are missing you will have holes in the space that would cause duplications on top of other data.

So for example if A4 was missing you could write 16 contiguous bytes ok, but the 17th byte will go into location 0, thus overwriting the first byte. This could be manageable but a right PITA.

Likewise with data, the top bits may not matter if you stick to values below the first missing bit, but if you are missing any low-order bits you will be in trouble. This is almost not possible to use unless you "adjust" every variable you save and "unadjust" every variable you retrieve.
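The address aliasing described above can be modelled in a couple of lines. This is a sketch under one assumption that the post does not state: the RAM's own pin for the missing address line is held low, so the RAM always sees that bit as 0. The function name is hypothetical.

```cpp
#include <cstdint>

// Address actually seen by the RAM when CPU address line `missing_bit`
// is not wired and the RAM's pin is tied low (assumption of this sketch).
constexpr std::uint32_t effective_address(std::uint32_t cpu_addr,
                                          unsigned missing_bit) {
    return cpu_addr & ~(std::uint32_t{1} << missing_bit);
}
```

With A4 missing, `effective_address(15, 4)` is still 15, but `effective_address(16, 4)` is 0, so the 17th byte lands on top of the first one. Addresses 32 to 47 are distinct again, which is where the "holes" come from.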


Rob

I just bought a DigiX (a Due clone, see https://www.kickstarter.com/projects/digistump/digix-the-ultimate-arduino-compatible-board-with-w and http://digistump.com/wiki/digix ). The DigiX has the four missing address and data pins (among others) on another row of connectors to the right of the board. Unfortunately, for compatibility with the Due the NRD signal is still wired to A5 so is unusable - although as I pointed out before you don't usually need it.

I wired my 128K x 8-bit RAM up to the DigiX, and can confirm that the library works correctly with no modification needed, and no address gaps :slight_smile: I haven't tried a 16-bit data bus yet as I haven't got a 16-bit RAM, but I can't think of any reason why it wouldn't work.

How can there not be a gap and duplication with A5 missing? Or is it me that's missing (something :))

If you write 0-255 sequentially into the first 256 locations, do you read 0-255 back?


Rob

I worded it confusingly - what I meant was NRD is unusable, not A5. With them both wired to the same Due pin it's either one or the other. But NRD is not needed for a RAM.

My test program writes all 128K sequentially with a pseudorandom sequence then reads it all back checking every value - no errors :slight_smile:
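The test program itself is not shown in the thread. A write-then-verify test of that kind might look roughly like the sketch below. The xorshift32 generator and the function names are my choice for illustration, not what was actually used, and `mem` stands for wherever the external RAM is mapped (here it can be any buffer).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One step of a xorshift32 generator; the state must start non-zero.
inline std::uint32_t xorshift32(std::uint32_t& state) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// Fill `len` bytes with a pseudorandom sequence, then regenerate the same
// sequence from the same seed and count the bytes that read back wrong.
inline std::size_t ram_test(volatile std::uint8_t* mem, std::size_t len,
                            std::uint32_t seed) {
    std::uint32_t state = seed;
    for (std::size_t i = 0; i < len; ++i) {
        mem[i] = static_cast<std::uint8_t>(xorshift32(state));
    }
    state = seed;  // restart the sequence for the read-back pass
    std::size_t errors = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (mem[i] != static_cast<std::uint8_t>(xorshift32(state))) {
            ++errors;
        }
    }
    return errors;
}
```

A pseudorandom pattern is more useful here than writing 0-255 repeatedly, because a missing address line makes two addresses share one cell and a repeating pattern can hide that. On a PC the function can be tried against a `std::vector<std::uint8_t>` of 128 * 1024 bytes.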

Yeah ok, that makes sense.


Rob

I don't think missing address pins cause any problem other than a smaller device. Take for instance a 65536-byte device with 16 address bits. If you leave one address pin/bit out, you'll have a device with 32768 usable bytes. If it is some kind of ROM and you connect it to a processor with all address pins, then you'll get gaps (or a great mess, if you don't take care). I am not going to draw any truth tables, so I'll leave it here.

You may connect the 8MB ramdisk to your Due via the External Memory Bus (static 8-bit memory). See the thread "8MB Ramdisk (external RAM) for Arduino.." in Other Hardware Development on the Arduino Forum.

RDisk	DUE (EMB signal names)
================
D0-D7	D0-D7
NWR	NWE
NRD	NRD
NDATA	Ax

With Ax = 1 you write the 24bit starting address of a block
With Ax = 0 you write/read the bytes sequentially from the address
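To make the two access modes concrete, here is a small software model of that behaviour. It is a sketch only: the class and method names are invented, and the order in which the three address bytes are written (most significant first) is an assumption, since the post does not say.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Software model of the ramdisk interface described above (hypothetical).
class RamDiskModel {
public:
    explicit RamDiskModel(std::size_t size) : mem_(size, 0) {}

    // Ax = 1: shift one byte into the 24-bit address register.
    // Three writes load a full address, MSB first (assumed order).
    void writeAddressByte(std::uint8_t b) {
        addr_ = ((addr_ << 8) | b) & 0xFFFFFFu;
    }

    // Ax = 0, write: store a byte at the current address, then advance.
    void writeData(std::uint8_t b) {
        mem_[addr_ % mem_.size()] = b;
        advance();
    }

    // Ax = 0, read: fetch the byte at the current address, then advance.
    std::uint8_t readData() {
        std::uint8_t b = mem_[addr_ % mem_.size()];
        advance();
        return b;
    }

private:
    void advance() { addr_ = (addr_ + 1) & 0xFFFFFFu; }

    std::vector<std::uint8_t> mem_;
    std::uint32_t addr_ = 0;
};
```

The point of the scheme is that only one address line (Ax) is needed on the Due side: after the start address is loaded, every further read or write is a plain bus access with Ax = 0.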

There is an undocumented feature in the driver with the PARALLEL_CS_NONE parameter:
it causes 12 ns timing for NRD/NWE regardless of any other timing settings (verified with a logic analyzer).

Also, I do not understand the elapsed time results for the test sketch, where I write/read 1 million bytes and get the same elapsed time for quite different EMB timings:

	// Configure parallel bus for 8bits, no CS, A0, and NRD and NWE
	Parallel.begin(PARALLEL_BUS_WIDTH_8, PARALLEL_CS_1, 1, 1, 1);
	// Configure bus timings.. EXPERIMENTAL
	Parallel.setAddressSetupTiming(1,1,1,1);
	// NWE, NCSWE, NRD, NCSRD  - we do not use NCSs
	Parallel.setPulseTiming(4,1,7,1);
	Parallel.setCycleTiming(6,9);
	
START OF THE TEST
WRITING BYTES TO RAMDISK
READING BYTES FROM RAMDISK
SUM = 234000000
ELAPSED WRITE = 680 msec
ELAPSED READ = 846 msec
TEST STOP

	// Configure parallel bus for 8bits, no CS, A0, and NRD and NWE
	Parallel.begin(PARALLEL_BUS_WIDTH_8, PARALLEL_CS_1, 1, 1, 1);
	// Configure bus timings.. EXPERIMENTAL
	Parallel.setAddressSetupTiming(1,1,1,1);
	// NWE, NCSWE, NRD, NCSRD  - we do not use NCSs
	Parallel.setPulseTiming(7,1,7,1);
	Parallel.setCycleTiming(9,9);

START OF THE TEST
WRITING BYTES TO RAMDISK
READING BYTES FROM RAMDISK
SUM = 234000000
ELAPSED WRITE = 680 msec
ELAPSED READ = 846 msec
TEST STOP

using the sketch from "8MB Ramdisk (external RAM) for Arduino.." (post #24 by pito, Other Hardware Development, Arduino Forum)
(you may run the sketch without the ramdisk attached).

PS: With a 12ns+48ns+12ns = 72ns write cycle I would expect a faster write than the 680ns per byte above. Does it mean the driver (including the "for loop") creates a 610ns overhead?
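The numbers in this PS follow directly from the test output. The helpers below just redo that arithmetic; the names are made up for illustration.

```cpp
// Average time per byte in nanoseconds, given total elapsed milliseconds
// and the number of bytes transferred.
constexpr double ns_per_byte(double elapsed_ms, double bytes) {
    return elapsed_ms * 1e6 / bytes;
}

// Time per byte not accounted for by the bus cycle itself.
constexpr double overhead_ns(double measured_ns, double bus_cycle_ns) {
    return measured_ns - bus_cycle_ns;
}
```

`ns_per_byte(680, 1e6)` is 680 ns and `ns_per_byte(846, 1e6)` is 846 ns for the read. With a 72 ns write cycle, `overhead_ns(680, 72)` is 608 ns, which is the roughly 610 ns mentioned above.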

Another issue I am facing is that the NWE timing does not react properly to the NWE pulse setting.

Parallel.setPulseTiming(7,1,7,1);

For example, doubling the setting (and adjusting the CycleTiming accordingly) does not change the total elapsed time for write. It seems the NRD setting works.

I measured the actual NWE signal high/low durations during the 1 million byte write "for loop" against the NWE timing parameter settings (the resolution of my LA is 5ns):

A, P, C		L		H		L+H		MBytes/sec
========================================================================
1, 2, 4		25		205		230		4.35
1, 4, 6		45		180		225		4.44
1, 8,10		95		135		230		4.35
1,16,18		185		40		225		4.44
1,32,34		380		25		405		2.47

where
A - NWE setAddressSetupTiming
P - NWE setPulseTiming
C - NWE setCycleTiming
L[ns] - NWE active low pulse (write pulse)
H[ns] - NWE high time between write pulses (overhead of the for..loop)

How should I decipher these results? Any hints?
Why is L+H constant for P = 2, 4, 8 or 16?
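One way to look at the table is to convert the settings into nanoseconds and compare them with the measurements. The sketch below assumes that the P and C settings are counts of 84 MHz clock cycles (about 11.9 ns each), which the thread does not confirm. The helper names are hypothetical.

```cpp
#include <cmath>

constexpr double kClockMHz = 84.0;  // Due master clock, as quoted earlier

// Duration of `cycles` clock cycles in nanoseconds.
constexpr double cycles_to_ns(unsigned cycles) {
    return cycles * 1000.0 / kClockMHz;
}

// Throughput in MBytes/sec for one byte every `period_ns` nanoseconds.
constexpr double mbytes_per_s(double period_ns) {
    return 1000.0 / period_ns;
}

// True if a computed time and a measured time agree within `tol_ns`.
inline bool agrees(double expected_ns, double measured_ns, double tol_ns) {
    return std::fabs(expected_ns - measured_ns) <= tol_ns;
}
```

Under that assumption, `cycles_to_ns(P)` comes out at about 24, 48, 95, 190 and 381 ns for P = 2, 4, 8, 16 and 32, which is close to the measured L column given the 5 ns LA resolution. So the pulse setting does appear to take effect. For the last row `cycles_to_ns(34)` is about 405 ns, the same as the measured L+H. A possible reading is that for the first four rows the programmed cycle is shorter than the loop period of about 225-230 ns, so the loop sets the pace and only the split between L and H changes. That is an interpretation, not something verified here.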

Hello,

I read your Parallel library this afternoon and think it is very close to what I need. I have an Arduino Due board and a PGA69-CM1K co-processor. I am using this chip to do machine learning work. I have connected the Due board to the CM1K and can perform my analysis. However, I am currently using I2C to send the data. This approach is very slow and it affects my performance. So, what I need to do now is to send the data in parallel (16 bits preferably, although 8 bits would do for now). Could you tell me if you think this is possible with your library? I noticed that some aspects appear to be very specific to the external memory device you are using. Any suggestions and advice on this would be greatly appreciated.

The spec of the chip I am using is here: http://www.cognimem.com/_docs/Technical-Manuals/TM_CM1K_PGA69_Hardware_Manual.pdf

Also, I tried compiling the library but it gave me an error that it could not find the "sam.h" file.

Thanks
Ricardo