Parallel library for Due External Memory Bus/Static Memory Controller

Thanks :slight_smile: If you add the addresses to the library I'll check it with a SRAM.

I managed to attach to the pin for port C 27 without soldering 8) You need an IC pigtail clip like this:

http://www.coolcomponents.co.uk/catalog/hook-with-pigtail-p-805.html

Straighten the ends a little with sharp-nose pliers and cut a little off the plastic sheath, and cover the outsides of the ends with an etch-resist pen to insulate from the adjacent pins. Use this sketch to help you get the right pin.

// Address A6 / Port C 27 test by stimmer

// Port C 27 is above the right edge of the SPI connector
// it is the 7th pin from the bottom right end of the SAM3X

// Output on port C 27 is high impedance for
// 1 second, followed by 5 short HIGH/LOW pulses

// Output on Port C 26 (to the left of C27) is HIGH whilst C27 is Hi-Z
// and Hi-Z whilst C27 is blinking

// Output on Port C 28 (to the right of C27) is HIGH for 0.5 sec then 
// LOW for 0.5 sec whilst C27 is Hi-Z, and Hi-Z whilst C27 is blinking

// Using this you can tell if you have the right pin, and if you are
// accidentally touching one of the adjacent pins.

void setup() {                
}

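void loop() {
  pinMode(3,OUTPUT);       // pin 3 = Port C 28
  pinMode(4,OUTPUT);       // pin 4 = Port C 26

  PIOC->PIO_PER = 1<<27;   // put C27 under PIO control
  PIOC->PIO_ODR = 1<<27;   // output disabled: C27 is Hi-Z
  digitalWrite(3, HIGH);
  digitalWrite(4, HIGH);
  delay(500);
  digitalWrite(3, LOW);    // C28 goes LOW for the second half-second
  delay(500);
  digitalWrite(4, LOW);
  pinMode(3,INPUT);        // neighbours go Hi-Z whilst C27 blinks
  pinMode(4,INPUT);
  PIOC->PIO_OER = 1<<27;   // output enabled on C27
  for(int i=0;i<5;i++){    // 5 short HIGH/LOW pulses
    PIOC->PIO_SODR = 1<<27;
    delay(100);
    PIOC->PIO_CODR = 1<<27;
    delay(100);
  }

}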

A9 is the north side of the RX led and can be attached with an unmodified pigtail clip.

stimmer:
Thanks :slight_smile: If you add the addresses to the library I'll check it with a SRAM.

New code posted up on github. Now you can have all 16 data pins and 23 address pins if you wish. Note that you can't have A5 and NRD since the pins are wired together. You'll need to operate without NRD if using A5 or cut the trace on the bottom of the board that ties the two pins together.

Brilliant work - after setting some conservative timings it worked first time :grin:

I am using a 128KByte SRAM (AS6C1008) so only used the first 17 address lines. I didn't use NRD, I just tied the OE pin low (a write cycle still works with OE low - OE is active low)

#include <Parallel.h>

void setup() {

  Parallel.begin(PARALLEL_BUS_WIDTH_8, PARALLEL_CS_0, 17, false, true);
  Parallel.setCycleTiming(16,16);
  Parallel.setPulseTiming(4,4,4,4);
  Parallel.setAddressSetupTiming(4,4,4,4);
  
  Serial.begin(115200);
}

void loop() {

  int t=micros();
  
  randomSeed(t);
  
  Serial.print("WRITE seed="); Serial.print(t);
  for(int a=0;a<131072;a++)  Parallel.write(a,random(256));
  Serial.println(" done");

  randomSeed(t);
  Serial.print("READ ");
  for(int a=0;a<131072;a++){
    int d=Parallel.read(a);
    int r=random(256);
    if(d!=r){
      Serial.println();
      Serial.print("Error at address ");
      Serial.print(a,HEX);
    }
  }  
  Serial.println("done"); 
}
WRITE - seed=679868002 done
READ done
WRITE - seed=680429926 done
READ done
WRITE - seed=680991852 done
READ done
WRITE - seed=681553778 done
READ done
WRITE - seed=682115712 done
READ done
WRITE - seed=682677638 done
READ done

I could probably get the timings down lower but given the spaghetti on my breadboard perhaps that's not such a good idea 8)

update: got the timings down to

  Parallel.setCycleTiming(5,5);
  Parallel.setPulseTiming(4,4,4,4);
  Parallel.setAddressSetupTiming(1,1,1,1);

Given that it's a 55ns part I can't go any lower.
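To see why five cycles is the floor, the cycle counts can be converted to nanoseconds. This is only a rough sketch: it assumes the memory controller runs from the Due's 84MHz clock and that the timing arguments are counted in clock cycles. The helper names are made up for illustration and are not part of the Parallel library.

```cpp
#include <cassert>
#include <cmath>

// Assumption: 84 MHz clock, timing values counted in clock cycles.
constexpr double kClockHz = 84e6;

// Duration of `cycles` clock cycles in nanoseconds.
constexpr double cyclesToNs(unsigned cycles) {
  return cycles * 1e9 / kClockHz;
}

// Smallest whole number of cycles that lasts at least `ns` nanoseconds.
inline unsigned minCyclesFor(double ns) {
  return static_cast<unsigned>(std::ceil(ns * kClockHz / 1e9));
}
```

Under those assumptions cyclesToNs(5) is about 59.5ns, just over the 55ns access time, while cyclesToNs(4) is about 47.6ns, which is shorter than the part is rated for.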

One last request: a getAddr() method :slight_smile:

Really nice stuff.

So for the sake of breaking out 2 pins this could have been an easy add on. Is that the case?


Rob

It would have been easier, yes, although I'd like to have had the other 2 data bus pins too (if I had to choose between the two I'd pick the data bus pins)

I'm not sure how worthwhile a memory expansion would be commercially, given that there's already quite a lot of RAM in the Due, and there's always the Raspberry Pi for applications needing huge amounts of memory. But it's an interesting enough project if you've already got an old SRAM chip :wink:

stimmer:
One last request: a getAddr() method :slight_smile:

This is so that you can access the memory mapped peripheral directly without incurring the overhead of read/write? If so, that seems like a reasonable request. I realize that the current code isn't as efficient as it could be because I opted for simplicity. Perhaps there is a better way I could have implemented it that would achieve both. I'll noodle on that.

In the meantime, you can grab the new code with getAddress() on github

It's more so I can use memset/memcpy/memmove and test if my circuit is reliable at full speed.

You can make the read and write methods faster by moving their code inside the class definition in the .h file. Functions defined there are implicitly inline, so the compiler can inline them at each call site and remove the call overhead.
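A rough sketch of that idea follows. The class name ExtBus and its constructor are made up for illustration; this is not the library's actual class, just the general shape of accessors defined inside the class body.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wrapper around a memory-mapped 8-bit device.
class ExtBus {
 public:
  explicit ExtBus(volatile uint8_t *base) : base_(base) {}

  // Defined inside the class body, so both accessors are implicitly
  // inline and the compiler is free to expand them at the call site.
  uint8_t read(uint32_t offset) const { return base_[offset]; }
  void write(uint32_t offset, uint8_t value) { base_[offset] = value; }

 private:
  volatile uint8_t *base_;  // volatile: every access must reach the bus
};
```

With the bodies visible in the header, a loop calling write() repeatedly can compile down to plain stores instead of a function call per byte.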

Hello, could somebody help me with a parallel data problem?
I'm building a robot with two quadrature encoders for feedback control with an Arduino Due. I need to read 8-bit parallel data values coming from two HCTL-2022 quadrature decoder/counter interface ICs (from one of the two at a time, of course). I've chosen the processor's pins PC12 to PC19 for my data bits D0 to D7. I can manage selecting the register I want to read in one of the two ICs, but could you give me the few lines of code needed in setup() to set up my 8-bit data bus, and in loop() to read the values with the fastest method and form a byte from the reading? I don't need to send values on the 8-bit data bus, I only need to read incoming values.
Thanks a lot for helping, because I don't understand the code given in the previous posts.

Random thoughts about CPU speed and external bus speed. It will be very difficult to get an external memory to work at the CPU speed of 84MHz. It may help if every memory read or write takes several clock cycles, but is it still a RISC CPU after that? Really fast CPUs use several tricks to match relatively slow external memory to a high CPU clock frequency, but as I understand it those are not possible with this chip. An 84MHz clock needs memory faster than 12ns (1/84MHz is about 11.9ns).

When I read the posts more carefully I noticed that there is a speed setting for external memories. Even slower. Perhaps the external memory is more useful with some data roms or IO.

About IO. Slower IO operations are usually OK, because there are fewer of those than memory operations. And relatively slow IO is often not a problem if I think about external devices.

By the way, if address and data had been multiplexed it would take fewer pins. 16 multiplexed address/data pins give 65536 16-bit IO ports. Enough?

To fablagrenouille:
I think quadrature encoders are not so fast that you need this kind of bus. Register/port reading and writing is easy, and the forums are full of instructions on how it is done.
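For the port read asked about above, forming the byte is a shift and a mask. A minimal sketch, assuming D0 to D7 really are wired to PC12 to PC19 in that order; the function name is made up, and the register access in the trailing comment is how it would typically be used on the Due rather than tested code.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: D0..D7 are wired to PC12..PC19, as in the question.
constexpr unsigned kFirstDataBit = 12;

// Extract the 8-bit bus value from a raw 32-bit Port C reading.
constexpr uint8_t busByteFromPortC(uint32_t portC) {
  return static_cast<uint8_t>((portC >> kFirstDataBit) & 0xFFu);
}

// On the Due the raw reading would come from the pin data status
// register, after the eight pins have been configured as inputs:
//   uint8_t value = busByteFromPortC(PIOC->PIO_PDSR);
```

Reading the whole port once and masking it is a single register access, which is about as fast as a plain port read gets.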

Here is the result of my memcpy test:

#include <Parallel.h>

void setup() {
  Parallel.begin(PARALLEL_BUS_WIDTH_8, PARALLEL_CS_0, 17, false, true);
  Parallel.setCycleTiming(4,4);
  Parallel.setPulseTiming(3,3,4,3);
  Parallel.setAddressSetupTiming(0,0,0,0);
  Serial.begin(115200);
}

uint8_t inb[8192],outb[8192];

void loop() {

  int r=micros();
  memcpy(inb,outb,8192);
  int s=micros();
  
  for(int a=0;a<8192;a++){inb[a]=random(256);outb[a]=random(256);}
  uint8_t *m=(uint8_t *)Parallel.getAddress();
  
  int t=micros();
  memcpy(m,inb,8192);
  int u=micros();
  memcpy(m+32768,m,8192);
  int v=micros();
  memcpy(outb,m+32768,8192);
  int w=micros();

  for(int a=0;a<8192;a++)if(inb[a]!=outb[a]){ Serial.println("Error ");break;}
  
  Serial.println("memcpy speed in MiB/s");
  Serial.print("int to int "); Serial.println(7812.5/(s-r)); // 7812.5==8192*1000000/1048576
  Serial.print("int to ext "); Serial.println(7812.5/(u-t));
  Serial.print("ext to ext "); Serial.println(7812.5/(v-u));
  Serial.print("ext to int "); Serial.println(7812.5/(w-v));
  Serial.println();
  
  delay(1000);
}
memcpy speed in MiB/s
int to int 76.59
int to ext 19.88
ext to ext 8.63
ext to int 13.40

Those were the fastest timings I could get from the 55ns SRAM chip without errors - the 19.88MiB/s write speed is slightly out of spec 8)

Thanks LMI for your answer.
Through the 8-bit bus I have to read four bytes from each HCTL-2022 IC and I don't want my program to lose too much time doing this.
:roll_eyes:
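Once the four bytes have been read, combining them into one count is cheap. A small sketch, assuming the bytes are read most significant first; the byte order and the select-line sequence have to be checked against the HCTL-2022 datasheet, and the function name is made up.

```cpp
#include <cassert>
#include <cstdint>

// Combine four bytes into one 32-bit count.
// Assumption: b3 is the most significant byte and b0 the least.
constexpr uint32_t assembleCount(uint8_t b3, uint8_t b2,
                                 uint8_t b1, uint8_t b0) {
  return (uint32_t(b3) << 24) | (uint32_t(b2) << 16) |
         (uint32_t(b1) << 8) | uint32_t(b0);
}
```

The four bus reads dominate the time here; the shifts and ORs are only a handful of instructions.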

Hello guys.
Long time reader first time poster.

I'm designing my own board for another purpose using the same ATSAM3X chip, and after reading this thread I am still a little confused as to how to set up external RAM as a continuation of the internal RAM - presumably this is possible.

What I mean by this is: assuming that all the control, data and address lines are connected correctly to an external chip, how does the MCU know to point the stack to this external memory once all of the internal 48Kb internal ram is used? Is it automatic after setting some SMC control bit in some register?

If this is not possible, then it requires some memory management in code which is quite annoying. Any help would be appreciated.

Why do you want the stack to be in external RAM?

once all of the internal 48Kb internal ram is used ?

This will never all be used by the stack unless you have a massively-recursive function (very unlikely), and if data is that large put it in external.

So far I don't understand what you want to do.


Rob


Fair question. The external memory appears to the MCU as a block of memory at an address depending on its Chip Select line. (These pins are labelled NCS0-NCS7 on the SAM3X). Typically you would connect CSx to the CS pin on your memory device. The lowest 24 bits of a memory address are decoded by the external memory device, the upper 8 bits are decoded in the memory controller. The locations of the memory sections are shown in Figure 8-1 of the SAM3X User Manual. For example, CS0 maps to 0x60000000.
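The address split described above can be written down as a small helper. Only the CS0 base of 0x60000000 comes from the text; placing the chip-select windows back to back, 16MiB apart, is an assumption that follows from the 24-bit/8-bit split and should be checked against Figure 8-1. The names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kCs0Base = 0x60000000u;   // CS0 base, from the text
constexpr uint32_t kWindowSize = 1u << 24;   // 24 address bits per chip select

// CPU address of `offset` within the window of chip select `cs`.
// Assumption: windows are consecutive, one per chip select.
constexpr uint32_t extAddress(unsigned cs, uint32_t offset) {
  return kCs0Base + cs * kWindowSize + (offset & (kWindowSize - 1));
}
```

So byte 0x10 of a device on CS1 would appear at extAddress(1, 0x10), which is 0x61000010 under that assumption.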

The second question is how to map the application code to memory regions. This can be done very simply by direct access and setup from the application code, but this method is less flexible. You can allocate dynamic variables at runtime just by setting their address to a hardcoded value corresponding to the external memory. If you want automatic placement of variables, then you need to use the linker.

By changing the linker file, you can specify the additional memory region, and what should go into it. At runtime, RAM variables will be zeroed or initialised with data copied from Flash as necessary. This is part of the Standard C initialisation procedure. With GCC toolchain you may need to edit a startup file to include the additional memory regions.

Note that you will need to edit the startup code to set up the memory controller before letting the code do its memory copying/zeroing. If you put the stack into external memory, you must make sure your startup code does not use it before it is set up! It is probably best to set a temporary stack in internal RAM for the startup code, and then change stack after the external memory is set up. It is simpler though to keep the stack in internal memory, and use the external memory for other stuff.

The third question is how to get the application to use the extended memory. This can be done in the linker file, using combination of "pragma section" and the linker file, or explicitly in the application, depending on taste.

If this all seems quite complicated, then you are probably right. It needs good knowledge of the MCU and the toolchain to set the right things at the right point. Once setup, it becomes fairly seamless, but get one tiny thing wrong and it falls over in a heap.

To keep it simple, I would probably use direct setup from the application, and use the external memory as a pool of dynamic buffers for bulk data.
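One way to picture the "pool of dynamic buffers" approach is a simple bump allocator over the external region. This is a sketch only: the class is hypothetical, nothing is ever freed individually, and it assumes the base address is itself suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator handing out buffers from one fixed region.
class BumpPool {
 public:
  BumpPool(uint8_t *base, size_t size) : base_(base), size_(size), used_(0) {}

  // Returns `bytes` of storage aligned to `align` (a power of two),
  // or nullptr when the pool cannot satisfy the request.
  void *alloc(size_t bytes, size_t align = 4) {
    size_t start = (used_ + align - 1) & ~(align - 1);
    if (start > size_ || bytes > size_ - start) return nullptr;
    used_ = start + bytes;
    return base_ + start;
  }

  void reset() { used_ = 0; }                 // release everything at once
  size_t remaining() const { return size_ - used_; }

 private:
  uint8_t *base_;
  size_t size_;
  size_t used_;
};
```

With the library from this thread, the pool would presumably be created as BumpPool pool((uint8_t *)Parallel.getAddress(), 131072) for the 128KByte SRAM used earlier, after the bus has been set up.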

This has been very helpful thanks. I will not be this ambitious and stick to directly pushing objects into external ram instead.

I have one other related question. Can the Due be programmed directly from USB or does it need JTAG? I understand it cannot use ISP like the Mega, but I really dislike those large JTAG connectors and I don't really have the board real estate either. Suggestions?

AFAIK there are 4 options for programming a SAM3X

USB bootloader (built in)
UART bootloader (built in and what the Due uses)
JTAG (10-way header but .05" spacing so pretty small)
SWD (4-way header, normal .1" spacing)

So you can load programs directly from USB but right now I can't find a description of exactly how you do that.


Rob

memcpy speed in MiB/s
int to int 76.59
int to ext 19.88
ext to ext 8.63
ext to int 13.40

@stimmer
From what I see in the data sheet it takes 6 cycles (Figure 27-7) to perform an external memory access; 84MHz / 6 gives a transfer rate of 14MB/s, and yet you get nearly 20.

Can you explain how this happens? Maybe there are savings in setup and hold times for a block move or something.

EDIT: OK, I see in later diagrams accesses down to 3 cycles with 0 setup and hold times. That would be 28MB/s, so I guess anywhere between 14 and 28MB/s is fair game depending on various factors.


Rob
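The arithmetic in that post generalises to any cycle count. A small sketch, assuming an 84MHz clock, one byte per access on the 8-bit bus, and no gaps between back-to-back accesses; the function names are made up.

```cpp
#include <cassert>

constexpr double kClockHz = 84e6;  // assumption: 84 MHz clock

// Upper bound on throughput when every access takes `cyclesPerAccess`
// clock cycles and moves `bytesPerAccess` bytes.
constexpr double bytesPerSecond(unsigned cyclesPerAccess,
                                unsigned bytesPerAccess = 1) {
  return kClockHz / cyclesPerAccess * bytesPerAccess;
}

// Convert bytes per second to MiB per second (1 MiB = 1048576 bytes).
constexpr double toMiBPerSecond(double bytesPerSec) {
  return bytesPerSec / 1048576.0;
}
```

Six cycles gives 14MB/s and three cycles gives 28MB/s, matching the figures above. The memcpy test used a cycle timing of 4, for which the bound is 21MB/s, or about 20.0MiB/s, which is close to the 19.88MiB/s measured for writes.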

Hello world. :slight_smile:
I have a question about external SRAM. Can I connect two 8-bit SRAM ICs to the external memory bus,
using the full 16-bit data bus and a 17-bit address bus for the two ICs? I have two CY7C1019DV33 (128K x 8-bit) http://www.cypress.com/?docID=31943.
I know that some pins are not connected on the board; I can connect to them directly on the SAM3X IC. But how about the NRD pin? Is it possible to do without it (connect OE to ground as @stimmer did)?

PS: Sorry if my English is bad.

interesting project & will experiment with it in the next few days

I have in mind a large fifo 32Meg'ish (fpga tied to a sdram)
using no addresses - just reads & writes (need the PWM lines for other aspects of my programs)
accounting can be taken care of by the Due
It would need a NWE/NRD/NBSx line

This is all to be rid of SDcard latency - i have too much data throughput to deal with 100-300ms delay
hmmmmm...

Hi,

I am using this Parallel library to drive an LCD display. I am using one address line to toggle the RS line of the display.

Unfortunately I cannot get the first five address lines to work. The first address line that works as expected is A5. A0-A4 stay constantly high.

So my display works fine if I use line A5 (and 5 times faster than using ports), thanks for the work!

But I would like to use A0 (pin 9) instead of A5. Any idea how I can get A0 working? Is there any trick to enable A0/C.21/PWM9?
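For anyone copying this setup, driving RS from an address line means the command and data registers are simply two different addresses. A sketch under stated assumptions: an 8-bit bus, so that address bit n appears on line An, and a display that takes commands with RS low and data with RS high. The helper name is made up.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kRsLine = 5;  // A5, the line reported as working above

// Address offset that puts RS low (commands) or high (data).
// Assumption: 8-bit bus, so address bit n drives address line An.
constexpr uint32_t lcdOffset(bool rs, unsigned rsLine = kRsLine) {
  return rs ? (1u << rsLine) : 0u;
}
```

A data byte would then go out with something like Parallel.write(lcdOffset(true), value) and a command with Parallel.write(lcdOffset(false), value). Moving RS to A0 would only change kRsLine to 0, once that line can be made to toggle.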