Parallel library for Due External Memory Bus/Static Memory Controller

This library enables the External Memory Bus/Static Memory Controller on the Arduino Due board. It's more of an external memory interface than a true parallel port.

The Due board brings out the data bus on the extended digital headers (D0-D7: PIN34-PIN41) along with the control signals NCS1 and NWR. Some of the address signals are connected to the PWM pins (A0-A5), but a full address bus is unavailable. There is also a conflict between the SS1 pin for SPI, A5, and the NRD signal used for the parallel bus. In short the hardware wasn't designed for use with external parallel memories.

The library does allow connection to some of the lower resolution LCD controllers that used index addressing and can speed up read/write times considerably in some situations.
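To illustrate what index addressing means on a memory-mapped bus, here is a minimal sketch. It assumes a hypothetical wiring where the controller's register-select pin sits on address line A0, so offset 0 is the index register and offset 1 is the data register. The function name and offsets are illustrative and are not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wiring: the LCD controller's RS (register select) pin is on
// address line A0, so offset 0 reaches the index register and offset 1 the
// data register. Adjust the offsets for the real wiring.
constexpr uint32_t kIndexOffset = 0;
constexpr uint32_t kDataOffset = 1;

// Write one internal controller register using two bus cycles.
inline void lcdWriteRegister(volatile uint8_t* bus, uint8_t index, uint8_t value) {
  assert(bus != nullptr);
  bus[kIndexOffset] = index;  // first cycle: select the internal register
  bus[kDataOffset] = value;   // second cycle: write its new value
}
```

On the Due, `bus` would be the base address of the chip select in use. Only one address line is needed, which is why the incomplete address bus is not a problem for this kind of device.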

Code is hosted on GitHub: https://github.com/delsauce/ArduinoDueParallel

In short the hardware wasn't designed for use with external parallel memories.

Something I spotted early in the piece and will never understand.


Rob

It would have been a nice surprise to see an address/data bus pinned out, but this was the first foray for the Arduino folks into a much more complicated processor. The overall goal of simplicity is critical and I guess this feature wasn't above the bar for the first ARM board. It's certainly a balancing act to get everything in there.

On top of that, this part isn't nearly as configurable as some of Atmel's homegrown 32-bit parts, the UC3 family. In those parts each function can go to sometimes as many as 6 different pins. In the SAM3X, many functions only have a single pin option, maybe two if you're lucky.

Anyhoo, there is still some limited use of the EMB/SMC in the current DUE...

Could you add the other address lines anyway, even though some are missing? And do you know which is connected where and which are missing? I think A9 was wired to an LED so is inconvenient but I don't know about the others. You could possibly alter the read/write methods to shuffle the address to avoid the missing bus lines.
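The address shuffling suggested above could look something like the sketch below. It spreads the bits of a contiguous logical address over whichever address lines are usable (a software "bit deposit"). The function name and the example mask are assumptions for illustration, not something the library provides.

```cpp
#include <cassert>
#include <cstdint>

// Spread the bits of a contiguous logical address over the usable address
// lines. Bit n of usableMask is 1 if address line An is wired and free.
inline uint32_t scatterAddress(uint32_t logical, uint32_t usableMask) {
  uint32_t physical = 0;
  while (usableMask != 0) {
    uint32_t lowest = usableMask & (~usableMask + 1u);  // lowest usable line
    if (logical & 1u) {
      physical |= lowest;  // route the next logical bit to that line
    }
    logical >>= 1;
    usableMask &= usableMask - 1u;  // mark that line as consumed
  }
  assert(logical == 0);  // the address must fit in the usable lines
  return physical;
}
```

For example, with A0-A10 in use but A9 skipped, the mask is `0x5FF`, and logical address `0x200` lands on A10 (`0x400`). The mapping costs a few cycles per access, so it only suits the read/write methods, not direct pointer access.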

Also it's not completely impossible for a very determined hacker to solder to an unconnected pin if you take care to mask off the other pins close by (and know how to get yourself out of trouble when it goes wrong :) )

I don't think the NRD clash is such a big problem, the timing can be deduced from NWR and CS.

The upcoming version of the VGA library uses DMA to the SMC data bus to generate the colour modes, with the signal timings controlling the pixel rate.

I haven't looked for a while but I thought the only show stopper was A5 or maybe A6, all the other pins are broken out IIRC. So I think a determined hacker could solder onto that pin.

I don't remember A9 being on an LED, but that may be so, in which case ditch the LED.


Rob

Here are most of the SMC pins and how they are connected. I left out the NAND flash stuff and didn't double check for accuracy...

Address Bus:

| Function | Chip Pin | Arduino Pin |
| - | - | - |
| A0 | PC21 | PWM9 |
| A1 | PC22 | PWM8 |
| A2 | PC23 | PWM7 |
| A3 | PC24 | PWM6 |
| A4 | PC25 | PWM5 |
| A5 | PC26 | PWM4 |
| A6 | PC27 | N/C |
| A7 | PC28 | PWM3 |
| A8 | PC29 | PWM10 |
| A9 | PC30 | RX LED |
| A10 | PD0 | PIN 25 |
| A11 | PD1 | PIN 26 |
| A12 | PD2 | PIN 27 |
| A13 | PD3 | PIN 28 |
| A14 | PD4 | TXD0 |
| A15 | PD5 | RXD0 |
| A16 | PD6 | PIN 29 |
| A17 | PD7 | PWM11 |
| A18 | PA25 | MISO |
| A19 | PA26 | MOSI |
| A20 | PA27/PA18 | SPCK/SCL1 |
| A21 | PD8 | PWM12 |
| A22 | PD9 | PIN 30 |

Here is the data bus:

| Function | Chip Pin | Arduino Pin |
| - | - | - |
| D0 | PC2 | PIN 34 |
| D1 | PC3 | PIN 35 |
| D2 | PC4 | PIN 36 |
| D3 | PC5 | PIN 37 |
| D4 | PC6 | PIN 38 |
| D5 | PC7 | PIN 39 |
| D6 | PC8 | PIN 40 |
| D7 | PC9 | PIN 41 |
| D8 | PC10 | N/C |
| D9 | PC11 | N/C |
| D10 | PC12 | PIN 51 |
| D11 | PC13 | PIN 50 |
| D12 | PC14 | PIN 49 |
| D13 | PC15 | PIN 48 |
| D14 | PC16 | PIN 47 |
| D15 | PC17 | PIN 46 |

And the control signals:

| Function | Chip Pin | Arduino Pin |
| - | - | - |
| NRD | PA29 (also tied to PC26 on PCB, which is A5) | SS1/PWM4 |
| NWE | PC18 | PIN 45 |
| NCS0 | PA6 | AD4 |
| NCS1 | PA7 | PIN 31 |
| NCS2 | PB24 | N/C |
| NCS3 | PB27 | PWM13 |

There isn't too much you can do with the A5/NRD problem in software since the two pins are wired together on the PCB (I am not sure why those pins are wired together...) If you were careful, you could cut the trace on the bottom side of the board...

I can pretty easily add all the pins to the library and let the user sort out any conflicts.

Thanks :slight_smile: If you add the addresses to the library I’ll check it with a SRAM.

I managed to attach to the pin for port C 27 without soldering 8) You need an IC pigtail clip like this:

http://www.coolcomponents.co.uk/catalog/hook-with-pigtail-p-805.html

Straighten the ends a little with sharp-nose pliers and cut a little off the plastic sheath, and cover the outsides of the ends with an etch-resist pen to insulate from the adjacent pins. Use this sketch to help you get the right pin.

// Address A6 / Port C 27 test by stimmer

// Port C 27 is above the right edge of the SPI connector
// it is the 7th pin from the bottom right end of the SAM3X

// Output on port C 27 is high impedance for
// 1 second, followed by 5 short HIGH/LOW pulses

// Output on Port C 26 (to the left of C27) is HIGH whilst C27 is Hi-Z
// and Hi-Z whilst C27 is blinking

// Output on Port C 28 (to the right of C27) is HIGH for 0.5 sec then 
// LOW for 0.5 sec whilst C27 is Hi-Z, and Hi-Z whilst C27 is blinking

// Using this you can tell if you have the right pin, and if you are
// accidentally touching one of the adjacent pins.

void setup() {                
}

void loop() {
  pinMode(3,OUTPUT);     
  pinMode(4,OUTPUT);

  PIOC->PIO_PER = 1<<27;  
  PIOC->PIO_ODR = 1<<27;
  digitalWrite(3, HIGH);  
  digitalWrite(4, HIGH);   
  delay(500);             
  digitalWrite(3, LOW);  
  delay(500);             
  digitalWrite(4, LOW);   
  pinMode(3,INPUT);     
  pinMode(4,INPUT);
  PIOC->PIO_OER = 1<<27;
  for(int i=0;i<5;i++){
    PIOC->PIO_SODR = 1<<27;
    delay(100);
    PIOC->PIO_CODR = 1<<27;
    delay(100);
  }

}

A9 is on the north side of the RX LED and can be reached with an unmodified pigtail clip.

stimmer: Thanks :) If you add the addresses to the library I'll check it with a SRAM.

New code posted up on github. Now you can have all 16 data pins and 23 address pins if you wish. Note that you can't have A5 and NRD since the pins are wired together. You'll need to operate without NRD if using A5 or cut the trace on the bottom of the board that ties the two pins together.

https://github.com/delsauce/ArduinoDueParallel.git

Brilliant work - after setting some conservative timings it worked first time :grin:

I am using a 128KByte SRAM (AS6C1008) so only used the first 17 address lines. I didn’t use NRD, I just tied the OE pin low (a write cycle still works with OE low - OE is active low)

#include <Parallel.h>

void setup() {

  Parallel.begin(PARALLEL_BUS_WIDTH_8, PARALLEL_CS_0, 17, false, true);
  Parallel.setCycleTiming(16,16);
  Parallel.setPulseTiming(4,4,4,4);
  Parallel.setAddressSetupTiming(4,4,4,4);
  
  Serial.begin(115200);
}

void loop() {

  int t=micros();
  
  randomSeed(t);
  
  Serial.print("WRITE seed="); Serial.print(t);
  for(int a=0;a<131072;a++)  Parallel.write(a,random(256));
  Serial.println(" done");

  randomSeed(t);
  Serial.print("READ ");
  for(int a=0;a<131072;a++){
    int d=Parallel.read(a);
    int r=random(256);
    if(d!=r){
      Serial.println();
      Serial.print("Error at address ");
      Serial.print(a,HEX);
    }
  }  
  Serial.println("done"); 
}
WRITE - seed=679868002 done
READ done
WRITE - seed=680429926 done
READ done
WRITE - seed=680991852 done
READ done
WRITE - seed=681553778 done
READ done
WRITE - seed=682115712 done
READ done
WRITE - seed=682677638 done
READ done

I could probably get the timings down lower but given the spaghetti on my breadboard perhaps that’s not such a good idea 8)

update: got the timings down to

  Parallel.setCycleTiming(5,5);
  Parallel.setPulseTiming(4,4,4,4);
  Parallel.setAddressSetupTiming(1,1,1,1);

Given that it’s a 55ns part I can’t go any lower.
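The reasoning behind that limit can be sketched as follows, assuming the timing values are counts of the Due's 84 MHz master clock (an assumption about how the library passes them to the SMC, not something confirmed in this thread). The helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: SMC timings are counted in master clock cycles, 84 MHz on the Due.
constexpr uint32_t kClockMHz = 84;

// Duration of a number of clock cycles in nanoseconds.
inline double cyclesToNs(uint32_t cycles, uint32_t clockMHz = kClockMHz) {
  assert(clockMHz > 0);
  return cycles * 1000.0 / clockMHz;
}

// Smallest cycle count that is at least as long as the given access time.
inline uint32_t minCyclesFor(uint32_t accessNs, uint32_t clockMHz = kClockMHz) {
  assert(clockMHz > 0);
  return (accessNs * clockMHz + 999u) / 1000u;  // integer division, rounded up
}
```

Under that assumption a 55 ns part needs `minCyclesFor(55)`, which is 5 cycles (about 59.5 ns). Four cycles would be about 47.6 ns, which is shorter than the rated access time.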

One last request: a getAddr() method :slight_smile:

Really nice stuff.

So for the sake of breaking out 2 pins this could have been an easy add on. Is that the case?


Rob

It would have been easier, yes, although I'd like to have had the other 2 data bus pins too (if I had to choose between the two I'd pick the data bus pins)

I'm not sure how worthwhile a memory expansion would be commercially, given that there's already quite a lot of RAM in the Due, and there's always the Raspberry Pi for applications needing huge amounts of memory. But it's an interesting enough project if you've already got an old SRAM chip ;)

stimmer: One last request: a getAddr() method :)

This is so that you can access the memory mapped peripheral directly without incurring the overhead of read/write? If so, that seems like a reasonable request. I realize that the current code isn't as efficient as it could be because I opted for simplicity. Perhaps there is a better way I could have implemented it that would achieve both. I'll noodle on that.

In the meantime, you can grab the new code with getAddress() on github

It's more so I can use memset/memcpy/memmove and test if my circuit is reliable at full speed.

You can make the read and write methods faster by moving the code inside the class definition in the .h file, then the compiler will automatically inline them, removing the overhead.
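As a minimal sketch of that idea, the class below defines its accessors inside the class body, which makes them implicitly inline so the compiler is free to expand them at the call site. `MappedBus` is a hypothetical stand-in, not the actual Parallel class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the library class; not the real Parallel API.
class MappedBus {
 public:
  explicit MappedBus(volatile uint8_t* base) : base_(base) {
    assert(base != nullptr);
  }

  // Member functions defined inside the class body are implicitly inline,
  // so each call can compile down to a single load or store.
  uint8_t read(uint32_t offset) const { return base_[offset]; }
  void write(uint32_t offset, uint8_t value) const { base_[offset] = value; }

 private:
  volatile uint8_t* base_;  // start of the memory-mapped region
};
```

Whether the calls are actually inlined still depends on the optimisation level, so it is worth checking the generated code.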

Hello, Could somebody help me with a parallel data problem ?
I'm building a robot with two quadrature encoders for feedback control with an Arduino Due. I need to read 8-bit parallel data values coming from the two HCTL-2022 quadrature decoder/counter interface ICs (from one of the two at a time, of course). I've chosen the Due processor's pins PC12 to PC19 for my data bits D0 to D7. I can manage selecting the register I want to read in one of the two ICs, but could you give me the few lines of code needed to set up my 8-bit data bus in setup() and to read the values in loop() with the fastest method and form a byte from the reading? I don't need to send values on the 8-bit data bus; I only need to read incoming values.
Thanks a lot for helping because I don’t understand the code given in the previous posts.

Random thoughts about CPU speed and external bus speed. It will be very difficult to get an external memory to work at the Due's 84 MHz CPU speed. It may help if every memory read or write takes several clock cycles, but is it still a RISC CPU after that? Really fast CPUs use several tricks to match relatively slow external memory to a high CPU clock frequency, but as I understand it those are not possible with this chip. An 84 MHz clock needs memory faster than about 12 ns (1/84 MHz).

When I read the posts more carefully I noticed that there is a speed setting for external memories. Even slower. Perhaps the external memory is more useful with some data roms or IO.

About IO. Slower IO operations are usually OK, because there are fewer of those than memory operations. And relatively slow IO is often not a problem when I think about external devices.

By the way, if address and data had been multiplexed it would take fewer pins. 16 multiplexed address/data pins give 65536 16-bit IO ports. Enough?

To fablagrenouille: I don't think quadrature encoders are so fast that you need this kind of bus. Register/port reading and writing is easy, and the forums are full of instructions on how it is done.
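For the PC12 to PC19 wiring described in the question, forming the byte is a shift and a mask on the 32-bit port value. The sketch below is plain C++ so it can be checked off the board; the function name is made up, and on the Due the port value would come from `PIOC->PIO_PDSR` with the pins configured as inputs.

```cpp
#include <cassert>
#include <cstdint>

// Extract the 8-bit value whose least significant bit sits at bit
// `firstBit` of a 32-bit port reading.
inline uint8_t extractByte(uint32_t portValue, unsigned firstBit) {
  assert(firstBit <= 24);  // all eight bits must lie inside the 32-bit word
  return static_cast<uint8_t>((portValue >> firstBit) & 0xFFu);
}

// On the Due (not compiled here), with D0 on PC12:
//   uint8_t value = extractByte(PIOC->PIO_PDSR, 12);
```

A single register read captures all eight pins at the same instant, which is both faster and safer than eight digitalRead() calls.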

Here is the result of my memcpy test:

#include <Parallel.h>

void setup() {
  Parallel.begin(PARALLEL_BUS_WIDTH_8, PARALLEL_CS_0, 17, false, true);
  Parallel.setCycleTiming(4,4);
  Parallel.setPulseTiming(3,3,4,3);
  Parallel.setAddressSetupTiming(0,0,0,0);
  Serial.begin(115200);
}

uint8_t inb[8192],outb[8192];

void loop() {

  int r=micros();
  memcpy(inb,outb,8192);
  int s=micros();
  
  for(int a=0;a<8192;a++){inb[a]=random(256);outb[a]=random(256);}
  uint8_t *m=(uint8_t *)Parallel.getAddress();
  
  int t=micros();
  memcpy(m,inb,8192);
  int u=micros();
  memcpy(m+32768,m,8192);
  int v=micros();
  memcpy(outb,m+32768,8192);
  int w=micros();

  for(int a=0;a<8192;a++)if(inb[a]!=outb[a]){ Serial.println("Error ");break;}
  
  Serial.println("memcpy speed in MiB/s");
  Serial.print("int to int "); Serial.println(7812.5/(s-r)); // 7812.5==8192*1000000/1048576
  Serial.print("int to ext "); Serial.println(7812.5/(u-t));
  Serial.print("ext to ext "); Serial.println(7812.5/(v-u));
  Serial.print("ext to int "); Serial.println(7812.5/(w-v));
  Serial.println();
  
  delay(1000);
}
memcpy speed in MiB/s
int to int 76.59
int to ext 19.88
ext to ext 8.63
ext to int 13.40

Those were the fastest timings I could get from the 55ns SRAM chip without errors - the 19.88MiB/s write speed is slightly out of spec 8)
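The "slightly out of spec" remark can be checked with a little arithmetic: converting the measured throughput on the 8-bit bus into time per byte gives the effective access time. This is a small sketch with a made-up helper name.

```cpp
#include <cassert>

// Convert a throughput in MiB/s on an 8-bit bus into nanoseconds per byte.
inline double nsPerByte(double mibPerSecond) {
  assert(mibPerSecond > 0.0);
  return 1e9 / (mibPerSecond * 1048576.0);  // 1 MiB = 1048576 bytes
}
```

For the figures above, 19.88 MiB/s works out to roughly 48 ns per byte, a little quicker than the chip's 55 ns rating, while the 13.40 MiB/s read speed is roughly 71 ns per byte.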

Thanks LMI for your answer. Through the 8-bit bus I have to read four bytes from each HCTL-2022 IC and I don't want my program to lose too much time on this. :roll_eyes:
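Once the four bytes have been read, combining them into one count is cheap. The sketch below assumes the bytes were read most significant first; the order actually delivered depends on how the decoder's byte-select lines are driven, so treat the ordering and the function name as assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Combine four bytes, most significant first, into one 32-bit count.
inline uint32_t combineBytes(const uint8_t* bytes) {
  assert(bytes != nullptr);
  return (static_cast<uint32_t>(bytes[0]) << 24) |
         (static_cast<uint32_t>(bytes[1]) << 16) |
         (static_cast<uint32_t>(bytes[2]) << 8) |
         static_cast<uint32_t>(bytes[3]);
}
```

The shifts and ORs take only a handful of cycles, so nearly all of the time goes into the four bus reads themselves.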

Hello guys. Long time reader first time poster.

I'm designing my own board for another purpose using the same ATSAM3X chip, and after reading this thread I am still a little confused as to how to set up external RAM as a continuation of the internal RAM - presumably this is possible.

What I mean by this is: assuming that all the control, data and address lines are connected correctly to an external chip, how does the MCU know to point the stack to this external memory once all of the 48 KB of internal RAM is used? Is it automatic after setting some SMC control bit in some register?

If this is not possible, then it requires some memory management in code which is quite annoying. Any help would be appreciated.

Why do you want the stack to be in external RAM?

once all of the 48 KB of internal RAM is used?

This will never all be used by the stack unless you have a massively recursive function (very unlikely), and if your data is that large, put it in external RAM.

So far I don't understand what you want to do.


Rob

DanShephertan: Hello guys. Long time reader first time poster.

I'm designing my own board for another purpose using the same ATSAM3X chip, and after reading this thread I am still a little confused as to how to set up external RAM as a continuation of the internal RAM - presumably this is possible.

What I mean by this is: assuming that all the control, data and address lines are connected correctly to an external chip, how does the MCU know to point the stack to this external memory once all of the 48 KB of internal RAM is used? Is it automatic after setting some SMC control bit in some register?

If this is not possible, then it requires some memory management in code which is quite annoying. Any help would be appreciated.

Fair question. The external memory appears to the MCU as a block of memory at an address depending on its Chip Select line. (These pins are labelled NCS0-NCS7 on the SAM3X). Typically you would connect CSx to the CS pin on your memory device. The lowest 24 bits of a memory address are decoded by the external memory device, the upper 8 bits are decoded in the memory controller. The locations of the memory sections are shown in Figure 8-1 of the SAM3X User Manual. For example, CS0 maps to 0x60000000.
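The address decoding described above can be sketched as below. It assumes the chip-select regions follow one another in 16 MB (24-bit) steps starting from the CS0 base of 0x60000000; check the memory map figure before relying on this for anything other than CS0. The function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kCs0Base = 0x60000000u;  // CS0 base, as stated above
constexpr unsigned kRegionBits = 24;        // low 24 bits go to the device

// Absolute address of `offset` inside the region of chip select `chipSelect`.
// Assumption: regions are consecutive 16 MB blocks after CS0.
inline uint32_t externalAddress(unsigned chipSelect, uint32_t offset) {
  assert(chipSelect < 8);
  assert(offset < (1u << kRegionBits));
  return kCs0Base + (static_cast<uint32_t>(chipSelect) << kRegionBits) + offset;
}
```

On the target, the result would be cast to a `volatile uint8_t*` (or wider, depending on bus width) and then used like any other pointer.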

The second question is how to map the application code to memory regions. This can be done very simply by direct access and setup from the application code, but this method is less flexible. You can allocate dynamic variables at runtime just by setting their address to a hardcoded value corresponding to the external memory. If you want automatic placement of variables, then you would need to use the linker.

By changing the linker file, you can specify the additional memory region, and what should go into it. At runtime, RAM variables will be zeroed or initialised with data copied from Flash as necessary. This is part of the standard C initialisation procedure. With the GCC toolchain you may need to edit a startup file to include the additional memory regions.

Note that you will need to edit the startup code to set up the memory controller before letting the code do its memory copying/zeroing. If you put the stack into external memory, you must make sure your startup code does not use it before it is set up! It is probably best to set a temporary stack in internal RAM for the startup code, and then change the stack after the external memory is set up. It is simpler though to keep the stack in internal memory, and use the external memory for other stuff.

The third question is how to get the application to use the extended memory. This can be done in the linker file, using a combination of "pragma section" and the linker file, or explicitly in the application, depending on taste.

If this all seems quite complicated, then you are probably right. It needs good knowledge of the MCU and the toolchain to set the right things at the right point. Once set up, it becomes fairly seamless, but get one tiny thing wrong and it falls over in a heap.

To keep it simple, I would probably use direct setup from the application, and use the external memory as a pool of dynamic buffers for bulk data.
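A pool of buffers like that can be as simple as a bump allocator over the external region. This is a minimal sketch with a hypothetical `BufferPool` class; it hands out 4-byte-rounded chunks in order and frees them all at once, which suits bulk data buffers but is not a general-purpose heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator over a fixed memory region.
class BufferPool {
 public:
  BufferPool(uint8_t* base, size_t size) : base_(base), size_(size), used_(0) {
    assert(base != nullptr);
  }

  // Returns the next free chunk, or nullptr if the pool is exhausted.
  uint8_t* allocate(size_t bytes) {
    size_t rounded = (bytes + 3u) & ~static_cast<size_t>(3);  // multiple of 4
    if (rounded < bytes || rounded > size_ - used_) {
      return nullptr;  // overflowed while rounding, or not enough room left
    }
    uint8_t* chunk = base_ + used_;
    used_ += rounded;
    return chunk;
  }

  void reset() { used_ = 0; }  // release every buffer at once
  size_t available() const { return size_ - used_; }

 private:
  uint8_t* base_;
  size_t size_;
  size_t used_;
};
```

On the Due the base could be the pointer returned by `Parallel.getAddress()` as used earlier in the thread, with the size set to the capacity of the attached SRAM (131072 bytes for the 128 KByte part). The memory controller must be configured before the first allocation is touched.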