Topic: Parallel library for Due External Memory Bus/Static Memory Controller
God Member

Here is the result of my memcpy test:

Code:
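#include <Parallel.h>

void setup() {
  // 8-bit bus on chip select 0 with 17 address lines (128 KiB SRAM).
  // The two flags appear to disable NRD (the SRAM's OE is tied to ground)
  // and enable NWE.
  Parallel.begin(PARALLEL_BUS_WIDTH_8, PARALLEL_CS_0, 17, false, true);
  // SMC timings, found by trial as the fastest error-free settings
  // for a 55ns SRAM (see the library for the meaning of each argument).
  Parallel.setCycleTiming(4,4);
  Parallel.setPulseTiming(3,3,4,3);
  Parallel.setAddressSetupTiming(0,0,0,0);
  Serial.begin(115200);
}

uint8_t inb[8192],outb[8192];

void loop() {
  // internal -> internal (buffer contents don't matter for timing).
  // micros() is unsigned; unsigned differences stay correct across wraparound.
  uint32_t r=micros();
  memcpy(inb,outb,8192);
  uint32_t s=micros();

  // fill both buffers with different random data so the final check means something
  for(int a=0;a<8192;a++){inb[a]=random(256);outb[a]=random(256);}
  // start of the external memory window for the selected chip select
  uint8_t *m=(uint8_t *)Parallel.getAddress();

  uint32_t t=micros();
  memcpy(m,inb,8192);          // internal -> external
  uint32_t u=micros();
  memcpy(m+32768,m,8192);      // external -> external
  uint32_t v=micros();
  memcpy(outb,m+32768,8192);   // external -> internal
  uint32_t w=micros();

  // the data must survive the round trip unchanged
  for(int a=0;a<8192;a++)if(inb[a]!=outb[a]){ Serial.println("Error ");break;}

  Serial.println("memcpy speed in MiB/s");
  Serial.print("int to int "); Serial.println(7812.5/(s-r)); // 7812.5==8192*1000000/1048576
  Serial.print("int to ext "); Serial.println(7812.5/(u-t));
  Serial.print("ext to ext "); Serial.println(7812.5/(v-u));
  Serial.print("ext to int "); Serial.println(7812.5/(w-v));
  Serial.println();

  delay(1000);
}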
Code:
memcpy speed in MiB/s
int to int 76.59
int to ext 19.88
ext to ext 8.63
ext to int 13.40

Those were the fastest timings I could get from the 55ns SRAM chip without errors; the 19.88 MiB/s write speed is slightly out of spec.
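The 7812.5 constant in the sketch is 8192 bytes × 1,000,000 µs/s ÷ 1,048,576 bytes/MiB. As a sketch of the same conversion (the helper name `mib_per_s` is hypothetical, not part of the Parallel library), computing the elapsed time with unsigned 32-bit subtraction also keeps the result correct if a micros()-style counter wraps between the two samples:

```cpp
#include <cstdint>

// Convert a byte count and two microsecond timestamps into MiB/s.
// The elapsed time uses unsigned 32-bit subtraction, so it stays correct
// even if the counter wrapped between start_us and end_us.
double mib_per_s(uint32_t bytes, uint32_t start_us, uint32_t end_us) {
  uint32_t elapsed_us = end_us - start_us;  // wrap-safe difference
  if (elapsed_us == 0) return 0.0;          // avoid dividing by zero
  const double bytes_per_mib = 1048576.0;
  return (bytes / bytes_per_mib) * 1e6 / elapsed_us;
}
```

For example, an 8 KiB copy that took 393 µs would come out at about 19.88 MiB/s.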


France
Newbie

Thanks LMI for your answer.
Through the 8-bit bus I have to read four bytes from each HCTL2022 IC, and I don't want my program to lose too much time doing this.
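If the four count bytes of one HCTL2022 can be made to appear at four consecutive bus addresses (a hypothetical wiring, e.g. with the chip's byte-select inputs driven from low address lines; the byte order and the latch/output-enable handshake are assumptions not covered here), assembling the 32-bit count costs only four bus reads. A minimal sketch, with the helper name `read_count32` made up for illustration:

```cpp
#include <cstdint>

// Hypothetical wiring: the counter's four bytes are visible at dev[0..3],
// most significant byte first. Each dev[i] access is one real bus read
// because the pointer is volatile.
uint32_t read_count32(const volatile uint8_t *dev) {
  uint32_t count = 0;
  for (int i = 0; i < 4; ++i) {
    count = (count << 8) | dev[i];  // shift in the next byte
  }
  return count;
}
```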

Newbie

Hello guys.
Long-time reader, first-time poster.

I'm designing my own board for another purpose using the same ATSAM3X chip, and after reading this thread I am still a little confused about how to set up external RAM as a continuation of the internal RAM (presumably this is possible).

What I mean is: assuming all the control, data and address lines are connected correctly to an external chip, how does the MCU know to point the stack to this external memory once all 48Kb of internal RAM is used? Is it automatic after setting some SMC control bit in some register?

If this is not possible, then it requires some memory management in code, which is quite annoying. Any help would be appreciated.

nr Bundaberg, Australia
Tesla Member

Why do you want the stack to be in external RAM?

Quote
once all of the internal 48Kb internal ram is used ?
This will never all be used by the stack unless you have a massively recursive function (very unlikely), and if your data is that large, put it in external RAM.

So far I don't understand what you want to do.
______
Rob

Rob Gray aka the GRAYnomad www.robgray.com

Newbie

Fair question. The external memory appears to the MCU as a block of memory at an address that depends on its chip select line. (These pins are labelled NCS0-NCS7 on the SAM3X.) Typically you would connect NCSx to the CS pin on your memory device. The lowest 24 bits of a memory address are decoded by the external memory device; the upper 8 bits are decoded in the memory controller. The locations of the memory sections are shown in Figure 8-1 of the SAM3X User Manual. For example, CS0 maps to 0x60000000.
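With the low 24 bits decoded by the device, each chip select gets a 16 MiB window. Assuming the windows are laid out contiguously from 0x60000000 (check Figure 8-1 for the exact map), the base address for NCSx can be computed as sketched here; the names are hypothetical:

```cpp
#include <cstdint>

// Start of the SMC external memory area; NCS0 maps here.
constexpr uint32_t kExtMemBase = 0x60000000u;
// The device decodes the low 24 bits itself, so windows are 16 MiB apart.
constexpr uint32_t kCsWindow = 1u << 24;

// cs = 0..7 for NCS0..NCS7
constexpr uint32_t cs_base_address(unsigned cs) {
  return kExtMemBase + cs * kCsWindow;
}
```

On the Due itself the result would be turned into a pointer, e.g. `reinterpret_cast<volatile uint8_t *>(cs_base_address(0))`, and only dereferenced after the SMC has been configured; in stimmer's test sketch, `Parallel.getAddress()` appears to supply this base pointer.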

The second question is how to map the application code to memory regions. This can be done very simply by direct access and setup from the application code, but this method is less flexible: you can place dynamic variables at runtime just by setting their address to a hard-coded value corresponding to the external memory. If you want automatic placement of variables, then you need to use the linker.

By changing the linker file, you can specify the additional memory region and what should go into it. At runtime, RAM variables will be zeroed or initialised with data copied from Flash as necessary; this is part of the standard C initialisation procedure. With the GCC toolchain you may need to edit a startup file to include the additional memory regions.

Note that you will need to edit the startup code to set up the memory controller before that code does its memory copying/zeroing. If you put the stack into external memory, you must make sure your startup code does not use it before it is set up! It is probably best to set a temporary stack in internal RAM for the startup code, and then switch stacks after the external memory is set up. It is simpler, though, to keep the stack in internal memory and use the external memory for other stuff.
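The ordering constraint can be sketched as a routine the startup code would call before main(). Everything here is an assumption for illustration: the function names, and the idea that the linker script exports the location and length of an external data/bss area (real startup files use their own linker symbols, which are not shown):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical hook that configures the SMC pins and timings for the
// external RAM. On real hardware this would program the SMC/PIO registers.
using SmcInitFn = void (*)();

// Bring up the memory controller FIRST, then copy initialised data into the
// external region and zero its uninitialised part. This must run on a stack
// located in internal RAM.
void init_external_sections(SmcInitFn smc_init,
                            uint8_t *data_dst, const uint8_t *data_src,
                            size_t data_len,
                            uint8_t *bss_start, size_t bss_len) {
  smc_init();                                 // 1. external bus now usable
  std::memcpy(data_dst, data_src, data_len);  // 2. copy .data image from flash
  std::memset(bss_start, 0, bss_len);         // 3. zero the .bss part
}
```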

The third question is how to get the application to use the extended memory. This can be done in the linker file, with a combination of "pragma section" and the linker file, or explicitly in the application, depending on taste.
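With GCC, the usual way to tag a variable for a particular section is the `section` attribute rather than a pragma. A minimal sketch, assuming a hypothetical output section named `.extram`, a 128 KiB device on NCS0, and a matching rule in the linker script (shown only as a comment, in GNU ld syntax). Marking the section NOLOAD means the startup code neither copies nor zeroes it, so its contents start undefined:

```cpp
#include <cstdint>

// Linker script fragment this sketch assumes (GNU ld syntax; not a file
// from the Due core):
//   MEMORY   { extram (rw) : ORIGIN = 0x60000000, LENGTH = 128K }
//   SECTIONS { .extram (NOLOAD) : { *(.extram) } > extram }
// Without such a rule the linker treats .extram as an orphan section and
// places it by its default rules, i.e. NOT in external RAM.

// Ask GCC to emit this buffer into the .extram section.
__attribute__((section(".extram"))) uint8_t frame_buffer[4096];
```

Because NOLOAD data is not initialised, code should write such a buffer before reading it, and must not touch it before the SMC is configured.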

If this all seems quite complicated, then you are probably right. It needs good knowledge of the MCU and the toolchain to set the right things at the right point. Once setup, it becomes fairly seamless, but get one tiny thing wrong and it falls over in a heap.

To keep it simple, I would probably use direct setup from the application, and use the external memory as a pool of dynamic buffers for bulk data.
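One simple way to treat the external memory as a pool of bulk buffers is a bump allocator that hands out aligned chunks and is only ever reset as a whole. A sketch under stated assumptions: the names are hypothetical, and the base and size are parameters so the same code can be tested on a host with an ordinary array (on the Due the base would be the chip-select address and the size the device size):

```cpp
#include <cstddef>
#include <cstdint>

// A fixed memory window handed out front to back. Buffers are never freed
// individually; ext_reset() releases everything at once.
struct ExtPool {
  uint8_t *base;  // assumed suitably aligned (0x60000000 is)
  size_t size;    // bytes available in the window
  size_t used;    // bytes handed out so far
};

// align must be a power of two
void *ext_alloc(ExtPool &pool, size_t bytes, size_t align = 4) {
  size_t start = (pool.used + align - 1) & ~(align - 1);
  if (start > pool.size || bytes > pool.size - start) return nullptr;
  pool.used = start + bytes;
  return pool.base + start;
}

void ext_reset(ExtPool &pool) { pool.used = 0; }
```

The appeal of this design is that it needs no bookkeeping inside the external RAM itself, so a bad SMC setting shows up as corrupted data rather than a corrupted heap.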

Newbie

This has been very helpful, thanks. I will not be this ambitious and will stick to directly pushing objects into external RAM instead.

I have one other related question. Can the Due be programmed directly from USB, or does it need JTAG? I understand it cannot use ISP like the Mega, but I really dislike those large JTAG connectors and I don't really have the real estate either. Suggestions?

nr Bundaberg, Australia
Tesla Member

AFAIK there are four options for programming a SAM3X:

USB bootloader (built in)
UART bootloader (built in and what the Due uses)
JTAG (10-way header but .05" spacing so pretty small)
SWD (4-way header, normal .1" spacing)

So you can load programs directly from USB but right now I can't find a description of exactly how you do that.

______
Rob

nr Bundaberg, Australia
Tesla Member

Quote
memcpy speed in MiB/s
int to int 76.59
int to ext 19.88
ext to ext 8.63
ext to int 13.40

@stimmer
From what I see in the datasheet it takes 6 cycles (Figure 27-7) to perform an external memory access; 84 MHz / 6 gives a transfer rate of 14 MB/s, and yet you get nearly 20.

Can you explain how this happens? Maybe there are savings in setup and hold times for a block move or something.

EDIT: OK, I see in later diagrams accesses down to 3 cycles with 0 setup and hold times, which would be 28 MB/s, so I guess anywhere between 14 and 28 MB/s is fair game depending on various factors.
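The arithmetic above, written out as a sketch (hypothetical helper names; it assumes one byte per access on the 8-bit bus and back-to-back accesses). Note the units: stimmer reports MiB/s, so the measured 19.88 MiB/s is about 20.85 MB/s. Assuming the `setCycleTiming(4,4)` call in stimmer's sketch sets a 4-cycle access, 84 MHz / 4 = 21 MB/s (about 20.03 MiB/s), just above the measured write figure:

```cpp
// Peak transfer rate in MB/s (10^6 bytes/s) for back-to-back bus accesses.
constexpr double bus_mb_per_s(double mck_hz, unsigned cycles_per_access,
                              unsigned bytes_per_access = 1) {
  return mck_hz / cycles_per_access * bytes_per_access / 1e6;
}

// Convert MB/s (10^6 bytes) to MiB/s (2^20 bytes).
constexpr double mb_to_mib(double mb_per_s) {
  return mb_per_s * 1e6 / 1048576.0;
}
```

With `bytes_per_access = 2`, the same formula gives the ceiling for a 16-bit bus.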

______
Rob


Newbie

Hello world.
I have a question about external SRAM. Can I connect two 8-bit SRAM ICs to the external memory bus, using the full 16-bit data bus and a 17-bit address bus for the two ICs? I have two CY7C1019DV33 (128k x 8-bit) http://www.cypress.com/?docID=31943.
I know that some pins are not connected on the board; I can wire those directly to the SAM3X IC. But what about the NRD pin? Is it possible to do without it (connecting OE to ground, as @stimmer did)?

PS: Sorry if my English is bad.
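For the two-chip question, the usual arrangement with a 16-bit bus (stated here as an assumption; check the SMC chapter of the datasheet for the exact pin functions) is that both chips share one word address while the low byte lane D0-D7 goes to one chip and D8-D15 to the other. The CPU byte address then splits as sketched below (hypothetical names); for byte writes, each chip also needs its own write enable or byte select so that only the addressed lane is written:

```cpp
#include <cstdint>

// Where a CPU byte address lands when two x8 SRAMs sit side by side on a
// 16-bit bus: both chips see the same word address, bit 0 picks the lane.
struct LaneAddr {
  uint32_t word;  // address presented to both chips' A0..A16
  unsigned lane;  // 0 = low-lane chip (D0-D7), 1 = high-lane chip (D8-D15)
};

LaneAddr map_byte_address(uint32_t byte_offset) {
  return LaneAddr{byte_offset >> 1, static_cast<unsigned>(byte_offset & 1u)};
}
```

So a 17-bit word address covers 2 × 128 KiB = 256 KiB of byte addresses.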

Ottawa,Canada
Jr. Member

Interesting project; I will experiment with it in the next few days.

I have in mind a large FIFO, 32Meg-ish (an FPGA tied to an SDRAM),
using no addresses, just reads and writes (I need the PWM lines for other aspects of my programs).
The accounting can be taken care of by the Due.
It would need the NWE/NRD/NBSx lines.

This is all to get rid of SD card latency: I have too much data throughput to deal with a 100-300 ms delay.
hmmmmm...
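Reading a FIFO that sits behind a single bus address needs a loop over a volatile pointer rather than memcpy, because memcpy advances the source address. A minimal sketch, assuming a 16-bit bus and a FIFO that pops one word per read strobe and pushes one per write strobe (the function names are hypothetical):

```cpp
#include <cstddef>
#include <cstdint>

// Every iteration performs a real bus read from the SAME address, because
// the pointer is volatile; the compiler may not merge or skip these reads.
void fifo_read(const volatile uint16_t *port, uint16_t *dst, size_t n) {
  for (size_t i = 0; i < n; ++i) dst[i] = *port;
}

// One bus write per word, always to the same address.
void fifo_write(volatile uint16_t *port, const uint16_t *src, size_t n) {
  for (size_t i = 0; i < n; ++i) *port = src[i];
}
```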

Newbie

Hi,

I am using this parallel library to drive an LCD display. I am using one address line to toggle the RS line of the display.

Unfortunately I cannot get the first five address lines to work. The first address line that works as expected is A5; A0-A4 stay constantly high.

So my display works fine if I use line A5 (and it is 5 times faster than using ports), thanks for the work!

But I would like to use A0 (pin 9) instead of A5. Any idea how I can get A0 working? Any trick to enable A0/C.21/PWM9?
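On an 8-bit bus, address line Ak is bit k of the byte offset from the chip-select base, so driving RS from A5 means writing to base + 32 for one RS level and to base + 0 for the other; with A0 the offset would be 1. A minimal sketch of that mapping (hypothetical helper names; whether RS high means data or command depends on the display controller):

```cpp
#include <cstdint>

// Address that drives the chosen address line (and thus RS) high or low.
inline volatile uint8_t *lcd_reg(volatile uint8_t *base, unsigned rs_line,
                                 bool rs_high) {
  return base + (rs_high ? (1u << rs_line) : 0u);
}

// One bus write: the SMC strobes NWE and the address line sets RS.
inline void lcd_write(volatile uint8_t *base, unsigned rs_line, bool rs_high,
                      uint8_t value) {
  *lcd_reg(base, rs_line, rs_high) = value;
}
```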

Newbie

Did you look at the example provided in the library (S1D13700_LCD)? It uses one address bit to communicate with an external parallel LCD. I haven't tested it in a while, but it was working when I posted the code. If not, then perhaps something has changed in the Arduino config since then?


Newbie

Yes, I used the LCD example as the foundation of my code. A0 didn't work with that example code either. It seems something has changed...

Turku
Full Member

About unusable pins: you can still use external memory if some address or data pins are not available. If an address pin is missing, you just can't use the whole memory space. Missing data pins are similar.

So we'd have something like a 256k x 14-bit RAM available instead of a full 1M x 16-bit device. But of course a CPU with a proper external bus would be nice.

nr Bundaberg, Australia
Tesla Member

If address pins are missing, you will have holes in the address space, and writes will land on top of other data.

So for example if A4 was missing you could write 16 contiguous bytes OK, but the 17th byte (address 16) would go into location 0, overwriting the first byte. This could be manageable but a right PITA.

Likewise with data: the top bits may not matter if you stick to values below the first missing bit, but if you are missing any low-order bits you will be in trouble. This is almost impossible to use unless you "adjust" every value you save and "unadjust" every value you retrieve.
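The A4 example can be modelled directly. A minimal sketch, assuming the memory's unconnected address and data inputs are tied low (a floating pin would instead give unpredictable results) and using made-up function names:

```cpp
#include <cstdint>

// The address the chip actually sees: bits whose pins are missing are lost.
constexpr uint32_t effective_address(uint32_t addr, uint32_t missing_addr_mask) {
  return addr & ~missing_addr_mask;
}

// The value read back after a round trip: missing data bits come back as 0.
constexpr uint16_t read_back(uint16_t written, uint16_t missing_data_mask) {
  return static_cast<uint16_t>(written & ~missing_data_mask);
}
```

With A4 missing, addresses 0-15 are fine, address 16 folds onto 0, and 48 folds onto 32, which is exactly the duplication described above.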

______
Rob
