Just a newby asking the 64k question again - Arduino Mega2560

OK, an experiment with Lefty's code. I generated some exhaustive initialization just to see what would happen.

#include <avr/pgmspace.h>   // To store arrays in flash rather than SRAM
// Simple sketch to create large sketch sizes for testing purposes
/*
  Blink
  Turns on an LED on for one second, then off for one second, repeatedly.
 
  This example code is in the public domain.
 */
 
// Pin 13 has an LED connected on most Arduino boards.
// give it a name:
int led = 13;

/* 
 Make arraysize = to 1500 for 328P chip, 4000 for 1280P chip?,
 3600 for 644P chip, xxxx for 1284P,  etc.
*/
const int arraysize= 3000;  // value to mostly fill available flash capacity

long myInts0[arraysize] PROGMEM = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,
30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,... // up to 2999
long myInts1[arraysize] PROGMEM =  {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,
30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,... // up to 2999
//...
//...up to
long myInts9[arraysize]PROGMEM={//etc

void setup() {                
  // initialize the digital pin as an output.
  pinMode(led, OUTPUT); 
  int i = random(0,arraysize);      // Work around any optimization for constant values
  Serial.print(myInts0[i]);         //  Access some random element so the array can't be optimized away.
  Serial.print(myInts1[i]);         //  Access some random element so the array can't be optimized away.
  Serial.print(myInts2[i]);         //  Access some random element so the array can't be optimized away.
  Serial.print(myInts3[i]);         //  Access some random element so the array can't be optimized away.
}

// the loop routine runs over and over again forever:
void loop() {
  
  digitalWrite(led, HIGH);   // turn the LED on (HIGH is the voltage level)
  delay(1000);               // wait for a second
  digitalWrite(led, LOW);    // turn the LED off by making the voltage LOW
  delay(1000);               // wait for a second
  
}

What I've noticed is that if the sketch size goes over 128K I start getting errors of the form:

warning: internal error: out of range error...

That is, if I add an additional array of [arraysize] longs where arraysize = 3000, bringing the number up to 11 arrays, each of 3000 long elements (4 × 11 × 3000 = 132,000 bytes, then add the rest of the sketch), I start getting that out of range error reported against various libs. For instance, the first place it shows up is as an out of range error on the do_random function in random.o. If I comment out the call to random, it shows up in HardwareSerial.

If I stick to 10 initialized arrays the sketch size is 124,996 bytes (= 4 × 10 × 3000 + rest of sketch). No problems compiling/linking.
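The arithmetic behind that threshold can be checked on a host machine. This is only a minimal sketch, assuming a long is 4 bytes (as it is with avr-gcc) and taking 128K to mean 131,072 bytes; the names arrayBytes and k128K are made up for illustration and are not part of any Arduino header.

```cpp
#include <cstdint>

// Bytes of flash taken by `count` PROGMEM arrays of `elements` 4-byte longs.
// On AVR a long is 4 bytes; uint32_t keeps that true on a host compiler too.
constexpr uint32_t arrayBytes(uint32_t count, uint32_t elements) {
    return count * elements * sizeof(uint32_t);
}

constexpr uint32_t k128K = 128UL * 1024UL;  // 131,072 bytes

// 10 arrays stay under 128K; the 11th pushes the data alone past it.
static_assert(arrayBytes(10, 3000) == 120000, "10 arrays");
static_assert(arrayBytes(11, 3000) == 132000, "11 arrays");
static_assert(arrayBytes(10, 3000) < k128K, "fits below 128K");
static_assert(arrayBytes(11, 3000) > k128K, "crosses 128K");
```

So the 11th array is the one that carries the data by itself past the 128K mark, before the rest of the sketch is even counted.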

Now, interestingly, suppose instead of long int I initialize each of the myInts arrays to 3000 four-byte null-terminated strings, thus:

char* myInts9[arraysize] PROGMEM ={"abc\0","abc\0","abc\0",//..etc for 3000 initializers

For 10 initialized arrays of 3000 four-byte-strings the IDE reports a sketch size of 64,646 bytes out of a 258,048 byte maximum. The same sketch initialized to longs is reported as 124,996.

For 20 initialized arrays of 3000 four-byte strings the IDE reports

arduino-1.0.3\hardware\arduino\cores\arduino/main.cpp:11: warning: internal error: out of range error

Binary sketch size: 136,682 bytes (of a 258,048 byte maximum)

If I take out 1 array I get no errors and
Binary sketch size: 130,646 bytes (of a 258,048 byte maximum)

I have not yet tried to load and run these sketches.

Cheers,
Joe

It doesn't totally surprise me you are having these problems. My experience has been that the Mega side of the platform, being used less, hasn't received as much attention (eg. bootloaders that don't handle the watchdog timer).

Then people compiling large arrays (in itself perfectly sensible) are probably in a minority too.

It would be interesting if, when you solve this, we document it so others can benefit from it.

Meanwhile you could always consider my suggestion of trying to store your dictionary more compactly and make the problem go away. :slight_smile:

Thanks Nick.
Yes, absolutely, a more intelligent data structure would be more compact and faster too.
Of course, even then there is the bald-faced challenge of trying to use up the whole 256k of the Mega....and once solved I'll also use tries and get even more packed in there!!!
Cheers
Joe

So how can this 'solution' be implemented in context of writing and uploading sketches in the Arduino IDE?

Lefty

Copy 'n' paste.

You cannot address huge variables even in the < 64k boundary.
Here is a program using every single byte on the Mega with progmem. It is possible, and GET_FAR_ADDRESS will help you read the data.

#define nothing

template< uint64_t C, typename T >
  struct LargeStruct{
    T Data;
    LargeStruct< C - 1, T > Next;
};
template< typename T > struct LargeStruct< 0, T >{ };

typedef LargeStruct< 80, uint64_t > Container; //640 bytes

PROGMEM LargeStruct< 50, Container > l_Struct;  //32k
PROGMEM LargeStruct< 50, Container > l_Struct1;   //32k
PROGMEM LargeStruct< 50, Container > l_Struct2;   //32k
PROGMEM LargeStruct< 50, Container > l_Struct3;   //32k
PROGMEM LargeStruct< 50, Container > l_Struct4;   //32k
PROGMEM LargeStruct< 50, Container > l_Struct5;  //32k
PROGMEM LargeStruct< 50, Container > l_Struct6;   //32k
PROGMEM LargeStruct< 50, Container > l_Struct7;   //32k

PROGMEM LargeStruct< 431, uint16_t > l_Struct8; //862 bytes
void setup()
  {
    volatile int i = ( int ) &l_Struct;
    volatile int i1 = ( int ) &l_Struct1;
    volatile int i2 = ( int ) &l_Struct2;
    volatile int i3 = ( int ) &l_Struct3;
    volatile int i4 = ( int ) &l_Struct4;
    volatile int i5 = ( int ) &l_Struct5;
    volatile int i6 = ( int ) &l_Struct6;
    volatile int i7 = ( int ) &l_Struct7;    
    volatile int i8 = ( int ) &l_Struct8;  
  }

void loop(){}

So I added a blink function to your example sketch after removing some of the 'structure stuff' so as to fit in a 1280 chip and uploaded to a mega.

No compile errors, compile size 98,726 of 130,048 maximum, upload proceeds with no errors, but certainly takes a while. When done no blinking led13? Why does the sketch not run? That is the same symptom I was seeing in my code example posted earlier, I could create sketches of desirable size but after a certain size the blink in loop() doesn't execute?

#define nothing

template< uint64_t C, typename T >
  struct LargeStruct{
    T Data;
    LargeStruct< C - 1, T > Next;
};
template< typename T > struct LargeStruct< 0, T >{ };

typedef LargeStruct< 80, uint64_t > Container; //640 bytes

PROGMEM LargeStruct< 50, Container > l_Struct;  //32k
PROGMEM LargeStruct< 50, Container > l_Struct1;   //32k
PROGMEM LargeStruct< 50, Container > l_Struct2;   //32k
/*PROGMEM LargeStruct< 50, Container > l_Struct3;   //32k
PROGMEM LargeStruct< 50, Container > l_Struct4;   //32k
PROGMEM LargeStruct< 50, Container > l_Struct5;  //32k
PROGMEM LargeStruct< 50, Container > l_Struct6;   //32k
PROGMEM LargeStruct< 50, Container > l_Struct7;   //32k
*/
PROGMEM LargeStruct< 431, uint16_t > l_Struct8; //862 bytes
int led = 13;
void setup()
  {
    pinMode(led, OUTPUT);
    volatile int i = ( int ) &l_Struct;
    volatile int i1 = ( int ) &l_Struct1;
    volatile int i2 = ( int ) &l_Struct2;
 /*   volatile int i3 = ( int ) &l_Struct3;
    volatile int i4 = ( int ) &l_Struct4;
    volatile int i5 = ( int ) &l_Struct5;
    volatile int i6 = ( int ) &l_Struct6;
    volatile int i7 = ( int ) &l_Struct7;    
    volatile int i8 = ( int ) &l_Struct8;
  */  
  }

void loop(){

 digitalWrite(led, HIGH);   // turn the LED on (HIGH is the voltage level)
  delay(1000);               // wait for a second
  digitalWrite(led, LOW);    // turn the LED off by making the voltage LOW
  delay(1000);               // wait for a second
}

Lefty

Good find, I didn't try anything like that. I previously wasn't able to answer the questions as that sketch is a bit obscure, but as I need pgm functionality, I've done a little investigating to work out what is happening.

Firstly, the structures placed into progmem are considered first, before the functions. So as a consequence, the function's code ( main, loop, pinMode, etc... ) is placed in a region not accessible by conventional 16-bit pointers. Therefore to call this code you have to jump to the trampoline, which in turn contains a jump to your code.

From what I was able to ingest, functions that exist in the >64K word boundary automatically have an entry in a thing called a 'trampoline'. It is a table of jumps to locations in the higher memory, allowing the entire memory range to be used.

Also, as I understand it, if main is in high memory, it will still be called via the trampoline. The problem with this sketch is not that the 'far' code isn't being called; it's just that the Arduino libraries do not expect their PGM data to be out of range.

There are a number of ways to get things working. For instance, you could update the core to use far pointers where necessary. The easiest quick fix is to place the structure data after the functions so the core can have full use of the lower address range. However, there is an overhead to using high range access, so critical stuff should remain in the lower section.

This macro will place data after other sections. Found here

#define PROGMEM_FAR  __attribute__((section(".fini7")))
#define nothing

template< uint64_t C, typename T >
  struct LargeStruct{
    T Data;
    LargeStruct< C - 1, T > Next;
};
template< typename T > struct LargeStruct< 0, T >{ };

typedef LargeStruct< 80, uint64_t > Container; //640 bytes

#define PROGMEM_FAR  __attribute__((section(".fini7")))

PROGMEM_FAR LargeStruct< 50, Container > l_Struct;  //32k
PROGMEM_FAR LargeStruct< 50, Container > l_Struct1;   //32k
PROGMEM_FAR LargeStruct< 50, Container > l_Struct2;   //32k
PROGMEM_FAR LargeStruct< 431, uint16_t > l_Struct8; //862 bytes

int led = 13;

void setup()
  {
    pinMode(led, OUTPUT);
    volatile int i = ( int ) &l_Struct;
    volatile int i1 = ( int ) &l_Struct1;
    volatile int i2 = ( int ) &l_Struct2;
  }

void loop(){

 digitalWrite(led, HIGH);   // turn the LED on (HIGH is the voltage level)
  delay(1000);               // wait for a second
  digitalWrite(led, LOW);    // turn the LED off by making the voltage LOW
  delay(1000);               // wait for a second
}

My problem appears to be this problem:

..it looks like the version of g++ used by Arduino will fail whenever the global constructors get pushed beyond the 64k limit, because the global constructor table is only 16bits wide and the code uses ijmp to access it...

Found here: http://code.google.com/p/arduino/issues/detail?id=1067

I'm compiling avr-gcc-4.7.2 right now just for grins, and I will try building outside the Arduino IDE environment to see if I can get it to load.

Ah, I had long forgotten the joys of code.

Read the post I wrote above, the answer is in there.

Move your pgm data after the code, then the constructor table starts in the lower address memory.

Ok. Progress after many hours of bashing. I promised to report back, and I am.
Consider this my engineering notebook of sorts. I'm not trying to instruct you experts, but rather just recording my experiences. I know I do not have a complete understanding yet, but this is as far as I got. I need more info, particularly on memory sections and addressing. But maybe this will be valuable to someone.

My application is that I am putting a dictionary of 10k words into an Arduino Mega2560 and accessing it through some means for display on an LCD. My IDE is (mostly) the Arduino 1.0.3, and I'm primarily using MacOSX v10.6. What I eventually got to work was on that system.

I also tried the compile/link/load sequence from the command line using avr-gcc 4.6.2 and 4.7.2. I tried the AVR environment on Eclipse and also on Mac XCode. Results varied in different ways with all those methods. In all cases I'm using the most recent version of avrdude (I forget which) that comes with 1.0.3, which is also the one you get when you fetch it on MacOSX with macports or fink. It's the same version in all cases.

I'm pretty confident that the combination of 4.7.2 and XCodeAVR or EclipseAVR would give me an entirely different experience than the ArduinoIDE. However I didn't do more than blinky-with-big-global-vars on any of those because I didn't have the time/patience to go back and recast all my Arduino code in native AVR speak. However I was very successful in compiling and linking to a .hex that contained flash/global data structures larger than 64k. I did not attempt to access those, though.

Things learned:

There is a real 64k boundary for data today. Note that code is not as limited. The vector long jump table (which I guess is called a "trampoline table" in AVR speak) consists of 32 bit words and so could easily address much more than the 256k in the Mega. However, data access is limited by 16-bit pointer addressing, so the largest chunk is 64k. Now, there is a great exception to this. By concatenating bits from another register (RAMPZ) you can effectively create a 24bit pointer for data, and this is done, partially in some asm code available in multiple libraries out there. At the moment, on the version of the IDE we have, though you can specify multiple 64k byte clumps, you can't cross that boundary on any one array or data structure.
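To make the RAMPZ idea concrete, here is a host-side sketch of how the extra register and the 16-bit Z pointer combine into one flat flash address. It is only an illustration of the bit layout, not code from any AVR library; farAddress, rampzOf and zOf are names invented for this example.

```cpp
#include <cstdint>

// Combine the RAMPZ byte (upper bits) with the 16-bit Z pointer (lower bits)
// into one flat flash byte address, the way ELPM sees it.
constexpr uint32_t farAddress(uint8_t rampz, uint16_t z) {
    return (static_cast<uint32_t>(rampz) << 16) | z;
}

// Split a flat address back into its two parts.
constexpr uint8_t rampzOf(uint32_t addr) { return static_cast<uint8_t>(addr >> 16); }
constexpr uint16_t zOf(uint32_t addr)    { return static_cast<uint16_t>(addr & 0xFFFF); }

static_assert(farAddress(0, 0xFFFF) == 0x0FFFF, "top of the first 64K");
static_assert(farAddress(1, 0x0000) == 0x10000, "first byte past 64K");
static_assert(farAddress(3, 0xFFFF) == 0x3FFFF, "last byte of 256K");
```

A plain 16-bit pointer only ever carries the zOf() part, which is why a single array cannot straddle a 64K boundary without something else supplying the upper bits.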

Future versions of avr-gcc seem to address this. In particular, it appears that in 4.7.2 the compiler/linker is happy with larger-than-64k global defs. And let me say here: it's global defs that are not changing that we are talking about at all times, because the data goes into flash. If you say:
volatile int xyz = 123;
That gets put into RAM and is globally defined for all your code/ISRs. This is not what I'm saying. If I say

volatile int xyz PROGMEM_blahblah = 123;

The "volatile" piece is not particularly useful, as far as I can tell. By specifying PROGMEM_... you're putting the data into flash, and as such it has to be globally accessible because you can't put PROGMEM data into local vars. And as it's not changeable, this would be just as effective and apparently does exactly the same thing:

int xyz PROGMEM_blahblah = 123;

The problem with getting things to compile when you've got more than 64k in one chunk (and it will compile under certain circumstances) is that the rest of the system has no clue how to handle it, because the ptr to the globals is only 16 bits and there's nothing you can do about that. So you wind up with linker errors, and errors that show up in other places, in code you had nothing to do with. This is a sign of badness, and the signal that you need to accept your 64k limitation with happiness and move on, because it will be solved at some point.

In the version of avr-gcc available in the Arduino 1.0.3 distribution (I believe it's 4.3.2), the compiler balks at "certain" declarations/structures which, when initialized, reach over 64k. And this is a key point: most of the problems surface in the "initialization" of defined vars, and not in the definition itself.

By the way, you can happily define/allocate empty vars to your heart's content. This is a red herring, though, I have found. In a lot of the tests run here, and also ones I've tried, you can play various games to get the compiler to NOT optimize away unused/uninitialized space. But the results are variable.

First, let me indicate we are talking about data in flash. For all intents and purposes, data in flash may as well be data in a ROM. Yes, I know there are ways to modify it during runtime, but that goes beyond what I have tried here.

As the flash data is essentially in a ROM, it is not unreasonable to expect that you would know what it is a priori. That is, you're going to burn it into flash, so you're certain what the data is. Therefore, all vars/data going into flash are known ahead of time, and the compilers/linkers presume as much. This is an important point, and also one which is the source of a lot of pain. You can do the following in your code:

const char abc[]  PROGMEM_blahBlah = "abc"; // PROGMEM_blahblah to be explained

And the abc will become abc[4] after compilation - that is, three chars 'a','b','c' and a trailing null '\0'. Please note that the compilers presume that declaration/initialization is intended to define a string. You don't get to come back later and say - hey, there's only 3 chars I didn't mean for it to be a null-terminated string. Too bad. The compiler is helping you. If you want a 3 char array you say:

const char abc[3]  PROGMEM_blahBlah = {'a','b','c'};

and you get that. But in the determination of your memory usage you may be thrown off when the compiler tries to help you by adding another byte. In addition, when you're doing pointer arithmetic, you can't (always) address things in various parts of flash. For instance, depending on how you organized things this gives you garbage:

int aa[3] PROGMEM_yak = {1,2,3};
int fetchedInt;
for(int x=0;x<3;x++){
  fetchedInt = pgm_read_word_far(&aa[x]);
  printf("%d=%d\n",x,fetchedInt);
}

where this will work

int aa[3] PROGMEM_yak = {1,2,3};
int a,b,c;
a = pgm_read_word(&aa[0]);   // valid indices for aa[3] are 0..2
b = pgm_read_word(&aa[1]);
c = pgm_read_word(&aa[2]);
printf("a=%d,b=%d,c=%d\n",a,b,c);
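Going back to the trailing-null point a little further up: the size difference is easy to confirm on a host compiler. This is a minimal sketch with the PROGMEM attribute left off so it builds anywhere; the array names are invented for the example.

```cpp
#include <cstddef>

// A string-literal initializer includes the terminating '\0'.
const char abcString[] = "abc";
// A brace list of characters stores exactly what is listed.
const char abcChars[3] = {'a', 'b', 'c'};

static_assert(sizeof(abcString) == 4, "three letters plus the trailing null");
static_assert(sizeof(abcChars) == 3, "three letters, no terminator");
```

Over 10,000 words that one extra byte per string adds up to roughly 10K of flash, which matters when you are budgeting against a 64k section.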

Now in my app, I am defining 10,000 words in a way I can access them. I have tried several methods. I have tried putting them in a struct like this:

typedef struct {
         const char a[2];
         const char aardvark[9];
         //... etc 10k words
} words;

const words dictionary PROGMEM_blahblah = {
             {"a"}, {"aardvark"},//etc, 10000 initializers};

And that will compile just fine with the version of avr-gcc in the current Arduino IDE. However, it generates an error on versions 4.6.2 and 4.7.2 that says something to the effect of "internal error: report a bug ..."

Doing this

typedef struct {
         const char a[2];
         const char aardvark[9];
         //... etc 10k words
} words;

const words dictionary PROGMEM_blahblah = {
             .a="a",
             .aardvark="aardvark",
             //etc, 10000 initializers};

Will not compile on the current Arduino IDE but will compile on version 4.6.2 and v4.7.2.

I inevitably settled on a different means - and this isn't entirely debugged. I'm still having trouble with the flash memory sections, but I did the following:

const char apple[6] PROGMEM_yadda="apple";
const char bear[5] PROGMEM_yadda="bear"; 
// etc. etc. etc. 10000 words

const char* dictionary[10000] PROGMEM_yadda+1 ={apple, bear, //etc etc

The idea is that 64k of dictionary data is in a chunk of flash designated by the attribute "PROGMEM_yadda" and an array of pointers to that data is in a chunk of memory called "PROGMEM_yadda+1" (it may be obvious to most, but please don't try to create code with PROGMEM_yadda; that's just an example). That compiles and links peachily.

Now, PROGMEM: what I learned about PROGMEM is that as an attribute it specifies putting data into flash. Using PROGMEM is a multi-step process. You have to put it into your code, explicitly. But you also have to change the linker script to understand what you mean by that attribute. And then retrieving data stored via PROGMEM requires using accessors of the form

 pgm_read_word_far(&myArray[i]);

...continued

...from above

But the important thing to note is that PROGMEM designates that the linker put your data into a space that is 64k in size. If you try to specify data bigger than 64k it might actually compile, but your results are going to be erratic; at least mine were using 1.0.3. Another important thing to note is that the PROGMEM attribute puts the data in a certain place, and only in that place. If you have more than 64k of data, you can put some where PROGMEM says and that spot is 64k big, but then where do you put the rest, and how do you get to it?

Now, just for the sake of discussion: there are lots of flash memory sections, used for different things. You have the .text section, and the .data section, and lots of .fini1, .fini2, .finix...etc. Code and data are marked by the __attribute__((section(".blah"))) tag, and then there is a corresponding note in the linker script that says what to do with things that are in section ".blah".

Some have had success putting data in the section marked with the attribute ".fini7". The AVR documentation says this is a user-definable section, and you can certainly use it to put a 64k chunk. If you have more than 64k you need more than one section.

Well, after much weeping and gnashing of teeth I got a lot of help from this:

http://www.avrfreaks.net/index.php?module=PNphpBB2&file=viewtopic&t=93874&highlight=

which explains how to set up several PROGMEM memory segments/sections. It pushes the static data into flash above the code itself, as is recommended by many. It also provides some code to address those sections. Using that method required an include called "morepgmspace.h" (please note this is a modified one, not Carlos Lamas's original) as well as mods to the linker script "avr6.x", which is the script used for chips that have 256k of flash. Other chips will have different linker scripts.

With that combination of things, I could specify constant global vars with the tags: PROGMEM_SEG1, PROGMEM_SEG2, PROGMEM_SEG3. I can then address them separately through a variety of means. The linker script is modified to place the data in locations which begin at 0x10000, 0x20000, and 0x30000 respectively.

Given that combo of linker script changes and includes I could do the following in code given the way I specified the data above:

char wordIGot[50];
uint_farptr_t theUltimateAddress  = GET_FAR_ADDRESS(myDictionary) + PROGMEM_SEGyadda_BASE_ADDRESS + (indexOfWord*sizeof(char*));

strcpy_Pyadda(wordIGot,pgm_read_byte_far(theUltimateAddress));

Note that I have not yet figured out which strcpy function to use with these addresses, so I wrote my own.

Apparently, this also works in some universe, but I have not yet got it to work:

#include <morepgmspace.h>
char myWord[50];
int theIndexOfTheWordIWant;
strcpy_PF(myWord, pgm_get_word_far(&myDictionary[theIndexOfTheWordIWant]));

What I have been unable to accomplish at this point, is that I cannot retrieve the char* pointers from the array of 10000 pointers I specified as "dictionary[]" above.
Just for sake of example, I have stored 64k of strings in the memory noted with the attribute PROGMEM_SEG2. I have stored the 10000 pointers to that data in PROGMEM_SEG1.

I am presuming these char* pointers are all 16-bit, so that they are addresses to the char strings all WITHIN the same memory segment, and that I will have to do something of the order of:

#include <morepgmspace.h> // note this is the MODIFIED one available on avrfreaks.net, not the original from Carlos Lamas's website
uint16_t theLocalAddress = pgm_read_word_far(&dictionaryWhichIsInSeg1[which word do I want]);

//but as I mentioned the above line doesn't seem to work always, as you can't retrieve the dictionary address through variable indexing.
//In that case, you have to do this..

 uint16_t theLocalAddress = pgm_read_word_far(GET_FAR_ADDRESS(dictionary) + PROGMEM_SEG1_BASE + whichWordDoIWant*sizeof(char*));

//but I'm not sure how well that works either

then

char* theRealAddress = pgm_read_word_far(theLocalAddress + PROGMEM_SEG2_BASE);

However, I think I may be in some sort of compiler optimization hell. It seems impossible to retrieve the pointers to the const char*s with pgm_read_word_far...but I can get to the words themselves, which are stored sequentially in PROGMEM_SEG2 and thus seem like one big long string.

I also ran into the perennial issue with the Mega2560 bootloader hanging...the new version of the bootloader hex does seem to work but I wind up with a problem on verification. It gives me a warning, but the code loads just fine.

If anyone followed all that, thanks much for your valuable time.

With kind regards
Joe

I must admit I skimmed a bit, but I think I can explain something.

The raw assembler commands can, quite efficiently, index into 64K bytes of memory by using the X, Y and Z registers, which are 16-bit. Bear in mind it is basically an 8-bit processor. :slight_smile:

Thanks to your post I spotted on page 2 of the assembler manual RAMPX, RAMPY and RAMPZ which can increase the addressing range. I presume though, that you have to decide in advance whether you want to access below 64K or above 64K. Reading further, though, that applies to SRAM not PROGMEM.

Further reading reveals ELPM, a variant on LPM (load from program memory) which appears to address 24-bits into program memory.

Presumably, to retrieve data from a large array, in program memory, the ELPM instruction has to be generated at some point.

Hi Nick
Yes, I have seen in the code <pgmspace.h> and also <morepgmspace.h> where for machines that support ELPM, the macros pgm_read_xxx_far are mapped into ELPM calls instead of LPM.
However, I've yet to determine exactly how to use those. It does seem to work just fine when I have a single 64k block of flash data tagged with the attribute PROGMEM. I think this is true because the following works:

#include <avr/pgmspace.h>
//...(other includes)
char myString[10];
const char abc[] PROGMEM = "abc";
const char def[] PROGMEM = "def";
const char* abcdef[] PROGMEM = {abc,def};
// ... stuff
for(int i=0;i<2;i++){
strcpy_P(myString,(const char*)pgm_read_word(&abcdef[i]));
Serial.println(myString);
}

Note the use of strcpy_P is defined in <pgmspace.h>. Note also that I can address the array with a var, which does not seem to be true if you try to create a var in memory space mapped through other means - like the terms PROGMEM_FAR and PROGMEM_SEGx as specified in <morepgmspace.h> In those cases, the compiler seems to want a static situation where you do something like;

long_address = GET_FAR_ADDRESS(abcdef) + BASE_ADDRESS_OF_MEMORY_SECTION + (which element)*sizeof(int);
strcpy_whatever(myString, pgm_read_word(long_address));

I suppose 24-bit addressing (or at least 18-bit addressing) must exist even to accept an address in that upper 256k...

If I use multiple 64k flash sections or I just try to rely on 24-bit addressing by specifying

#include <morepgmspace.h>
const variable myVar[70k's worth]... a whole lot of stuff more than 64k

then this does not work...

int abc = pgm_read_word(&abcdef[i]);

So I am unsure of the occasions when 24-bit addressing is supported by the current compiler/linker, and when it's not... As I said, it seems that avr-gcc versions 4.6.2 and 4.7.2 do not balk at code which calls for more than 64k. But I haven't yet figured out if I can actually make it work with my app. I still have to go back and de-Arduino-ize my Arduino code into pure AVR code to try that in another environment, like AVR-Eclipse or AVR-XCode, which support those compiler versions...

Cheers
Joe

However, I think I may be in some sort of compiler optimization hell. It seems impossible to retrieve the pointers to the const char*s with pgm_read_word_far

Take heed of the '64K boundary' talk; it is not an optimisation/bug but part of the design.

Pgm pointers are 16-bit word pointers. As in, 64K unique addresses, 128K bytes addressed.
The 64Kb boundary for variable data is due to standard pointers addressing a single byte.

If you treat your upper memory as word aligned you could then change your addressing scheme, but then the flash gets swallowed twice as quick.

The way I think it fails is...
The linker puts things together like this:

   vectors
   progmem
   trampolines
   code

I think that this means that you'll have problems whenever the progmem data exceeds 64k (+/- a little bit), because the trampolines have to be in the first 64k (and I think the vectors can only access the first 64 or 128k)
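If that guess about the link order is right, a rough host-side check of whether the trampolines still start inside the first 64K looks like the sketch below. Everything here is an assumption for illustration: the function names are invented, the check only looks at where the trampolines begin, and the 57-entry, 4-bytes-per-entry vector table size for the Mega2560 is my own figure, not something stated in this thread.

```cpp
#include <cstdint>

constexpr uint32_t k64K = 0x10000;

// Byte address where the trampolines would start if the linker places
// vectors first, then PROGMEM data, then trampolines.
constexpr uint32_t trampolineStart(uint32_t vectorBytes, uint32_t progmemBytes) {
    return vectorBytes + progmemBytes;
}

// True if the trampolines still begin inside the first 64K
// (a simplification: it ignores the size of the trampoline table itself).
constexpr bool trampolinesReachable(uint32_t vectorBytes, uint32_t progmemBytes) {
    return trampolineStart(vectorBytes, progmemBytes) < k64K;
}

constexpr uint32_t kVectorBytes = 57 * 4;  // assumed Mega2560 vector table size

static_assert(trampolinesReachable(kVectorBytes, 32000), "one 32k struct fits");
static_assert(!trampolinesReachable(kVectorBytes, 3 * 32000), "three do not");
```

That would match the symptom reported earlier, where the blink sketch with three 32k structures in PROGMEM uploaded but never ran.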

I'm not sure why this is done; having progmem in the first 64k is "convenient", but having the trampolines and the startup code in the first 64k is NECESSARY.

The new 4.6.2 compiler seems to do the same thing.
I think 4.7 includes a 24bit pointer type, and a complete rework of how progmem is done, but it's still likely to require some special treatment to get large sketches to run.

In theory, the behavior can be changed with a custom linker script.

Pgm pointers are 16-bit word pointers.

Well, not quite. The flash memory is organized as words, and program instructions are always aligned on a 16-bit boundary. The "jmp" and "call" instructions do not include a bit that differentiates which byte, so in theory a "pointer to function" could be a word pointer that addresses 128k of memory. The bootloader and programming protocols use a lot of word pointers, which is why "burn bootloader" starts to require different tools beyond the 128k (byte) barrier rather than the 64k barrier.

However, the indirect jump (jump to the address contained in a register) AND the "load program memory" instruction (for reading data from flash) both take a full 16bit byte pointer; the low bit is ignored for the ijmp, and does bytewise addressing for LPM. So most actual pointers really are byte pointers.
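The word-versus-byte distinction can be put in numbers with a small host-side sketch. The helper names are invented for this example; the only thing it relies on is that one flash word is two bytes.

```cpp
#include <cstdint>

// A flash word address counts 16-bit words; a byte address counts bytes.
constexpr uint32_t wordToByte(uint32_t wordAddr) { return wordAddr << 1; }
constexpr uint32_t byteToWord(uint32_t byteAddr) { return byteAddr >> 1; }

// Largest byte address reachable with a 16-bit pointer of each kind.
constexpr uint32_t kMaxByteViaBytePtr = 0xFFFF;                  // 64K bytes
constexpr uint32_t kMaxByteViaWordPtr = wordToByte(0xFFFF) + 1;  // 128K bytes

static_assert(kMaxByteViaBytePtr + 1 == 64UL * 1024, "byte pointer: 64K");
static_assert(kMaxByteViaWordPtr + 1 == 128UL * 1024, "word pointer: 128K");
```

So a 16-bit word pointer spans 128K bytes, while the 16-bit byte pointer that LPM uses stops at 64K.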

I think the vectors can only access the first 64 or 128k

This part turns out to be wrong. The two-word "jump" and "call" instructions can access the full 4MB address space.
However, the indirect jump/call (ijmp/icall) instructions can only access the first 64kbytes, even on CPUs with 128k of memory. Chips with 256k of memory add an "EIJMP/EICALL" that can jump anywhere.

(This sort of "kludge" is why I think 32bit CPUs (like the ARM in Due) will displace 8bit CPUs in applications where code or data exceed 64k.)

Hi and thanks all,
I did get my code to work tonight. I divided the data and pointers into 3 segments, each of which is less than 64k. There are 2 PROGMEM segments containing a bunch of explicitly initialized strings of the form:

#include <morepgmspace.h> // and associated mods to ldscript to accommodate the PROGMEM attrs
const char abcde[9] PROGMEM_SEG2 = "aardvark";

And also

const char fghij[4] PROGMEM_SEG1 = "ant";

And another segment full of pointers to those strings like this

const char* dictionary[10000] PROGMEM_SEG3= {abcde,fghij,..};

I can pull the chars out of the various segments like this

char* myWordInSeg2 = GET_FAR_ADDRESS(dictionary[0]) + SEG2_OFFSET + index*sizeof(char*);
strcpy_Pseg2(localCharArray, myWordInSeg2); // I wrote the copy to handle addressing from SEG a

It works, but I'm durned if I can figure out why... somewhere some 18/24-bit addressing is taking place via ELPM. I may just be lucky because my strings are all loaded sequentially. The char ptrs in SEG3 seem useless to me: I think I'm just getting into seg2 and yanking out strings wantonly, without guidance from the ptrs in the dictionary array. But somehow this is working. The fact that I don't understand it makes me nervous that this is not a rigorous solution and will only work temporarily.

Anyway.. Onward to Arduweenie land,
Joe

OK, the following works:

Using the multi-progmem-segment method

  1. Store ~64k of strings in PROGMEM_SEG2 (doesn't have to be exact or padded), which has a memory start location of 0x20000
  2. Store ~64k of strings in PROGMEM_SEG3, which has a memory start location of 0x30000
  3. Store 20k of pointers to those strings in the lowest progmem space with PROGMEM_FAR, which goes... I dunno where.

Please note that you cannot deposit a chunk of data larger than 64k into memory, at least not with the version of avr-gcc/ld in Arduino 1.0.3. I believe this will be different once we get to avr-gcc V4.7... someday... maybe.

The strings can be accessed thus:

#include <morepgmspace.h>

// wordNumber is the index into the array of strings called dictionary0.
// wordsInSeg2 is a placeholder for however many strings fit in the ~64k stored in SEG2.
if (wordNumber < wordsInSeg2) {
  unsigned int index = pgm_read_word_far(GET_FAR_ADDRESS(dictionary0[0]) + wordNumber*2);
  strcpy_PX(newWord, index, PROGMEM_SEG2_BASE);
}
else {
  // note that the array dictionary0 also has data in SEG3!
  unsigned int index = pgm_read_word_far(GET_FAR_ADDRESS(dictionary0[0]) + wordNumber*2);
  strcpy_PX(newWord, index, PROGMEM_SEG3_BASE);
}

char* strcpy_PX(char* des, uint_farptr_t src,unsigned long base)
{
  unsigned long p = base + src;

  char* s = des;
 
  do {
    *s = pgm_read_byte_far(p++);
  }
  while(*s++);
  return des;
}
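
Why the base-plus-offset copy works can be seen in a host-side model. This is a minimal sketch, not AVR code: fakeFlash, readByteFar, strcpyFromSegment and storeString are stand-ins invented for illustration, and it assumes each 16-bit table entry holds just the low 16 bits of a string's address. Under that assumption, strings at the same offset in SEG2 and SEG3 produce identical table entries, so the caller has to supply the segment base:

```cpp
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Host-side stand-in for the Mega2560's 256k of flash (assumption: not real flash).
static const uint32_t kFlashSize = 0x40000UL;
static uint8_t fakeFlash[kFlashSize];

// Stand-in for pgm_read_byte_far(): read one byte at a full (far) byte address.
static uint8_t readByteFar(uint32_t addr) {
    assert(addr < kFlashSize);
    return fakeFlash[addr];
}

// Same shape as strcpy_PX above: a 16-bit offset plus a segment base
// gives the far address of the string to copy.
static char* strcpyFromSegment(char* des, uint16_t offset, uint32_t base) {
    uint32_t p = base + offset;
    char* s = des;
    do {
        *s = static_cast<char>(readByteFar(p++));
    } while (*s++);
    return des;
}

// Hypothetical helper: place a string in the fake flash and return the
// 16-bit value a pointer table entry would hold (the low 16 bits only).
static uint16_t storeString(uint32_t farAddr, const char* text) {
    size_t len = strlen(text) + 1;
    assert(farAddr + len <= kFlashSize);
    memcpy(&fakeFlash[farAddr], text, len);
    return static_cast<uint16_t>(farAddr & 0xFFFF);
}
```

Storing one string at 0x20010 and another at 0x30010 yields the same table entry (0x0010) for both; only the base passed to the copy routine tells them apart.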

Note that you cannot access the data in those SEG2/SEG3 locations through specification of an indexed addressing scheme like

 unsigned int index = pgm_read_word_far(GET_FAR_ADDRESS(dictionary0[i]));

I have been utterly unsuccessful in storing and reading out of the SEG 1 location which is denoted with a memory start location of 0x10000. However, I can put 20k of char*s at wherever the attr PROGMEM_FAR puts it. I have to go back to look at the linker script (avr6.x) to see where that's being mapped to. I know it's going into .data (or maybe just past it...)

At least I understand what's happening now. If I try to put anything into PROGMEM_SEG1 I not only get errors trying to read it back, but the bootloader gives me a verification error - which is probably the root cause of the whole problem. I don't have it in me to look into the bootloader at this point... On to another problem!

Cheers to all
Joe

Hi,

Just joined in and dug up a lot of useful information, thanks.

I am working on a voice announcement project using the Mega 2560, so I need to store a large amount of compressed voice data in program memory. The 256kB of flash on the Mega2560 gives me about 7 minutes of voice playback.
So I ran into the same problem you guys have: accessing data stored beyond the 64kB boundary of program memory.
Thanks to pYro_65 for the neat solution of declaring PROGMEM_FAR to place data in program memory outside the first 64kB; that solves the download and crash problem. Using pgm_read_byte_far() I can read data anywhere in the 256kB of program memory.
The GET_FAR_ADDRESS macro for converting a program memory address to a uint_farptr_t is wonderful, and with all of these I got my program to work; all 7 minutes of voice play without problems, thanks.
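
As a rough sanity check on the figures quoted above, the average bit rate implied by 7 minutes in 256kB can be worked out directly. This is only a sketch of the arithmetic; the names are made up, and it assumes the whole flash holds audio, whereas the sketch itself and the bootloader also take up some of it, so the real rate would be somewhat lower:

```cpp
#include <stdint.h>

// Average bit rate implied by fitting `seconds` of audio into `bytes` of storage.
constexpr uint32_t impliedBitsPerSecond(uint32_t bytes, uint32_t seconds) {
    return (bytes * 8u) / seconds;
}

// 256 kB of flash and 7 minutes of playback, as quoted in the post.
constexpr uint32_t kFlashBytes   = 256ul * 1024ul;
constexpr uint32_t kPlaybackSecs = 7u * 60u;

// 2097152 bits / 420 s, rounded down: roughly 5 kbit/s.
static_assert(impliedBitsPerSecond(kFlashBytes, kPlaybackSecs) == 4993u,
              "about 5 kbit/s");
```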

However, in tidying up my program and organizing my voice data in an easily indexed manner, I ran into another minor problem!

I need to store the address of each voice segment in an index table so that I can load the address and read the data accordingly. Of course, the index table contains 32-bit addresses. The problem is that the GET_FAR_ADDRESS() macro only works at run time, when it is called. It cannot be used at compile time, so when I build my index table (which is done at compile time), the compiler doesn't allow me to cast the voice segment name (a far address) to the uint_farptr_t type. My question is: does anyone know how to set up a table of 32-bit pointers to far program memory at compile time?
Of course, I can still get my program to work by writing a simple initialization routine (a sequence of GET_FAR_ADDRESS calls to initialize the table)
and running it before my main code, but it will be a tedious job to type in all the voice segment names. I want the index table generated automatically at compile time. Any suggestions?
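
One way to cut down on the typing for the run-time fallback, sketched under assumptions: list the segment names once in an X-macro and let the preprocessor expand that list into both an index enum and the GET_FAR_ADDRESS calls. This still fills the table at run time, not at compile time. The segment names, VOICE_SEGMENTS, far_addr_t, voiceAddress and initVoiceTable are made up for illustration, and the fallback definitions only supply host stand-ins so the pattern can be tried off-target; on the Mega, the real PROGMEM_FAR and GET_FAR_ADDRESS definitions discussed in this thread would need to be included before this code.

```cpp
#include <assert.h>
#include <stdint.h>

#ifndef PROGMEM_FAR
#define PROGMEM_FAR                    // host stand-in: no special section
#endif

#ifdef GET_FAR_ADDRESS
typedef uint_farptr_t far_addr_t;      // real far pointer type on AVR
#else
typedef uintptr_t far_addr_t;          // host stand-in (assumption)
#define GET_FAR_ADDRESS(var) ((far_addr_t)(uintptr_t)&(var))
#endif

// Placeholder voice data; real segments would be much larger.
const uint8_t voiceHello[]   PROGMEM_FAR = {0x10, 0x11, 0x12};
const uint8_t voiceGoodbye[] PROGMEM_FAR = {0x20, 0x21};

// The single place where segment names are listed.
#define VOICE_SEGMENTS(X) \
    X(voiceHello)         \
    X(voiceGoodbye)

// Expand the list into index constants: IDX_voiceHello, IDX_voiceGoodbye, ...
#define AS_INDEX(name) IDX_##name,
enum VoiceIndex { VOICE_SEGMENTS(AS_INDEX) VOICE_COUNT };

static far_addr_t voiceTable[VOICE_COUNT];

// Expand the same list into one GET_FAR_ADDRESS assignment per segment.
#define FILL_ENTRY(name) voiceTable[IDX_##name] = GET_FAR_ADDRESS(name);
void initVoiceTable() {
    VOICE_SEGMENTS(FILL_ENTRY)
}

// Look up the far address of a segment by its index.
far_addr_t voiceAddress(VoiceIndex i) {
    assert(i < VOICE_COUNT);
    return voiceTable[i];
}
```

Calling initVoiceTable() once (for example at the top of setup()) fills the table; adding a new segment then means defining its data and adding one line to VOICE_SEGMENTS.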

Thanks
Stan