Ok. Progress after many hours of bashing. I promised to report back, and I am.
Consider this my engineering notebook of sorts. I'm not trying to instruct you experts, but rather just recording my experiences. I know I don't have a complete understanding yet, but this is as far as I got. I need more info - particularly on memory sections and addressing. But maybe this will be valuable to someone.
My application is that I am putting a dictionary of 10k words into an Arduino Mega2560 and accessing it for display on an LCD. My IDE is (mostly) Arduino 1.0.3, and I'm primarily using Mac OS X 10.6. What I eventually got to work was on that system.
I also tried the compile/link/load sequence from the command line using avr-gcc 4.6.2 and 4.7.2, and I tried the AVR environment on Eclipse and on Mac Xcode. Results varied in different ways with all those methods. In all cases I'm using the version of avrdude (I forget which) that comes with 1.0.3 - the same one you get on Mac OS X from MacPorts or Fink, so it's identical everywhere.
I'm pretty confident that the combination of 4.7.2 and XCodeAVR or EclipseAVR would give me an entirely different experience than the ArduinoIDE. However I didn't do more than blinky-with-big-global-vars on any of those because I didn't have the time/patience to go back and recast all my Arduino code in native AVR speak. However I was very successful in compiling and linking to a .hex that contained flash/global data structures larger than 64k. I did not attempt to access those, though.
Things learned:
There is a real 64k boundary for data today. Note that code is not as limited: the vector table on the larger AVRs is built from 4-byte JMP instructions (and avr-gcc emits "trampoline" stubs for indirect jumps), so code can easily reach the full 256k in the Mega. Data access, however, is limited by 16-bit pointer addressing, so the largest chunk is 64k. Now, there is a great exception to this. By concatenating bits from another register (RAMPZ) with the 16-bit Z pointer you can effectively create a 24-bit pointer for data, and this is done, partially in asm, in code available in multiple libraries out there. At the moment, on the version of the IDE we have, though you can specify multiple 64k-byte clumps, you can't cross that boundary with any one array or data structure.
Future versions of avr-gcc seem to address this. In particular, it appears that in 4.7.2 the compiler/linker is happy with larger-than-64k global defs. And let me say here - it's global defs that are not changing that we are talking about at all times, because the data goes into flash. If you say
volatile int xyz = 123;
That gets put into RAM and is globally defined for all your code/ISRs. This is not what I'm saying. If I say
volatile int xyz PROGMEM_blahblah = 123;
The "volatile" piece is not particularly useful, as far as I can tell. By specifying PROGMEM_... you're putting the data into flash, and as such, it has to be globally accessible because you can't put PROGMEM data into local vars. And since it's not changeable, this would be just as effective and apparently does exactly the same thing:
int xyz PROGMEM_blahblah = 123;
The problem with getting things to compile when you've got more than 64k in one chunk (and it will compile under certain circumstances) is that the rest of the system has no clue how to handle it, because the pointer to the globals is only 16 bits and there's nothing you can do about that. So you wind up with linker errors, and errors that show up in other places - in code you had nothing to do with. This is a sign of badness, and the signal that you need to accept your 64k limitation with happiness and move on, because it will be solved at some point.
In the version of avr-gcc in the Arduino 1.0.3 distribution (I believe it's 4.3.2), the compiler balks at certain declarations/structures which, when initialized, reach over 64k. And this is a key point - most of the problems surface in the initialization of defined vars, not in the definition itself.
By the way, you can happily define/allocate empty vars to your heart's content. This is a red herring, though, I have found. In a lot of the tests run here, and also ones I've tried, you can play various games to get the compiler NOT to optimize away unused/uninitialized space, but the results are variable.
First, let me indicate we are talking about data in flash. For all intents and purposes, data in flash may as well be data in a ROM. Yes, I know there are ways to modify it during runtime, but that goes beyond what I have tried here.
As the flash data is essentially in a ROM, it is not unreasonable to expect that you would know its contents a priori. That is - you're going to burn it into flash, so you're certain what the data is. Therefore, all vars/data going into flash are known ahead of time, and the compilers/linkers presume as much. This is an important point, and also the source of a lot of pain. You can do the following in your code:
const char abc[] PROGMEM_blahBlah = "abc"; // PROGMEM_blahblah to be explained
And the abc will become abc[4] after compilation - that is, three chars 'a','b','c' and a trailing null '\0'. Please note that the compilers presume that declaration/initialization is intended to define a string. You don't get to come back later and say - hey, there's only 3 chars I didn't mean for it to be a null-terminated string. Too bad. The compiler is helping you. If you want a 3 char array you say:
const char abc[3] PROGMEM_blahBlah = {'a','b','c'};
and you get that. But in the determination of your memory usage you may be thrown off when the compiler tries to help you by adding another byte. In addition, when you're doing pointer arithmetic, you can't (always) address things in various parts of flash. For instance, depending on how you organized things this gives you garbage:
int aa[3] PROGMEM_yak = {1,2,3};
int fetchedInt;
for (int x = 0; x < 3; x++) {
    fetchedInt = pgm_read_word_far(&aa[x]);
    printf("%d=%d\n", x, fetchedInt);
}
where this will work
int aa[3] PROGMEM_yak = {1,2,3};
int a,b,c;
a = pgm_read_word(&aa[0]);
b = pgm_read_word(&aa[1]);
c = pgm_read_word(&aa[2]);
printf("a=%d,b=%d,c=%d\n",a,b,c);
Now in my app, I am defining 10,000 words in a way I can access them. I have tried several methods. I have tried putting them in a struct like this:
typedef struct {
const char a[2];
const char aardvark[9];
//... etc 10k words
} words;
const words dictionary PROGMEM_blahblah = {
  {"a"}, {"aardvark"}, // etc, 10000 initializers
};
And that will compile just fine with the current version of avr-gcc. However, it generates an error on versions 4.6.2 and 4.7.2 - something to the effect of "internal error: report a bug ..."
Doing this
typedef struct {
const char a[2];
const char aardvark[9];
//... etc 10k words
} words;
const words dictionary PROGMEM_blahblah = {
  .a = "a",
  .aardvark = "aardvark",
  // etc, 10000 initializers
};
will not compile on the current Arduino IDE but will compile on versions 4.6.2 and 4.7.2.
I inevitably settled on a different means - and this isn't entirely debugged. I'm still having trouble with the flash memory sections, but I did the following:
const char apple[6] PROGMEM_yadda="apple";
const char bear[5] PROGMEM_yadda="bear";
// etc. etc. etc. 10000 words
const char* dictionary[10000] PROGMEM_yadda+1 = { apple, bear, /* etc etc */ };
The idea is that 64k of dictionary data is in a chunk of flash designated by the attribute "PROGMEM_yadda", and an array of pointers to that data is in a chunk of memory called "PROGMEM_yadda+1". (It may be obvious to most, but please don't try to create code with PROGMEM_yadda... that's just an example.) That compiles and links peachily.
Now - what I learned about PROGMEM is that as an attribute it specifies putting data into flash. Using PROGMEM is a multi-step process. You have to put it into your code explicitly, but you also have to change the linker script so it understands what you mean by that attribute. And then retrieving data stored via PROGMEM requires using accessors of the form
pgm_read_word_far(&myArray[i]);
...continued