Confused over operation of strlcpy, strlcpy_P & strlcpy_PF

I have the following code excerpt from a larger program:

void loop() {
 const char     sOff[]         PROGMEM = "OFF";
 const char     sHeating[]     PROGMEM = "HEATING";
 const char     sFinished[]    PROGMEM = "FINISHED";
 const char     sDisabled[]    PROGMEM = "DISABLED";
 const char     sStartup[]     PROGMEM = "STARTUP";
 const char     sShutdown[]    PROGMEM = "SHUTDOWN";
  
 const uint8_t  MaxMessageSize = 125;
  
 static char     tString[MaxMessageSize];
 
 Serial.print (F("State: "));
 switch (State) {
  case Off:        strlcpy_P (tString, sOff, MaxMessageSize); break; 
  case Heating:    strlcpy_P (tString, sHeating, MaxMessageSize);  break; 
  case Finished:   strlcpy_P (tString, sFinished, MaxMessageSize);  break;
  case Disabled:   strlcpy_P (tString, sDisabled, MaxMessageSize);  break;
  case Shutdown:   strlcpy_P (tString, sShutdown, MaxMessageSize);  break;
  case Startup:    strlcpy_P (tString, sStartup, MaxMessageSize);  break;
 }
 Serial.print (tString);
 ...
}

I believe that the _P version should be used when we have a value in program space (PROGMEM). If we have a variable in program space using the F() operator we should use:

strlcpy_PF (tString, F("This is the text"), MaxMessageSize);

and this works elsewhere in the program fine.

When the above code is used I get output like:

State: ⸮⸮xx⸮⸮ Curr Temp: 37.06 RTC: 01/01/2000 00:00:00 AM

If I change the strlcpy_P to strlcpy then it works as shown below:

State: HEATING RTC: 01/01/2000 00:00:00 AM

This does not make sense as the source (sOff etc) is in program space. What is happening here ?

Geoffrey, NZ.

this works fine

const char     sHeating[]     PROGMEM = "HEATING";
const uint8_t  MaxMessageSize = 125;
char tString[MaxMessageSize];

void setup() {
  Serial.begin(115200);
  strlcpy_P (tString, sHeating, MaxMessageSize);
  Serial.println(tString);
}

void loop() {}

and this version does not

const uint8_t  MaxMessageSize = 125;
char tString[MaxMessageSize];

void setup() {
  const char     sHeating[]     PROGMEM = "HEATING";
  Serial.begin(115200);
  strlcpy_P (tString, sHeating, MaxMessageSize);
  Serial.println(tString);
}

void loop() {}

but this one does

const uint8_t  MaxMessageSize = 125;
char tString[MaxMessageSize];

void setup() {
  static const char     sHeating[]     PROGMEM = "HEATING";
  Serial.begin(115200);
  strlcpy_P (tString, sHeating, MaxMessageSize);
  Serial.println(tString);
}

void loop() {}

Notice the difference/hint to fix?

Note also as you have not set a default for your switch/case, if State is outside the list of identifiers tString will contain whatever last was written there

OK so if you define a const char array in PROGMEM and it is a global or a static within a function all is as expected. The static makes it sort of a global in terms of its existence but its scope (area where it can be accessed) is limited to the function it is defined in. ALL IS WELL.

If however it is neither global nor static then in effect the PROGMEM specification is silently dropped and it goes back to residing in RAM and the compiler says nothing. In effect I suspect every time the loop function is called it performs a copy from program space to RAM of the char array and we are now dealing with a RAM only version. IS THAT RIGHT ?

A little nasty that the compiler is silent on this significant failure to do what the programmer intended.


It also means that strlcpy_P does not enforce any sort of type checking to ensure that the second parameter is indeed in PROGMEM, which is another little nasty. It just takes what is in fact a RAM address and treats it as a program space address, I guess with the attendant pain that can cause: it outputs whatever is at the selected program space address until it gets to a byte containing 0 (null).


The bit about the default is already catered for in the real code with a strcpy_P before the switch statement. It was omitted for brevity in the example provided.

Many thanks for your answer J-M-L. Nasty little C compiler...

Geoffrey.

Given how complex C now is and C++ features on top of that I should have probably said:

"Nasty bloated C compiler nuances and forced programmer character development..."

I believe that the _P version should be used when we have a value in program space (PROGMEM). If we have a variable in program space using the F() operator we should use:

strlcpy_PF (tString, F("This is the text"), MaxMessageSize);

Your understanding of strlcpy_PF() is not correct. The strlcpy_P() function can only reference an address in PROGMEM that is within the first 64K of program memory, because it uses a 16-bit address pointer (as do most instructions on an AVR processor). When you are using a processor with more than 64K of program memory, such as the ATmega2560 on the Arduino Mega, to access PROGMEM above the 64K boundary it is necessary to use a larger address pointer, so the _PF version is needed to tell the compiler to use a far PROGMEM address, which is 32 bits.

You cannot use the F() macro with the strlcpy_P() or strlcpy_PF() functions; F() is meant for the print() and println() functions. You would instead use the PSTR() macro.

strlcpy_P(tString, PSTR("This is the text"), MaxMessageSize);

There is also no need to copy the text from PROGMEM to a buffer before printing, you can print directly from PROGMEM by using a cast to __FlashStringHelper*, as an example:

  case Heating:    Serial.print((__FlashStringHelper*) sHeating);  break;

Although in your code, unless you are going to be using tString elsewhere in the code, it would be a lot simpler to use the F() macro and print within the switch/case statements, then the compiler will handle all the PROGMEM storage for you:

  case Heating:    Serial.print(F("HEATING"));  break;

A little nasty that the compiler is silent on this significant failure to do what the programmer intended.

Where would be the fun otherwise?

Intuitively though, you are giving conflicting orders to the compiler:

  • a local constant has limited scope and its allocated memory would disappear when the function terminates
  • a PROGMEM variable is permanent, there is no way to free the memory it consumes as it's burnt in flash

==> the compiler has to decide what to do but you can't get both.

Depending on the platform, the PROGMEM keyword is something special.

On AVR it's a variable modifier handled by the compiler. It stores variables into a section whose name starts with .progmem, similar to a section attribute (and performs additional checks) but the way it is handled has been varying over time and GCC versions. Based on what you saw, we can assume that the compiler on an AVR architecture decided that reclaiming space (wherever it was allocated) was more important and thus ignored the PROGMEM keyword. Adding static instructs the compiler that you want this to occupy memory all the time, so then it goes with proper flash storage.

On an ESP, this is implemented as a macro that basically adds an attribute to the variable: __attribute__((section(".irom.text"))). So it places the variable in the .irom.text section in flash and there is nothing more to it. As strlcpy_P() does not exist on ESP, you would use strncpy_P() and if you look at the code

const uint8_t  MaxMessageSize = 125;
char tString[MaxMessageSize];

void setup() {
  const char     sHeating[]     PROGMEM = "HEATING";
  Serial.begin(115200);
  strncpy_P (tString, sHeating, MaxMessageSize);
  Serial.println(tString);
}

void loop() {}

that is failing on an AVR (even with strncpy_P), it will just work fine on an ESP.

Takeaway: PROGMEM data is meant to be persistent; make it global and don't think about it any more. It will work on other platforms too.

PS2: I agree with the above: what's the point of using PROGMEM if you then duplicate one of the strings into RAM just for printing? Either make tString a const __FlashStringHelper* and store the pointer to your flash memory in it, or print directly as was suggested.

Thanks for all the responses. I now understand that for a Mega2560 (which I use) that the strlcpy_PF needs to be used to be on the safe side with the possibility that the string is located above 64KB. If I use the strlcpy_PF with the PROGMEM string being global or static all is fine.

FURTHER INVESTIGATION
I also put the following into the program:

Serial.print (sizeof (sHeating));

and got back 4 covering off the detail provided by david_2018 - as he stated the address is 4 bytes (32 bits). I presume it does not report the actual string size because of the special nature of PROGMEM arrays.

I then printed the memory address using:

Serial.print ((uint32_t)&sHeating, HEX);

and got:

50C4

which was all good (string located in 16 bit address) as code was < 64KB even though 32 bits allocated for address.

USING FLASH MORE THAN 64KB
I thought that while I was at it I would test the scenario where the code is more than 64KB. Currently it is 57KB approx.

What I did was add the following to the program:

const uint16_t GJB_Pic1[] PROGMEM = {
0x3394, 0x3374, 0x2b74, 0x2b94, 0x2b94, 0x2b94, 0x2b94, 0x2b94, 0x2b94, 0x2b94, 0x2b94, 0x2b94, 0x3394, 0x3394, 0x3394, 0x3394, 0x3394, 0x3394, 0x3394, 0x3394, 0x3395, 0x3395, 0x3395, 0x3395, 0x3395, 0x3395, 0x3395, 0x3395, 0x3395, 0x3395, 0x3395, 0x3395, 0x3394, 0x3394, 0x33b5, ... };

which was longer than shown above (for brevity) and I then used it in the program so that the compiler didn't have its optimizer remove the array:

tData = pgm_read_byte_far (&GJB_Pic1[50]);

The compiled size was then around 81KB - all as expected:

Sketch uses 80564 bytes (31%) of program storage space. Maximum is 253952 bytes.

I also then printed the global char array I have declared:

const char     sHeating[]     PROGMEM = "HEATING";

and the output on serial monitor now shows:

⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮

so once again broken. I then inserted the following code:

Serial.print ((uint32_t)&sHeating, HEX); Serial.print (F(" "));
Serial.print (sHeating);

and got:

FFFFA204 ⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮

so yes using an address that is above 64KB but it is not correct.

So I attempted to search ALL of the PROGRAM SPACE with the following code:

byte tData;
uint32_t t;
Serial.println (F("Searching for string in program space..."));
char search[]="HEATING";
for (t=256; t < 262144UL; t++) {
  tData=pgm_read_byte_far(t);
  if (tData == (byte)search[0]) {
    //First character matches; compare the rest of the string.
    bool tabort=false;
    for (uint32_t p=1; (p < strlen(search)) && !tabort; p++) {
      tData=pgm_read_byte_far (t+p);
      tabort = ((char)tData != search[p]);
    }
    if (!tabort) {
      //We found it.
      Serial.print (F("String located at address: ")); Serial.println (t, HEX);
    }
  }
}
Serial.println();
Serial.println (F("Search complete."));

and lo and behold found the string at the following locations (serial monitor output):

Searching for string in program space...
String located at address: A200
String located at address: 13A56
Search complete.

The address 13A56 is above 64KB so is the actual address. I believe the A200 location will be the value assigned to the search char array (address of text in program space to initialize the string in data space).

There is however a new problem with dealing with code above 64KB - the actual address of the string in PROGMEM space is wrong when accessing the &sHeating const char.

Why is this happening ?

Geoffrey

and got back 4 covering off the detail provided by david_2018 - as he stated the address is 4 bytes (32 bits). I presume it does not report the actual string size because of the special nature of PROGMEM arrays.

You asked for the size of a pointer, not the length of the cString (which would require strlen_PF() or strlen_P()).

You can read this discussion; it's informative.

Serial.print (sizeof (sHeating));

That should print 8, the number of bytes in the char array sHeating ("HEATING" plus the terminating null). The size of the pointer to sHeating would be sizeof(&sHeating), and would be two bytes. The full 4-byte pointer would be the return value from pgm_get_far_address(sHeating).

I've never had any success indexing into an array in PROGMEM above the 64K boundary; the compiler seems to insist on doing the calculation with 16-bit pointers, so I've always resorted to doing the calculations myself. (The sizeof(GJB_Pic1[0]) factor matters here because each element of GJB_Pic1 is a uint16_t, i.e. 2 bytes; it could only be dropped for an array of 1-byte elements.)

//tData = pgm_read_byte_far (&GJB_Pic1[50]);
tData = pgm_read_byte_far(pgm_get_far_address(GJB_Pic1) + sizeof(GJB_Pic1[0]) * 50);
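The manual address arithmetic above can be checked on a desktop compiler. The following is a minimal host-side C++ sketch, not Arduino code: far_element_address is a hypothetical helper name, and the base address 0x12000 is an arbitrary illustrative value above 64K. It assumes an element's address is simply base + index * element size, which is what the pgm_get_far_address() expression computes by hand.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: byte address of element `index` in a flash array that
// starts at the 32-bit address `base` and holds elements of `elem_size` bytes.
constexpr std::uint32_t far_element_address(std::uint32_t base,
                                            std::size_t elem_size,
                                            std::size_t index) {
    return base + static_cast<std::uint32_t>(elem_size * index);
}
```

For a uint16_t array such as GJB_Pic1, far_element_address(0x12000, sizeof(std::uint16_t), 50) lands 100 bytes past the base, whereas a 1-byte element type would land only 50 bytes past it. Reading a whole 2-byte element would also need pgm_read_word_far() rather than pgm_read_byte_far().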

In reference to the discussion linked in the previous post, when using large amounts of PROGMEM, it is very, very important to tell the compiler to place your data at the end of the code instead of the default location, otherwise you can displace data that must be located in the first 64k segment of flash memory, the F() macro will cease to function, libraries that utilize PROGMEM will no longer work, etc.

david_2018:

Serial.print (sizeof (sHeating));

That should print 8, the number of bytes in the char array sHeating

In general yes, sizeof applied to an array is the number of bytes used for the array, but this case is a bit special with PROGMEM as the data does not live in RAM (Harvard architecture).

Intuitively you could see that as if you had defined const char * sHeating = "HEATING"; that is, the type would be a pointer to some constant data stored somewhere else and thus when you ask for the size, you are asking the size of the type const char *.

(I can’t test at the moment so may be I’m wrong )

You do not want to use this style of declaration for storing text in PROGMEM

const char * sHeating = "HEATING";

The equivalent for PROGMEM would be the following, but it will not give the desired results. The char * will be stored in PROGMEM, but it will be a pointer to RAM, where the actual text will be stored.

const char * const sHeating PROGMEM = "HEATING";

This is how you store the actual text in PROGMEM, in which case sizeof(sHeating) will return the size of the char array, not the size of a pointer, same as for any other array name without brackets.

const char sHeating[] PROGMEM = "HEATING";

A test sketch comparing the four styles of declaration:

//pointer stored in RAM, text stored in RAM
const char * sHeating = "HEATING";

//pointer stored in PROGMEM, text stored in RAM
const char * const xsHeating PROGMEM = "HEATING";

//text stored in RAM
const char ysHeating[] = "HEATING";

//text stored in PROGMEM
const char zsHeating[] PROGMEM = "HEATING";

void setup(){
  Serial.begin(115200);
  Serial.println("startup");
  Serial.println(sizeof(sHeating));  //prints 2
  Serial.println(sizeof(xsHeating)); //prints 2
  Serial.println(sizeof(ysHeating)); //prints 8
  Serial.println(sizeof(zsHeating)); //prints 8
}

void loop(){
}

You do not want to use this style of declaration for storing text in PROGMEM

Sure. Sorry, I have not been clear. That was meant to be an example, explaining that depending on the type of the variable - even if you can print and see its content, you don't always get the size of the content by asking the size of that thing.

You are right about what it returns. I thought it might handle that differently but that was not well thought through. regardless of where it ends up, for the compiler it's just some memory allocated somewhere and it can see the data size at compile time for that type.

After having read all the information provided in earlier posts I took the link provided by J-M-L which allows data to be located in program memory in different "areas":

eg.

#define PROGMEM_LATE1 __attribute__ (( __section__(".fini1") ))
#define PROGMEM_LATE2 __attribute__ (( __section__(".fini2") ))

const char     sHeating[]     PROGMEM_LATE1 = "HEATING"; //Global.
const uint16_t GJB_Pic1[] PROGMEM_LATE2 = { 0x3394, 0x3374, 0x2b74 ... }; //Global

and then did the following in loop():

Serial.print (F("|SIZEOF (sHeating) ")); Serial.print (sizeof(sHeating));
Serial.print (F("|ADDR (sHeating farptr) ")); Serial.print ((uint_farptr_t)&sHeating, HEX);
Serial.print (F("|ADDR (sHeating; pgm_get...) ")); Serial.print (pgm_get_far_address(sHeating), HEX);
Serial.print (F("|ADDR (GJB_Pic1 farptr) ")); Serial.print ((uint_farptr_t)&GJB_Pic1, HEX);
Serial.print (F("|ADDR (GJB_Pic1 pgm_get...) ")); Serial.print (pgm_get_far_address (GJB_Pic1), HEX);
Serial.print (F("|"));

and got the following output:

|SIZEOF (sHeating) 8|ADDR (sHeating farptr) 5F4A|ADDR (sHeating; pgm_get...) 15F4A|ADDR (GJB_Pic1 farptr) FFFFE80C|ADDR (GJB_Pic1 pgm_get...) E80C|

so this raises a few questions:

  1. sHeating is located in PROGMEM_LATE1 and the pgm_get_far_address (sHeating) shows 15F4A which means located above 64KB (good). The GJB_Pic1 was also stated as being located outside of the first 64KB with the PROGMEM_LATE2 specifier. The difference in behaviour here is odd to me. Firstly doing a type cast to uint_farptr_t sHeating only supplies the last 16 bits of address whereas GJB_Pic1 has leading F's.

  2. The second issue is that GJB_Pic1 is not located above 64KB even though the #define specifier (my terminology) is accepted by the compiler.
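One plausible reading of the two address columns, offered as a hypothesis rather than something confirmed in this thread: on AVR a data pointer is 16 bits wide, so &sHeating can only carry the low 16 bits of the flash address, and GCC documents that casting a pointer to a wider integer type sign-extends it. The host-side C++ below (not Arduino code; both function names are made up for illustration) models a 16-bit pointer with an integer and reproduces the values printed above.

```cpp
#include <cstdint>

// Low 16 bits of a flash address: all that a 16-bit AVR pointer can hold.
constexpr std::uint16_t as_near_pointer(std::uint32_t far_address) {
    return static_cast<std::uint16_t>(far_address & 0xFFFFu);
}

// Model of (uint_farptr_t)&symbol: the 16-bit pointer value is widened to
// 32 bits with sign extension, so bit 15 is copied into the upper 16 bits.
constexpr std::uint32_t widen_near_pointer(std::uint16_t near_pointer) {
    return (near_pointer & 0x8000u)
               ? (0xFFFF0000u | near_pointer)
               : static_cast<std::uint32_t>(near_pointer);
}
```

Under this model widen_near_pointer(as_near_pointer(0x15F4A)) gives 0x5F4A (bit 15 clear, so no leading F's), while widen_near_pointer(0xE80C) gives 0xFFFFE80C. If the model is right, the leading F's say nothing about where GJB_Pic1 lives; they only reflect that bit 15 of its 16-bit address is set.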

and last of all another issue arises:

  3. If sHeating is defined as PROGMEM_LATE1 then:
Serial.print (sHeating); //OR
Serial.print ((__FlashStringHelper*)sHeating); //OR
Serial.print ((__FlashStringHelper*)pgm_get_far_address(sHeating)); //OR
Serial.print ((const  __FlashStringHelper*)pgm_get_far_address(sHeating));

all fail and print out crap. The only way I can get it output is with the following:

static char     tString[MaxMessageSize]; //In loop();
strlcpy_PF (tString, pgm_get_far_address(sHeating), MaxMessageSize);
Serial.print (tString);

so I have to copy it into RAM (tString) to enable its output to serial. I also need to use pgm_get_far_address () with strlcpy_PF (far address > 64KB) for it to work.

In other words once strings are stored in program memory above the 64KB limit then things get a lot more complicated.

Any clarity on all of this appreciated...

Geoffrey.

Just to expand on item 3) above I used the following code:

Serial.print (F("__FlashStringHelper*: ")); Serial.println ((__FlashStringHelper*)sHeating);
Serial.print (F("__FlashStringHelper*(pgm_get...): ")); Serial.println ((__FlashStringHelper*)pgm_get_far_address(sHeating));
Serial.print (F("const __FlashStringHelper*: ")); Serial.println ((const  __FlashStringHelper*)pgm_get_far_address(sHeating));
Serial.print (F("nothing: ")); Serial.println (sHeating);

in loop() to print out the value of sHeating (global char array in PROGMEM_LATE1). The results are:

__FlashStringHelper*: ⸮/⸮2 ⸮⸮1⸮k | ⸮⸮⸮ ⸮ f⸮⸮⸮⸮⸮⸮ ⸮⸮⸮ ⸮⸮ ⸮⸮ ⸮ ⸮⸮⸮ ⸮⸮⸮ ⸮⸮⸮ ⸮m⸮⸮⸮ ⸮⸮⸮
__FlashStringHelper*(pgm_get...): ⸮/⸮2 ⸮⸮1⸮k | ⸮⸮⸮ ⸮ f⸮⸮⸮⸮⸮⸮ ⸮⸮⸮ ⸮⸮ ⸮⸮ ⸮ ⸮⸮⸮ ⸮⸮⸮ ⸮⸮⸮
const __FlashStringHelper*: ⸮/⸮2 ⸮⸮1⸮k | ⸮⸮⸮ ⸮ f⸮⸮⸮⸮⸮⸮ ⸮⸮⸮ ⸮⸮ ⸮⸮ ⸮ ⸮⸮⸮ ⸮⸮⸮ ⸮⸮⸮ ⸮m⸮⸮
nothing: ,⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮⸮

Geoffrey.

If I do my own printing of character strings located at program addresses anywhere with:

void GJBspecialSerialPrint (uint_farptr_t pTheTextInPgmMemory) {
  char tText;
  tText = (char) pgm_read_byte_far (pTheTextInPgmMemory);
  while (tText) {
    //not the NULL char.
    Serial.print (tText);
    pTheTextInPgmMemory++;
    tText=(char)pgm_read_byte_far (pTheTextInPgmMemory);
  }
  return;
}

then all works. I can't understand why Serial.print cannot do this when it can deal with F("...") and char arrays stored in the first 64KB. It feels like I am missing something.

Not sure if I am onto something but checked out C:\Program Files (x86)\Arduino and found a possible answer:

There is a HardwareSerial.h & cpp that inherit a class called Stream from what I can see:

class HardwareSerial : public Stream
...

and the Stream class in Stream.h appears to inherit from the Print class:

class Stream : public Print
{

and this has the following:

size_t Print::print(const __FlashStringHelper *ifsh)
{
  PGM_P p = reinterpret_cast<PGM_P>(ifsh);
  size_t  n = 0;
  while (1) {
    unsigned char c = pgm_read_byte(p++);
    if (c == 0) break;
    if (write(c)) n++;
    else break;
  }
  return n;
}

and I think the issue is that __FlashStringHelper * is recast as PGM_P and that this is a 16 bit value (so insufficient for going over the 64KB boundary). So the Serial.print is not enabled for use above 64KB. Also the fact it uses pgm_read_byte (and not pgm_read_byte_far) means it is not really Mega2560 enabled which is a bit crappy.

Am I on the correct path ?

Geoffrey.

The code in the Print class that handles __FlashStringHelper* assumes that the reference to flash memory is within the first 64K, a fairly safe assumption for most Arduino boards, and an assumption that is made in most code that uses PROGMEM. The main reason for explicitly telling the compiler to store large arrays of PROGMEM data in the upper section of flash memory is so that you do not break the functionality of other parts of the code where the lower 64K of flash memory must be used.

Your code is very similar to that used in the Print class, with the slight difference that you use Serial.print versus Serial.write to actually print the character, and the Print code returns the number of bytes written, something that is seldom actually used but should be done to maintain compatibility.

It should be possible to define something similar to __FlashStringHelper* to handle far addresses, but I doubt there has been enough need for anyone to spend time doing it, since in the few instances it is needed a custom function such as yours can be written.

You may run into other problems storing massive amounts of text strings above the 64K boundary, unless you intend to reference each explicitly. Storing an array of char* to the text strings runs into the problem of the compiler treating a pointer as a 16-bit value, and the compiler cannot resolve the 32-bit address at compile time either.

Thanks for that input david_2018. I guess my comment would be that within the library the use of preprocessor commands is there to make the compiled code work on the different architectures. From what I can see the function that handles Serial.print (...) is reproduced below:

size_t Print::print(const __FlashStringHelper *ifsh)
{
  PGM_P p = reinterpret_cast<PGM_P>(ifsh);
  size_t  n = 0;
  while (1) {
    unsigned char c = pgm_read_byte(p++);
    if (c == 0) break;
    if (write(c)) n++;
    else break;
  }
  return n;
}

I haven't been able to figure out how big __FlashStringHelper* is but assume for now it is 32 bits. If this is the case then the main issue appears to be the recasting to a PGM_P which is 16 bits (from pgmspace.h):

#define PGM_P const char *

what I would have thought would be possible (programming is not my main expertise) would be along the following lines:

size_t Print::print(const __FlashStringHelper *ifsh)
{
  //AVR_16bit_address is a placeholder name, not a real predefined macro.
  #ifdef AVR_16bit_address
  PGM_P p = reinterpret_cast<PGM_P>(ifsh);
  #else
  uint_farptr_t p = reinterpret_cast<uint_farptr_t>(ifsh);
  #endif

  size_t  n = 0;
  while (1) {
    #ifdef AVR_16bit_address
    unsigned char c = pgm_read_byte(p++);
    #else
    unsigned char c = pgm_read_byte_far(p++);
    #endif
    if (c == 0) break;
    if (write(c)) n++;
    else break;
  }
  return n;
}

Which would have then compiled the support in for the Mega2560 automatically. I know the AVR_16bit_address labels are probably incorrect but I just wanted to get the logic across.

It seems to me that the libraries should take into account large amounts of text being stored in FLASH, as it would be extremely unlikely that there would be 256KB of machine code created from the program logic (on Mega2560). It is far more likely that FLASH would be used to store data tables etc. Not factoring into Serial.print etc. the need to print text from anywhere in FLASH is short-sighted in my view.

The current situation also creates yet another scenario whereby having text in a location other than the first 64KB causes a failure that the compiler does not pick up, and which silently turns into an unintended memory access from the wrong location.

Nasty...

Geoffrey.

__FlashStringHelper is declared in WString.h as

class __FlashStringHelper;
#define F(string_literal) (reinterpret_cast<const __FlashStringHelper *>(PSTR(string_literal)))

It is used to differentiate between a char* that points to an address in RAM and a char* that points to an address in flash memory.

The basic problem is that pointers are 16 bits, so are limited in range from 0 to 65535.

Creating the function to print the char array using 32-bit addressing is not the difficult part, getting the remainder of the code to work properly is. As long as you know the specific variable you want to pass to print, finding the far address is easy, but to store massive amounts of text in PROGMEM without wasting lots of memory the general technique is to store each string of text in a separate char array, then create an array of pointers to the char arrays. That array of char * is the problem - to create it at compile time requires that the compiler know the address of the char arrays, but that can only be determined at run-time. You also cannot pass a char* to print() and then have print() use the pgm_get_far_address() function to get the far address, because that char* argument is passed as a 16-bit value with no indication of its true far address.

What specifically are you doing that requires storing this much text?

I think that what I have summarized below, based on various feedback, is probably a nice summary of where things stand. I thought I would put it together for my own understanding as well as that of others. This applies to the AVR architecture processors (UNO, MEGA, LEONARDO, MINI, MICRO, LILYPAD etc). This, for me, clarifies a lot of details:

1) ADDRESSES HAVE TWO POSSIBLE LOCATIONS
There are two types of memory on these processors – DATA and FLASH. DATA is R/W and therefore things can be changed and is what most would know as RAM. The FLASH is R/O (read only) memory and is associated with the use of F(..) macro and PSTR(…) use as well as defining data as being in PROGMEM:

const char GJB_stuff[] PROGMEM = "I am located in FLASH.";

creates a character array in FLASH. The reference to PROGMEM refers to program memory which is mainly what it is for. I find this a bit deceiving and the keyword FLASHMEM would have been a better choice I think as FLASH is for program machine code and data as well.

2) FLASH REQUIRES USING CONST
When using FLASH the use of const is normally needed to specify to the compiler that the data will not change (read only – refer 1)). The following is therefore NOT valid:

char GJB_stuff[] PROGMEM = "I am located in FLASH.";

3) F, PSTR & PROGMEM ALL RELATE TO FLASH
At the end of the day F(), PSTR() and PROGMEM are all constructs that do one thing – store read only data in FLASH. This can be seen in the defines used:

class __FlashStringHelper;
#define F(string_literal) (reinterpret_cast<const __FlashStringHelper *>(PSTR(string_literal)))

so F() makes use of PSTR() and PSTR() makes use of PROGMEM:

#define PSTR(s) (__extension__({static const char __c[] PROGMEM = (s); &__c[0];}))

so using PSTR("I am in FLASH") … would translate to the following:

static const char __c[] PROGMEM = "I am in FLASH";

with the address of __c being provided back from PSTR().

4) SEPARATION BETWEEN RAM AND FLASH VIA DATATYPES TO ALLOW FUNCTION OVERLOADING
Because of 1) – data can be located in RAM or FLASH – this has meant that the datatype “__FlashStringHelper” has been utilized with function overloading to allow parameters to be passed either referencing RAM or FLASH space and the differentiation being made clear as to which function call is used. The datatype __FlashStringHelper identifies FLASH storage.

5) DATATYPE NAMING FOR FLASH ADDRESSES
The use of the datatype name __FlashStringHelper is, in my view, a bit nasty. First of all there is a datatype called String (refer WString.h) which can use dynamic memory allocation to store text. The use of “String” in __FlashStringHelper is not making reference to this String datatype but is making reference to a character array, ie:

const char GJB_stuff[] PROGMEM = "I am in FLASH.";

GJB_stuff is a character array and the datatype reference to “String” in __FlashStringHelper is referring to this type of TEXT. Last of all it is not what I would call a “Helper” as it is merely a way of identifying a FLASH memory address. If I was to therefore name it it would be __FlashArrayAddr. This would deal with the use of FLASH memory for storing text data.

6) USING FUNCTION OVERLOADING TO HANDLE RAM AND FLASH ADDRESS POINTERS
The overloading looks something like the following in the various header files:

size_t print(const __FlashStringHelper *); //I deal with FLASH addresses. (in Print.h header file)
size_t print(const char[]); //I deal with RAM addresses.

so this simplifies programming by allowing Serial.print (…) to handle RAM as well as FLASH addresses pointing to text data. When using Serial.print (...) with FLASH text we have three scenarios under which the FLASH text was created:

a) If the F() macro was used this type casts the FLASH data to being of type __FlashStringHelper so Serial.print works unmodified. ie. Serial.print (F("Im in FLASH"));

b) For PROGMEM the text is in flash but it is considered to be a char * (16 bit address) value. For use with Serial.print we would need to recast. ie. Serial.print ((const __FlashStringHelper *) GJB_stuff); OR Serial.print (reinterpret_cast<const __FlashStringHelper *>(GJB_stuff));

c) For PSTR the same issue as b) arises - it needs to be recast for function overloading in Serial.print to work, for example.
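The overload selection described in 4) and 6) can be tried on a desktop compiler. This is a minimal host-side sketch, not the Arduino source: FlashText stands in for __FlashStringHelper, and overload_id and as_flash are hypothetical names whose only job is to show which overload the compiler picks.

```cpp
// Opaque tag type: declared but never defined, like __FlashStringHelper.
class FlashText;

// Overload chosen for ordinary (RAM) character pointers.
constexpr int overload_id(const char*) { return 0; }

// Overload chosen for pointers tagged as flash text.
constexpr int overload_id(const FlashText*) { return 1; }

// Re-tag a character pointer as flash text, in the way the F() macro
// re-tags the pointer returned by PSTR().
inline const FlashText* as_flash(const char* text) {
    return reinterpret_cast<const FlashText*>(text);
}
```

Here overload_id("abc") selects the first overload and overload_id(as_flash("abc")) selects the second, even though both pointers hold the same address. Only the static type differs, which is why a plain PROGMEM array or a PSTR() result has to be recast before Serial.print treats it as flash text.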

[...continued below...]