I am thinking along the lines of function inlining or C++ this* preprocessing, where the compiler gets a hint (PROGMEM), places the variable in flash, and wraps every access to that memory location with the right copy-to-RAM code.
Sadly, the language specification doesn't allow that. A "char *" that you pass to a function may be handed on to some other function that stores the pointer and uses it at some arbitrary later time. So the "copy" in SRAM has to be permanent, and the compiler might as well put the string there in the first place.
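To make that concrete, here is a small host-side C++ sketch (not AVR code). `remember()` and `lastName` are hypothetical stand-ins for any function that keeps the pointer it is given, and `scratch` plays the role of a temporary buffer the compiler might copy flash data into and later reuse.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical callee that keeps the pointer it receives for later use.
static const char *lastName = nullptr;
void remember(const char *name) { lastName = name; }

// Hypothetical shared temporary buffer for strings copied out of "flash".
static char scratch[21];

const char *viaScratch(const char *flashData) {
    std::strncpy(scratch, flashData, 20);  // stand-in for strncpy_P on AVR
    scratch[20] = '\0';
    return scratch;
}

void demo() {
    remember(viaScratch("first"));
    assert(std::strcmp(lastName, "first") == 0);
    viaScratch("second");  // the temporary buffer is reused for another string
    assert(std::strcmp(lastName, "second") == 0);  // the stored pointer changed under the callee
}
```

After the second copy, `remember()`'s saved pointer reads "second" even though it was handed "first", which is why a temporary copy is not enough.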
Regarding needing only a single byte at a time when calling Serial.print(): that is true, but many other standard functions that take strings need the whole string in RAM at once.
If PROGMEM were part of the type at compile time, the way "const" or "volatile" is, you could write overloaded functions for the cases where a progmem variable is OK. That would be cool. Unfortunately, the GCC compiler was not blessed with the ability to declare arbitrary custom "cv-quals" :-(
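C++ can't overload on a custom qualifier, but it can overload on a distinct type. In this host-side sketch a flash pointer is wrapped in a hypothetical `FlashStr` type, so the hypothetical `describe()` picks a different overload for it. Flash reads are simulated with plain memory access; on AVR the flash overload would fetch each byte with `pgm_read_byte()` instead. Later Arduino cores use a similar trick with the `F()` macro and `__FlashStringHelper`.

```cpp
#include <string>

// Hypothetical tag type: a pointer that is known to point into flash.
struct FlashStr {
    const char *p;
};

// Ordinary RAM string: read it directly.
std::string describe(const char *s) {
    return std::string("ram:") + s;
}

// Flash string: copy it out one byte at a time.
// On AVR each byte would come from pgm_read_byte(q); here it is a plain read.
std::string describe(FlashStr f) {
    std::string out = "flash:";
    for (const char *q = f.p; *q != '\0'; ++q) {
        out += *q;
    }
    return out;
}
```

Because `FlashStr` has no converting constructor, a plain `const char *` never silently selects the flash overload, and vice versa.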
You can wrap these strings in a class, use that class wherever a char* is expected, and have it automatically read the data into a scratchpad buffer when used. Unfortunately, you then get into trouble when the program uses more than one at a time (or two, or however many your scratch space can hold).
The class would look something like:
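class ProgMemString {
public:
  ProgMemString(prog_char const *str) : str_(str) {}
  prog_char const *str_;
  // Copy up to 20 characters out of flash into the shared scratch buffer.
  operator char const *() const {
    strncpy_P(scratch, str_, 20);
    scratch[20] = 0;
    return scratch;
  }
  static char scratch[21];  // shared by every ProgMemString
};
// The static member also needs a definition, in exactly one source file:
char ProgMemString::scratch[21];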
You'd use it something like:
const prog_char pm_myMsg[] PROGMEM = "Hello, World!";
ProgMemString myMsg(pm_myMsg);
void setup() {
...
Serial.print(myMsg);
}
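The weak spot of this class is that every instance shares one scratch buffer. The host-side sketch below mimics it with a hypothetical `ScratchString` class (plain `strncpy` stands in for `strncpy_P`) and formats two wrapped strings in one call, the way `printf("%s %s", a, b)` would.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

// Host-side stand-in for ProgMemString: same shared-buffer design,
// with ordinary string literals playing the role of flash strings.
class ScratchString {
public:
    explicit ScratchString(const char *str) : str_(str) {}
    operator const char *() const {
        std::strncpy(scratch, str_, 20);
        scratch[20] = '\0';
        return scratch;
    }
private:
    const char *str_;
    static char scratch[21];  // one buffer for all instances
};
char ScratchString::scratch[21];

// Both conversions run before snprintf does, and both return the same buffer,
// so whichever conversion happens last wins for both %s arguments.
void formatBoth(char *out, std::size_t n, const ScratchString &a, const ScratchString &b) {
    std::snprintf(out, n, "%s %s",
                  static_cast<const char *>(a), static_cast<const char *>(b));
}
```

Since C++ leaves the order of argument evaluation unspecified, the result is either "Hello Hello" or "World World", never "Hello World".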
Unfortunately, when more than one is in use at the same time, they will fight over the scratch space. Each individual string is also limited to 20 characters (in this case). And each ProgMemString costs another 2 bytes of SRAM (sizeof(char *) on AVR). As long as you only need one at a time, that's probably a good trade-off, but in the long run it's probably no simpler than the standard library form:
const prog_char pm_myMsg[] PROGMEM = "Hello, World!";
char scratch[21];
void setup() {
...
Serial.println(strcpy_P(scratch, pm_myMsg));
}
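One caveat with `strcpy_P` is that nothing stops a message longer than 20 characters from overrunning `scratch`. A bounded variant is sketched below; `fromFlash()` is a hypothetical helper, and the `#else` branch is a host-only shim (an assumption for testing off-target, not avr-libc) that treats flash as ordinary memory.

```cpp
#include <stddef.h>
#include <string.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
// Host-only stand-ins: flash is ordinary memory, so strncpy_P acts like strncpy.
#define PROGMEM
inline char *strncpy_P(char *dst, const char *src, size_t n) {
  return strncpy(dst, src, n);
}
#endif

// Hypothetical helper: copy a flash string into buf (size bytes),
// truncating instead of overflowing, and return buf for direct use.
const char *fromFlash(char *buf, size_t size, const char *flashStr) {
  strncpy_P(buf, flashStr, size - 1);
  buf[size - 1] = '\0';
  return buf;
}

const char pm_myMsg[] PROGMEM = "Hello, World!";
```

In the sketch, `Serial.println(fromFlash(scratch, sizeof scratch, pm_myMsg));` could take the place of the `strcpy_P` call; an over-long message is then cut short rather than corrupting whatever follows `scratch` in RAM.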