Current situation with PROGMEM etc.

My apologies if this is a "frequently answered question", but so far I can't find recent authoritative discussion.

As of IDE 1.8.12, and assuming something like a Mega2560 as the target, is there any advantage to messing around with PROGMEM and F() for simple output like below?

  #ifdef F_CPU
    Serial.print(" ");
    Serial.print(F_CPU / 1000000.0, 2);
    Serial.print(" MHz");
  #endif
  Serial.println();

How does PROGMEM work with "stringize helpers" used to expand something like __DATE__?

I'm looking at a comparatively small amount of version info displayed at the start of a sketch run, but am assuming that the same principles apply to e.g. Serial.print() when it's outputting a menu with which the user will interact.

MarkMLl

MarkMLl:
As of IDE 1.8.12, and assuming something like a Mega2560 as the target, is there any advantage to messing around with PROGMEM and F() for simple output like below?

That would depend on how short of memory you are.

Easy enough to test the impact of F() in the circumstances above on the amount of memory you have free.

I know it's easy to test, I'm asking what "best practice" is considered to be since I'm of the opinion that it's easier to plan for things like this early on rather than trying to fix stuff when a problem becomes apparent.

What does GCC (etc.) actually do here? Obviously the literal in something like

Serial.print("Hello World");

starts off in Flash, but is it copied into SRAM when the program starts and left there or is it copied temporarily to the heap or stack during execution of the print() (with F() eliminating that heap/stack overhead)?

MarkMLl

Heap and stack are parts of SRAM. If you use F(), the strings are stored in flash from where they have to be loaded every time they are used - this causes less SRAM usage and slower execution. If you are not using F(), the strings stay in SRAM and are not moved - this uses more SRAM and executes faster.

If you are not running low on SRAM there is no point in using F().

EDIT: F() means storing in PROGMEM / Flash.

MarkMLl:
I know it's easy to test, I'm asking what "best practice" is considered to be since I'm of the opinion that it's easier to plan for things like this early on rather than trying to fix stuff when a problem becomes apparent.

I always (well, mostly) use the F() macro, especially on the UNO. It's easy when you start writing a sketch. It's a pain in the a.. if you have to adapt your code because you are running out of memory.
So far I have seldom encountered cases where I need that last bit of performance. Even in those rare cases, it's easier to remove some of the F() macros when the extra performance is really needed than to do it the other way round.

Danois90:
Heap and stack are parts of SRAM. If you use F(), the strings are stored in flash from where they have to be loaded every time they are used - this causes less SRAM usage and slower execution.

Time overhead is negligible compared with the speed of a serial line.

But still: what's GCC actually doing? Are literal strings copied to SRAM and left there occupying precious space, or are they left in Flash until print() is actually called and then transferred to the heap or stack?

MarkMLl

(Who's spent 35 years selling/maintaining/supporting compilers).

Literals are copied to RAM during crt0 unless you use the F() macro or explicitly specify otherwise.

I would recommend using the preprocessor:

//Uncomment the following line to use PROGMEM
//#define USE_PROGMEM

#ifdef USE_PROGMEM
#define FF(X) F(X)
#else
#define FF(X) X
#endif

void setup()
{
  Serial.begin(9600);
  Serial.print(FF("Should we use PROGMEM?"));
}

void loop()
{
}

By making the code like this it is easy to switch between SRAM and PROGMEM by modifying a single line in the code.

MarkMLl:
But still: what's GCC actually doing?

When using F() the data remains in flash from where it is read byte-by-byte and used as such every time it is used. The data is never copied entirely to SRAM, unless your code does that - which would be counter-productive.

TheMemberFormerlyKnownAsAWOL:
Literals are copied to RAM during crt0 unless you use the F() macro or explicitly specify otherwise.

Thanks for that, exactly what I needed to know. So it's worth working out how to get something like __FILE__ (which could be arbitrarily large) into PROGMEM.

MarkMLl

Danois90:
By making the code like this it is easy to switch between SRAM and PROGMEM by modifying a single line in the code.

I agree, but it needs careful consideration when something like __FILE__ is being stringized.

Xref for context to earlier discussion at Determining the board type being used, revisited - Programming Questions - Arduino Forum

MarkMLl

there are interesting results if you play around with serial speed:

void setup() {
  Serial.begin(500000);
  Serial.println(F("\nStart"));

  uint32_t start, end;
  int32_t one, two;

  start = micros();
  Serial.println("a123456789b123456789c123456789");
  end = micros();
  one = end - start;
  Serial.println("standard"); Serial.println(one);
  
  start = micros();
  Serial.println(F("a123456789b123456789c123456789"));
  end = micros();
  two = end - start;
  Serial.println(F("F-macro")); 
  Serial.println(two);
  Serial.println(two-one);
}

void loop() {

}

@500000
13:49:52.534 -> Start
13:49:52.534 -> a123456789b123456789c123456789
13:49:52.534 -> standard
13:49:52.534 -> 380
13:49:52.534 -> a123456789b123456789c123456789
13:49:52.534 -> F-macro
13:49:52.534 -> 360
13:49:52.534 -> -20

@115200
13:52:08.089 -> Start
13:52:08.089 -> a123456789b123456789c123456789
13:52:08.089 -> standard
13:52:08.089 -> 256
13:52:08.089 -> a123456789b123456789c123456789
13:52:08.089 -> F-macro
13:52:08.089 -> 1308
13:52:08.089 -> 1052

noiasca:
there are interesting results if you play around with serial speed:

You cannot create a benchmark like that: 1) you cannot control how the compiler optimizes the code, so statements may be executed in a different order than written, and 2) serial communication is buffered, and when the buffer is full, execution blocks until the interrupt handler has transmitted enough data to make space for new data in the buffer.

Nice try, though! ;)

noiasca:
there are interesting results if you play around with serial speed:

Hmm. At the very least I'd suggest that you need to time how long it is before the buffer is completely flushed to the line i.e. I think you need something like

while (SERIAL_TX_BUFFER_SIZE - Serial.availableForWrite() > 1) ;

but that also raises the thorny question of whether the ISR is going to Flash and if so what the implications are.

MarkMLl

Try it again, with flush

TheMemberFormerlyKnownAsAWOL:
Try it again, with flush

And volatile to prevent optimization.

You cannot create a benchmark like that

I can. I did. :)

Someone brought up "performance" as an argument against the F() macro.
I would just like to see negative effects of the F() macro that might actually matter.

Why should I prevent optimization in this special circumstance?
Optimization will also be done in another sketch.

Does this variant stress the 64-byte buffer?

#pragma GCC optimize "O0"
void setup() {
  Serial.begin(115200);
  Serial.println(F("\nStart"));

  uint32_t start, end;
  int32_t one, two;

  start = micros();
  Serial.println("a123456789b123456789c123456789d123456789e123456789f123456789ABC");
  Serial.println("ABCf123456789e123456789d123456789c123456789b123456789a123456789");
  Serial.println("a123456789b123456789c123456789d123456789e123456789f123456789ABC");
  Serial.flush();
  while (SERIAL_TX_BUFFER_SIZE - Serial.availableForWrite() > 1) ;
  end = micros();
  one = end - start;
  Serial.println("standard"); Serial.println(one);

  while (SERIAL_TX_BUFFER_SIZE - Serial.availableForWrite() > 1) ;
  delay(1000);
  
  start = micros();
  Serial.println(F("a123456789b123456789c123456789d123456789e123456789f123456789ABC"));
  Serial.println(F("ABCf123456789e123456789d123456789c123456789b123456789a123456789"));
  Serial.println(F("a123456789b123456789c123456789d123456789e123456789f123456789ABC"));
  Serial.flush();
  while (SERIAL_TX_BUFFER_SIZE - Serial.availableForWrite() > 1) ;
  end = micros();
  two = end - start;
  Serial.println(F("F-macro")); 
  Serial.println(two);
  Serial.println(two-one);
}

void loop() {

}

14:34:30.109 -> Start
14:34:30.109 -> a123456789b123456789c123456789d123456789e123456789f123456789ABC
14:34:30.109 -> ABCf123456789e123456789d123456789c123456789b123456789a123456789
14:34:30.109 -> a123456789b123456789c123456789d123456789e123456789f123456789ABC
14:34:30.109 -> standard
14:34:30.109 -> 17200
14:34:31.138 -> a123456789b123456789c123456789d123456789e123456789f123456789ABC
14:34:31.138 -> ABCf123456789e123456789d123456789c123456789b123456789a123456789
14:34:31.138 -> a123456789b123456789c123456789d123456789e123456789f123456789ABC
14:34:31.138 -> F-macro
14:34:31.138 -> 16588
14:34:31.138 -> -612
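
As a rough cross-check of these figures, the time the data spends on the wire can be estimated from the byte count and the baud rate. This is a minimal sketch in plain C++, assuming 8N1 framing (10 bits per byte) and the nominal baud rate; wire_time_us is a hypothetical helper, not part of the Arduino core:

```cpp
#include <stdint.h>

// Estimated time on the wire, in microseconds, for `bytes` bytes at `baud`.
// Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per byte.
uint32_t wire_time_us(uint32_t bytes, uint32_t baud) {
  const uint64_t bits = (uint64_t)bytes * 10;
  return (uint32_t)(bits * 1000000ULL / baud);
}
```

Each timed block sends three lines of 63 characters plus CR/LF, i.e. 195 bytes, so wire_time_us(195, 115200) comes to roughly 16.9 ms. Both measurements (17200 and 16588 microseconds) are close to that, which suggests that once the buffer is flushed the line speed dominates and any flash-read overhead is lost in the noise. The real baud rate on the board may deviate slightly from the nominal value, so the estimate is only approximate.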

TheMemberFormerlyKnownAsAWOL:
Literals are copied to RAM during crt0 unless you use the F() macro or explicitly specify otherwise.

Slightly off topic since it can never happen with an array, but I believe single byte literals can be stored in program memory and accessed with the LPM instruction. I would expect the compiler to use that ability whenever it can. The manual says of LPM, "This instruction features a 100% space effective constant initialization or constant data fetch". It takes 3 instruction cycles.

Untested, but I think:

const byte foo = 42;
...
byte bar;
bar = foo;

Never needs to place 'foo' in data memory space.

noiasca:
Why should I prevent optimization in this special circumstance?
Optimization will also be done in another sketch.

If you want to benchmark how long a certain set of instructions takes to complete, you cannot allow the compiler to mix and match; this will produce bad results.

Doing more work cannot take less time, so your benchmark is still flawed.

One thing to watch out for when using F(): without F(), the compiler looks for identical string literals (such as printing the same text to an LCD and the serial monitor) and stores only a single copy of the string in RAM. It does not do this with F(), so you can end up wasting flash memory on multiple copies of the same string. Once you know to watch for it, this is easy enough to get around by storing the string in a char array in PROGMEM and printing the array.

I always (well, mostly) use the F() macro, especially on the UNO. It's easy when you start writing a sketch. It's a pain in the a.. if you have to adapt your code because you are running out of memory.

An editor that uses regular expressions makes it much easier to find all the print/println statements and insert the F() macro around the string literal.

david_2018:
One thing to watch out for when using F(): without F(), the compiler looks for identical string literals (such as printing the same text to an LCD and the serial monitor) and stores only a single copy of the string in RAM. It does not do this with F(), so you can end up wasting flash memory on multiple copies of the same string. Once you know to watch for it, this is easy enough to get around by storing the string in a char array in PROGMEM and printing the array.

Thanks for that reminder.

With reference to Determining the board type being used, revisited - Programming Questions - Arduino Forum for context, at the moment I'm doing something like

void board_info(char *sketch) {

  Serial.print("Project: ");
  Serial.println(sketch);
#if (defined (ARDUINO_BOARD_NAME) || defined (ARDUINO_BOARD_TEXT) || defined (F_CPU))
  Serial.print("Target:");
  #ifdef ARDUINO_BOARD_NAME
    Serial.print(" ");
    #define STRINGIZE_HELPER(x) #x
    #define STRINGIZE(x) STRINGIZE_HELPER(x)
    Serial.print(STRINGIZE(ARDUINO_BOARD_NAME));
...

which I think could usefully be changed to something like

#define STRINGIZE_HELPER(x) #x
#define STRINGIZE(x) STRINGIZE_HELPER(x)

const char zz_arduino_board_name[] PROGMEM = STRINGIZE(ARDUINO_BOARD_NAME);

void board_info(char *sketch) {
...
    Serial.print(F(zz_arduino_board_name));
...

where ARDUINO_BOARD_NAME etc. come from the command line and ultimately from platform.local.txt.

I /think/ that's where the F() needs to be applied, but would appreciate any corrections.

That particular code fragment is single-shot code in the setup() function, so considerations of execution speed etc. aren't really relevant.

MarkMLl