Investigating SRAM and PROGMEM

As our project is running out of memory, I'm finally trying to understand memory management on our Arduino Mega2560.

And the first tests I've created have me puzzled...

I've created a simple struct definition, declared a couple of instances, and then checked the free memory with the MemoryFree library. To my astonishment, the amount of free memory is the same whether I declare 1 or 4 of those structs, inside or outside a function, const or not.

So this code results in 7847:

#include "MemoryFree.h"
struct MyStruct
{
  int ii;
  char c[10];
};
void setup()
{
  MyStruct ms = { 11, "123456789" };
  //MyStruct ms2 = { 10, "234567890" };
  //MyStruct ms3 = { 13, "234567890" };
  //MyStruct ms4 = { 12, "234567890" };

  Serial.begin(115200);
  Serial.println(freeMemory());
}

void loop()
{
  /* add main program code here */
}

And this code runs with the same result:

#include "MemoryFree.h"
struct MyStruct
{
  int ii;
  char c[10];
};

void setup()
{
  MyStruct ms = { 11, "123456789" };
  MyStruct ms2 = { 10, "234567890" };
  MyStruct ms3 = { 13, "234567890" };
  MyStruct ms4 = { 12, "234567890" };

  Serial.begin(115200);
  Serial.println(freeMemory());
}

void loop()
{
  /* add main program code here */
}

Can somebody explain this?

Yes, that's because the compiler out-smarts you. It sees you never use the variable and so just leaves it out :wink:

And what on earth did you build that made you run out of memory on a Mega? :o

I would guess that since they were never used, they were never created. The compiler cannot be fooled like this. You have to fool it by making them volatile.
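
A minimal sketch of the volatile suggestion, assuming the struct from the first post; readIi() is a made-up helper name. Because ms is a volatile global, the compiler has to keep the object and really perform the read instead of folding it into a constant. Whether freeMemory() then drops by sizeof(MyStruct) per instance is not something this snippet shows.

```cpp
struct MyStruct
{
  int ii;
  char c[10];
};

// volatile: every access must really happen, so the object cannot be optimized away
volatile MyStruct ms = { 11, "123456789" };

// Hypothetical helper: forces an actual read of ms.ii from memory
int readIi()
{
  return ms.ii;
}
```

In a sketch you would call readIi() from loop() and print the result.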

OK, good point, so now I use them in loop() by printing the ii value of every struct instance...

The following strange phenomenon occurs:

If I declare 7 structs, the freemem reported is 7860
If I declare 1 struct, the freemem reported is 7848

...

Guess I'll just create 1 million structs and then I'll have 2 MB of free mem :slight_smile:

My "ultimate" goal is to get those structs stored in PROGMEM, but when I add PROGMEM to the declaration, nothing changes in the freemem amount. Any tips on that too?

You also need to understand how the stack works.

Mark

A small clarification on my previous post: I moved the declarations outside the setup() function, so if I understand correctly they are located on the heap now.

I know how the stack works, but that does not explain why freeMemory() would report more free memory when I declare more data. Whether they are located on the heap or the stack, the amount of free memory should decrease.

bascy:
OK, good point, so now I use them in loop() by printing the ii value of every struct instance...

The following strange phenomenon occurs:

If I declare 7 structs, the freemem reported is 7860
If I declare 1 struct, the freemem reported is 7848

...

Guess I'll just create 1 million structs and then I'll have 2 MB of free mem :slight_smile:

My "ultimate" goal is to get those structs stored in PROGMEM, but when I add PROGMEM to the declaration, nothing changes in the freemem amount. Any tips on that too?

Post updated code.

A small clarification on my previous post: I moved the declarations outside the setup() function, so if I understand correctly they are located on the heap now.

No they are NOT on the heap.

Mark

"the heap" is things that are allocated with malloc() or "new"

RAM layout looks like:

.data segment: Initialized global data, including "static" variables of functions.

  MyStruct ms = { 11, "123456789" };
:
void loop() {
static MyStruct ms2 = { 12, "987654421" };
   :
}

.bss segment: un-initialized (at compile time) global variables and static function variables:

  MyStruct ms2;
void loop() {
  static unsigned long lasttime=millis();   // NOT initialized at compile time!

HEAP: memory available (or used) for dynamically created data (malloc(), new, sbrk(), etc.) Starts at the end of .bss and grows upward toward the end of memory.

MyStruct *ms4;
void loop() {
   ms4 = (MyStruct *)malloc(sizeof(MyStruct));   // C++ needs the cast from void*

STACK: Used for non-static local variables of functions, saving registers and other temporary values, return addresses, sometimes for function arguments. Starts at the end of RAM and grows backwards, toward the heap.

  void somefunc() {
    MyStruct ms6 = { 11, "123456789" };   // ms6 is probably on the stack.
    MyStruct *ms7 = (MyStruct *)malloc(sizeof(MyStruct));  // the ms7 pointer is probably on the stack;
                                                           // the struct it points to is on the heap.
  }

malloc() has 'some' protection against running into the stack, but the stack generally has no protection at all against running into the heap. When they collide, bad and unpredictable things will happen! If you never use dynamic allocation, your heap size can be zero, in which case your stack can run into .bss first...
There are some "probably"s above; that's because the compiler will TRY to keep/pass function variables in registers instead of putting them on the stack, as this is both faster and uses less memory.
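
To make the 'some protection' remark concrete, here is a simplified model of the kind of check malloc() can make before it grows the heap. This is an assumption-laden sketch, not the avr-libc source: heapCanGrow() is a hypothetical name, the parameter names only loosely mirror __brkval and __malloc_margin, the 2-byte chunk header is assumed, and the addresses in the note below are purely illustrative.

```cpp
#include <stdint.h>

// Hypothetical model: would a malloc(len) that has to extend the heap be allowed?
//   brk    - current top of the heap
//   sp     - current stack pointer
//   margin - safety gap to keep between heap and stack
bool heapCanGrow(uint16_t brk, uint16_t sp, uint16_t margin, uint16_t len)
{
  if (sp <= margin) return false;
  uint16_t limit = sp - margin;          // highest address the heap may reach
  if (limit <= brk) return false;        // heap top is already inside the guard zone
  uint32_t avail = (uint32_t)(limit - brk);
  return avail >= (uint32_t)len + 2;     // payload plus an assumed 2-byte size header
}
```

With brk = 0x01CE, sp = 0x08F6 and a margin of 32, a 1000-byte request fits, while any request is refused once the heap top is inside the margin. Nothing comparable runs when a function call pushes the stack downward, which is the asymmetry described above.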

sterretje:
Post updated code.

Fair request :slight_smile:

#include "MemoryFree.h"

struct MyStruct   // no typedef needed in C++
{
  int ii;
  char c[10];
};

const MyStruct ms  = { 11, "123456789" };
const MyStruct ms2 = { 10, "234567890" };
const MyStruct ms3 = { 13, "234567890" };
const MyStruct ms4 = { 12, "234567890" };
const MyStruct ms5 = { 12, "234567890" };
const MyStruct ms6 = { 12, "234567890" };
const MyStruct ms7 = { 17, "234567890" };

void setup()
{
  Serial.begin(115200);
  Serial.println(freeMemory());
}

void loop()
{
  delay(1000);
  Serial.println(String(ms.ii));
  Serial.println(String(ms2.ii));
  Serial.println(String(ms3.ii));
  Serial.println(String(ms4.ii));
  Serial.println(String(ms5.ii));
  Serial.println(String(ms6.ii));
  Serial.println(String(ms7.ii));
  Serial.print("Freemem: ");
  Serial.println(freeMemory());
}

So this program reports a higher number for freeMemory() when all the 7 structs are declared. If I remove 6 of the declarations freeMemory() reports a lower number.

@westfw Thanks for the clarification.

On an Uno, I get 1832 bytes free with all things present, and 1820 bytes free with only one structure. So: observation duplicated!

I suspect you’re still running into “unexpected compiler optimization.” Let’s see - we can examine the binary with various tools that are included in the arduino download.
First, we can look at the compile time memory allocation with the “nm” utility:

All structures:

BillW-MacOSX-2<10004> avr-nm -nSC *elf | grep " [bBdD] "
00800100 D __data_start
00800100 00000002 D __malloc_heap_end
00800102 00000002 D __malloc_heap_start
00800104 00000002 D __malloc_margin
00800106 00000010 d vtable for HardwareSerial
00800124 B __bss_start
00800124 D __data_end
00800124 D _edata
00800124 00000001 b timer0_fract
00800125 00000004 b timer0_millis
00800129 00000004 b timer0_overflow_count
0080012d 0000009d b Serial
008001ca 00000002 B __brkval
008001cc 00000002 B __flp


Only one Structure:

BillW-MacOSX-2<10005> avr-nm -nSC *elf | grep " [bBdD] "
00800100 D __data_start
00800100 00000002 D __malloc_heap_end
00800102 00000002 D __malloc_heap_start
00800104 00000002 D __malloc_margin
00800106 00000010 d vtable for HardwareSerial
00800124 B __bss_start
00800124 D __data_end
00800124 D _edata
00800124 00000001 b timer0_fract
00800125 00000004 b timer0_millis
00800129 00000004 b timer0_overflow_count
0080012d 0000009d b Serial
008001ca 00000002 B __brkval
008001cc 00000002 B __flp

They’re identical. Also, there isn’t any mention of any of the ms* data structures in either case! So, the compiler seems to be optimizing away the structures. We can confirm by looking at the code produced with avr-objdump. Here’s the code where it converts ms.ii to a string, from the Single Structure version:

       return __itoa_ncheck (__val, __s, __radix);
 6ee:   4a e0           ldi     r20, 0x0A       ; 10    (RADIX)
 6f0:   b8 01           movw    r22, r16
 6f2:   8b e0           ldi     r24, 0x0B       ; 11   (constant 11)
 6f4:   90 e0           ldi     r25, 0x00       ; 0
 6f6:   0e 94 1e 06     call    0xc3c   ; 0xc3c <__itoa_ncheck>
}
#endif

String & String::operator = (const char *cstr)
{
        if (cstr) copy(cstr, strlen(cstr));

So it has managed to figure out that ms.ii is 11, and it doesn’t need the rest of the structure.

So why is there more free memory in the version with more than one structure?
One of the things I noticed during the above analysis is that the compiler has gone and converted a lot of the function calls to blocks of inline code. After all, they’re only used once, so it’s smaller to leave out the call instruction (and perhaps some argument handling). Functions like freeMemory() (which is short, anyway), and the String functions (including the internal String functions like String::reserve()) just don’t wind up in the final binary as functions:

BillW-MacOSX-2<10016> avr-nm -nSC *elf | grep freeMemory
BillW-MacOSX-2<10017> avr-nm -nSC *elf | grep println
0000031a 00000104 t Print::println(int, int) [clone .constprop.8]   (println does show up.  Big, used twice.)
BillW-MacOSX-2<10018> avr-nm -nSC *elf | grep String
BillW-MacOSX-2<10019>

Now, this turns out to be particularly interesting because the freeMemory function allocates a temporary value on the stack - I don’t think it’s going to work right if it gets inlined; its result will be based on the stack frame where the temp actually gets allocated, rather than the stack frame at the time the function is called.
There’s a “noinline” attribute that you can use; I applied it to freeMemory(), and indeed the values reported do change! They’re still bigger for the case with multiple structures, though…
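
For reference, this is how the attribute is spelled in GCC. It is a minimal sketch with a made-up probe function, stackProbe(), not the MemoryFree source: it returns the address of one of its own locals, which is the kind of value a freeMemory()-style calculation depends on.

```cpp
#include <stdint.h>

// noinline keeps this as a real function with its own stack frame, so the
// local below is allocated when the function is called rather than being
// merged into the caller's frame.
__attribute__((noinline)) uintptr_t stackProbe()
{
  volatile char marker = 0;                 // volatile so the local is not optimized away
  uintptr_t addr = (uintptr_t)&marker;      // address of the local, as a number
  return addr;
}
```

Calling stackProbe() from different call depths should give different addresses, since each call gets its own frame.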

At this point… I’m sorta bored; presumably the “many” example uses less memory because when the “single” code inlines more functions, it needs more stack space for the local variables used by those functions, which all get combined, but going through the code to prove that seems more trouble than it’s worth. Can’t we leave it at “trying to judge allocation behavior from very small programs is difficult”?
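
The numbers posted so far are at least consistent with that guess. Here is a back-of-the-envelope check, assuming freeMemory() is essentially "address of a local minus the end of .bss" while the heap is empty, and using the Uno's (ATmega328P) last SRAM address of 0x08FF. The constant and function names are mine; the .bss end comes from the avr-nm listings above (__flp at 0x1cc, 2 bytes long, with the 0x800000 offset removed).

```cpp
#include <stdint.h>

constexpr uint16_t kRamEnd = 0x08FF;      // last SRAM byte on an ATmega328P
constexpr uint16_t kBssEnd = 0x01CC + 2;  // __flp is the last .bss symbol: address + size

// Assumed model: reportedFree = (address of a local) - kBssEnd
constexpr uint16_t localAddressFor(uint16_t reportedFree)
{
  return kBssEnd + reportedFree;
}

// How far below the top of RAM that local sits
constexpr uint16_t stackDepthFor(uint16_t reportedFree)
{
  return kRamEnd - localAddressFor(reportedFree);
}
```

That puts the local 9 bytes below the top of RAM in the 7-struct build (1832 free) and 21 bytes below in the 1-struct build (1820 free). Since both builds have the same .bss end, the 12-byte difference would have to come from where the stack was when the measurement was taken, if those assumptions hold.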

Here’s a version crafted to make sure that the ms structures are actually allocated. It uses all of them and writes every byte of each one to a “volatile” port, which the compiler MUST actually do… It behaves as expected: it uses more RAM when there are more structures:

#include "MemoryFree.h"

struct MyStruct   // no typedef needed in C++
{
  int ii;
  char c[10];
};

#define ONLY1 1

const MyStruct ms  = { 11, "123456789" };
#if ONLY1 == 0
const MyStruct ms2 = { 10, "234567890" };
const MyStruct ms3 = { 13, "234567890" };
const MyStruct ms4 = { 12, "234567890" };
const MyStruct ms5 = { 12, "234567890" };
const MyStruct ms6 = { 12, "234567890" };
const MyStruct ms7 = { 17, "234567890" };
#endif
void setup()
{
  Serial.begin(115200);
  Serial.println(freeMemory());
}

void loop()
{
  delay(1000);
  dumpstr(&ms);
#if ONLY1 == 0
  dumpstr(&ms2);
  dumpstr(&ms3);
  dumpstr(&ms4);
  dumpstr(&ms5);
  dumpstr(&ms6);
  dumpstr(&ms7);
#endif
  Serial.print("Freemem: ");
  Serial.println(freeMemory());
}

void dumpstr(const MyStruct *m)   // const: the ms structures are declared const
{
  PORTB = m->ii;
  for (byte i = 0; i < sizeof m->c; i++) {
    PORTB = m->c[i];
  }
}

If I remove 6 of the declarations freeMemory() reports a lower number.

The compiler is NOT optimizing for memory usage (if you think it is, provide a cite/ref). If you check the gcc invocation you will see that it is set up to optimize for speed (CPU usage), NOT memory usage.

Mark

The compiler is NOT optimizing for memory usage (if you think it is, provide a cite/ref)

…hardware/arduino/avr/platform.txt:

# Default "compiler.path" is correct, change only if you want to override the initial value

compiler.path={runtime.tools.avr-gcc.path}/bin/
compiler.c.cmd=avr-gcc
compiler.c.flags=-c -g -Os {compiler.warning_flags} -std=gnu11 -ffunction-sections -fdata-sections -MMD -flto -fno-fat-lto-objects
compiler.c.elf.flags={compiler.warning_flags} -Os -g -flto -fuse-linker-plugin -Wl,--gc-sections
compiler.c.elf.cmd=avr-gcc
compiler.S.flags=-c -g -x assembler-with-cpp -flto -MMD
compiler.cpp.cmd=avr-g++
compiler.cpp.flags=-c -g -Os {compiler.warning_flags} -std=gnu++11 -fpermissive -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -MMD -flto

"man avr-gcc":

-Os Optimize for size. -Os enables all -O2 optimizations that do not
typically increase code size. It also performs further
optimizations designed to reduce code size.

For better or worse, gcc is not particularly GOOD at optimizing for size on the AVR; I think it makes a lot of its decisions based on “intermediate code” or “bad settings” of just how its internal abstractions of the code translate into AVR code (like, I’m not sure it knows that adding two longs is 4 times bigger than adding two chars.)

westfw:
On an Uno, I get 1832 bytes free with all things present, and 1820 bytes free with only one structure. So: observation duplicated!

Wow, what a great investigation! Thanks.
That does explain the weird freeMemory() behaviour.

Now onto learning how to get struct and classes into PROGMEM …
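
For that next step, here is a minimal sketch of the usual avr-libc pattern: mark the object PROGMEM and copy it into RAM with memcpy_P() before using it. msFlash and loadStruct() are hypothetical names, and the #else branch is only there so the snippet also compiles off-target. Given the avr-nm output above, it seems plausible that adding PROGMEM changed nothing earlier because those structs never ended up in RAM in the first place.

```cpp
#include <string.h>
#ifdef __AVR__
  #include <avr/pgmspace.h>
#else
  // Host fallback (assumption for off-target builds): flash and RAM are the same memory
  #define PROGMEM
  #define memcpy_P memcpy
#endif

struct MyStruct
{
  int ii;
  char c[10];
};

// On AVR, PROGMEM places this object in flash instead of copying it to SRAM at startup
const MyStruct msFlash PROGMEM = { 11, "123456789" };

// Hypothetical helper: copy one flash-resident struct into a RAM buffer owned by the caller
void loadStruct(MyStruct *dest, const MyStruct *src)
{
  memcpy_P(dest, src, sizeof(MyStruct));
}
```

In a sketch this would be used as: MyStruct tmp; loadStruct(&tmp, &msFlash); Serial.println(tmp.ii); so only the one temporary copy occupies RAM at a time. For a single int field, pgm_read_word(&msFlash.ii) avoids copying the whole struct.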