Stack size and sub-methods

Hi Folks,
I've been writing a fairly large sketch which has functions that call other functions several levels deep. They're not recursive, but go about seven subroutines deep. From what I understand about the stack, all the register values are pushed onto the stack each time a new method is called. Are all the local variables from the calling method also pushed onto the stack as well? They must be-- I don't see how we'd keep track of local variables otherwise.

I understand that the stack starts at the highest SRAM address and grows downward, whereas the regular Data Memory starts just after all the External IO Registers and grows upward. Eventually, as SRAM usage grows, the heap space and the stack space collide.

The reason I ask is because once I reached a certain method-depth, the program ceases to function correctly... if at all. I assume that the heap data has over-written part of the stack, and thus a return value is no longer valid. From my previous programming experience, it was good practice to use local variables wherever possible... but with such limited stack space, and no warning that stack corruption is taking place, I'm unsure if there's a better way to handle this.

Is there a way to set a maximum stack size? Or to limit the area of SRAM which can be used for the heap? How do the advanced users out there deal with these issues?


What are you using for a uC now? Have you considered one with more memory? If Flash size is not a problem, what about just putting the code from some of the functions inline, instead of calling them as functions?

BKnight760:
Hi Folks,

From what I understand about the stack, is that all the register values are pushed onto the stack each time a new method is called.

Nope. It's a question of usage. Calling l1 only uses 10 bytes of stack space (return addresses)...

void l5( void ) {}
void l4( void ) { l5(); }
void l3( void ) { l4(); }
void l2( void ) { l3(); }
void l1( void ) { l2(); }

Are all the local variables from the calling method also pushed onto the stack as well?

Still a question of usage. The compiler first uses registers for local variables. If a local register-based variable is not used after a call, the value is very likely discarded (meaning zero stack space used by it).

Is there a way to set a maximum stack size?

No.

Or to limit the area of SRAM which can be used for the heap?

Yes...
http://www.nongnu.org/avr-libc/user-manual/malloc.html

How do the advanced users out there deal with these issues?

Don't use the heap.

Currently I have two Arduinos. One is a Duemilanove DIP, the other is an Uno SMD. Both use the ATmega328P: 2kB SRAM, 32kB program memory. The program memory doesn't seem to be a problem, it's more the SRAM... It looks like the stack is getting pretty large.

End of SRAM = 0x08FF
End of Regs = 0x00FF
SRAM Size = 0x08FF-0x00FF = 0x0800 = 2048 Bytes (2kB)

Stack Pointer = 0x0606
Stack Size = 0x08FF - 0x0606 = 0x02F9 = 761 Bytes
Heap Space = 0x0800 - 0x02F9 = 0x0507 = 1287 Bytes

Why isn't there some sort of protection to prevent the heap and stack from overlapping? Don't they protect against this on other processors? You'd get some sort of error like a stack-overflow or something on a PC. Is that too much work on a uC?

Is there a way I can program this to avoid having such a large stack?

Let's see what the crystal ball has to say... Tomorrow's weather will be slightly warmer but still nice. Huh. Nothing about your sketch. Maybe next time.

Or, as PaulS has been saying lately, move just a bit more to the right. I can't quite see your code.

There are really three types of SRAM memory:

  • Static variables - allocated at the low part of RAM
  • The stack - allocated at top of RAM and growing downwards
  • The heap - allocated after static variables and growing upwards

Are all the local variables from the calling method also pushed onto the stack as well? They must-- I don't see how we'd keep track of local variables otherwise.

Not exactly. Local variables are indeed on the stack, but the stack pointer is adjusted to allow for them (they are not literally pushed or popped).

Judging by a disassembly of some test code, when entering a function with a substantial number of local variables (i.e. more than can fit into registers), the compiler generates code to subtract x from the stack pointer (where x is the number of bytes of local variables). At the end of the function it adds x back to the stack pointer. Thus the stack area also stores local variables.

If you are not recursing, there wouldn't be a heap of difference between using static variables, and auto variables. The memory has to come from somewhere.

Without seeing your code it is hard to say more, but if your functions allocate a lot of memory for local variables, there will be trouble brewing. eg.

void foo ()
{
  char bar [200];

  ...
}

Each time foo is called you will use up 202 bytes of stack (200 for "bar", plus the return address).

Actually judging by the disassembly, it will be more than 2 bytes for the function call. For example it seems to save registers 28 and 29, which is another 2 bytes.

  b6:	df 93       	push	r29
  b8:	cf 93       	push	r28
...

  e2:	cf 91       	pop	r28
  e4:	df 91       	pop	r29
  e6:	08 95       	ret

Is there a way I can program this to avoid having such a large stack?

Don't use auto variables? If you are only nested 7 deep or so, the function calls themselves aren't the problem.

Why isn't there some sort of protection to prevent the heap and stack from overlapping? Don't they protect against this on other processors?

I don't see how the processor can easily do this. When you malloc something it could conceivably make sure it doesn't go past the bottom of the stack, but when you do a function call, how will the processor know where the top of the heap is? The heap is purely a piece of code in a library. It isn't inherent in the processor architecture.

And even if the processor could somehow detect that calling a function would overwrite the heap, what would it do exactly? Stop? Well the code effectively does that anyway if the stack and heap collide.

Why isn't there some sort of protection to prevent the heap and stack from overlapping?

Because that would be difficult.

Don't they protect against this on other processors?

For most "large" processors, this is more a function of the operating system and memory management hardware than the processor itself. A stack would typically be allocated to some number of memory pages, with an unallocated (not readable or writable) page on either edge. If you overflow the stack, the operating system gets an illegal memory access fault, and either figures out that it is some sort of stack over/underflow, or handles it as a more generic memory access error ("segmentation fault.") Very few processors without memory management will have hardware support for detecting stack overflows. Some of them will implement checks in their operating system to see if the stack is filling up; for example by setting stack memory to some magic number (0xFFFF) and then checking at each context switch to see whether those are still there...

Keep in mind that the total RAM memory of the ATmega328 (2K) would be considered a "very small" stack on most CPUs. You simply cannot implement all the same sorts of algorithms on a microcontroller that you can on a microcomputer with lots of RAM.

Thanks Nick and westfw. I was happy to see some helpful conversation. That totally makes sense now that we're dealing with a pure software stack which would have to be managed by an OS for large processors. I guess I imagined some sort of hardware stack implemented in the chip architecture, or that they kept pointers to the highest used heap address. Now I understand why it's not managed on embedded devices.

So, just for my education, how are variables allocated in the heap space? What I mean, is how is the memory managed? How do we keep track of which memory addresses are currently allocated to variables, and which addresses are free? Is that done by the compiler? If I were to attempt to write my own code to detect a heap / stack collision, is there a way to get the byte address of the highest used heap space memory?

Thanks.

I think there is a variable in the heap manager (memory allocation) which tells you the highest heap space, because there are some routines that report the amount of memory free.

However anything you do there doesn't stop the stack growing downwards and hitting the heap, it only stops the heap growing upwards and hitting the stack.

how are variables allocated in the heap space?

The heap is the area managed by malloc(), so it doesn't get "variables" per se.
RAM is divided into 4 regions:

  • Data: initialized data. The compiler/linker can keep track of exactly how much there is of this, based on just the source code.
  • BSS: uninitialized data. Ditto, except that it doesn't need a copy stored in flash to do the initialization. Data and BSS will normally be positioned starting at the very beginning of the actual RAM, at least on microcontrollers where the code is "elsewhere."
  • Stack: Usually starts at the top of memory and grows downward. The stack gets return addresses for subroutines, saved registers, and "local variables" (those defined inside of a function.)
  • Heap: memory that can be dynamically allocated, usually by functions like malloc() and free() (though I suppose "new" as well?) Usually the Heap will start immediately after the Data and BSS areas, and grows "upward" until it collides with the stack (hopefully that doesn't happen.) Details of how the heap is managed are up to the library functions. The compiler/linker will usually output enough information that C-level code will know "there is a region of memory starting at X and ending sometime before Y that is not in use by any of the variables defined in the program." Then it can carve that up using a bunch of different possible algorithms (with varying memory overhead, speed of malloc or free, faster access to commonly used block sizes, coalescing of freed memory, reporting and debugging, and ... other features.) (It is my observation, in 20+ years of experience with an in-house-written malloc(), that the memory allocator will gradually increase in complexity until the point where someone decides that there needs to be a "low overhead" version that is faster and cheaper. The applications that want the extra speed start using that, and adding features to IT until it reaches a complexity that ...) (and I am utterly SHOCKED at the lack of debugging features in most allocator implementations.)

You'll notice that the above "theory" is relatively independent of the actual language in use. Pretty much all modern computer languages will have the same basic regions of memory, treated the same way. A big difference is that some languages will have "garbage collection" that will try to figure out when dynamically allocated memory is no longer being used, removing the need for the programmer to explicitly free memory. (which is nicer when you have a gig or two of free memory, so that having a bunch sitting around waiting to be GC'ed doesn't stop the rest of your program from working.)

Nick- where do you find the object code created for the Arduinos?
The IDE hides all of this from us.

Can we enable the generation of a listing or map file?
It may be of some help for diagnosing weird problems.

cappy2112:
Nick- where do you find the object code created for the Arduinos?
The IDE hides all of this from us.

Depends on the version of your IDE.

Check Preferences to see if you have an option for verbose output.

If not, press shift when you click verify or upload.

You'll see the temporary directory with all the intermediate files.

cappy2112:
Nick- where do you find the object code created for the Arduinos?

What James said.

cappy2112:
Can we enable the generation of a listing or map file?

I think it can be found out, I can't recall how.

Can we enable the generation of a listing or map file?

The compiler process generates a ".elf" file as well as the .hex file. .elf is a very information-rich format. Once you figure out where java decided to put the binaries (by using the verbose compile options), it is generally easier to post-process the .elf file to produce the listing and map files, than to try to get the compile process to generate them in the first place.

"avr-objdump -S foo.elf" generates a nice listing, and will do a bunch of other stuff as well.
"avr-nm foo.elf" is the basic symbol table dump (also with many options.)

These binaries are included in the IDE install, off in .../hardware/tools/avr/bin/
There is even a set of man pages...

westfw:

Can we enable the generation of a listing or map file?


Yep- Found it. Been looking through those listings.
Too bad the IDE doesn't have hooks to display that stuff.

Thanks