SPlatten:
The evidence is in numerous books.
That's not "evidence", that's "theory."
My experience working with assembler dates back to the early 80's with 6502 and Z80.
Obsolete architectures, with few registers and "fast" memory.
when you run out of registers ... it simply switches to the stack.
This might be useful: Calling convention - Wikipedia
Yes, but that doesn't mean that a function call always, or even usually, pushes arguments on the stack.
Four of the five CPU calling conventions in your reference are described as passing function arguments in registers; the fifth is described as "sometimes" doing so. You would be hard-pressed to find a function call in a typical Arduino sketch that passes arguments on the stack.
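To make the register-versus-stack point concrete for avr-gcc, here is a small host-side C model of its argument-allocation rule as described in the avr-gcc ABI notes on the GCC wiki. It is a sketch of that documented rule, not code taken from the compiler, and it ignores varargs functions (which pass all their arguments on the stack) and zero-sized arguments. The names avr_assign_args and ON_STACK are made up for illustration.

```c
#include <assert.h>

/* Marker meaning "this argument is passed in memory (on the stack)". */
#define ON_STACK (-1)

/* Model of avr-gcc's rule for a non-varargs call: start at register
 * number 26; for each argument, round its size up to an even number of
 * bytes and subtract it. If the result is still >= 8, the argument's low
 * byte goes in that register and its higher bytes in the registers above.
 * Otherwise this argument, and every argument after it, goes on the stack.
 * sizes[i] is the size in bytes of argument i; low_reg[i] receives the
 * register number of its low byte, or ON_STACK. */
void avr_assign_args(const int sizes[], int n, int low_reg[])
{
    int rn = 26;
    int spilled = 0;

    for (int i = 0; i < n; i++) {
        assert(sizes[i] > 0);                  /* zero-sized args not modeled */
        int size = sizes[i] + (sizes[i] & 1);  /* round odd sizes up to even */

        if (!spilled) {
            rn -= size;
            if (rn < 8)
                spilled = 1;                   /* out of argument registers */
        }
        low_reg[i] = spilled ? ON_STACK : rn;
    }
}
```

Under this model, a function taking ten uint16_t arguments gets the first nine in r24, r22, and so on down to r8; only the tenth goes on the stack. A (char, long, char) signature uses r24, then r20 through r23, then r18.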
Now, if you have a non-leaf function that is using lots of registers, avr-gcc will still push them on the stack to preserve their values. But on many architectures the paradigm has definitely shifted from "push arguments on the stack, save registers in the subroutine" to "push argument registers on the stack if needed, put new arguments in the registers."
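The register-saving side of that trade-off can be sketched the same way. The classification below follows the avr-gcc ABI's split: r18 to r27 and r30/r31 may be clobbered by any call, r2 to r17 and r28/r29 must be preserved by the callee, r0 is a scratch register, and r1 must hold zero. The function names are hypothetical, and the push count is a simplification that ignores -mcall-prologues, frame-pointer setup, and interrupt handlers.

```c
#include <assert.h>

/* Register classes in the avr-gcc ABI. */
enum avr_reg_class {
    AVR_TMP_REG,    /* r0: scratch, may be clobbered freely */
    AVR_ZERO_REG,   /* r1: must contain 0 on entry and on return */
    AVR_CALL_USED,  /* clobbered by calls; caller saves it if still needed */
    AVR_CALL_SAVED  /* callee must push/pop it if the callee uses it */
};

enum avr_reg_class avr_classify(int r)
{
    assert(r >= 0 && r <= 31);
    if (r == 0)
        return AVR_TMP_REG;
    if (r == 1)
        return AVR_ZERO_REG;
    if ((r >= 18 && r <= 27) || r == 30 || r == 31)
        return AVR_CALL_USED;
    return AVR_CALL_SAVED;  /* r2..r17, r28, r29 */
}

/* Rough count of push/pop pairs in a function's prologue/epilogue if its
 * body uses the distinct register numbers in used[]. */
int avr_saved_reg_pushes(const int used[], int n)
{
    int pushes = 0;
    for (int i = 0; i < n; i++)
        if (avr_classify(used[i]) == AVR_CALL_SAVED)
            pushes++;
    return pushes;
}
```

A body that uses r24, r25, r16 and r17 would need two push/pop pairs (for r16 and r17). Values that must survive a nested call tend to live in call-saved registers, which is where those pushes come from in non-leaf functions.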
All of this does point out that function-call overhead is something that should be investigated when performance is critical. For critical timing, you should ALWAYS be able to look at the assembly code produced and understand it well enough to detect whether the compiler has done something stupid. The "don't try to outsmart the compiler" dictum still applies, though: it is generally safe to assume that the overhead of function calls is "somewhat minimized" compared to everything else. To be otherwise would be to discourage "good programming practices" ("subroutines are good.")
avr-gcc is in fact rather aggressive about inlining small functions, even with -Os. If you're writing something like a bootloader, you have to use special switches that say "never inline unless you've been told to," or it will swell your program by inlining calls to uart_getc() and similar. Here is optiboot built with and without -fno-inline-small-functions (506 vs. 924 bytes of text):
BillW-MacOSX-2<10070> avr-gcc -g -Wall -Os -fno-inline-small-functions -fno-split-wide-types -mshort-calls -mmcu=atmega328p -DF_CPU=16000000L '-DLED_START_FLASHES=3' '-DBAUD_RATE=115200' -c -o optiboot.o optiboot.c
BillW-MacOSX-2<10071> avr-size optiboot.o
text data bss dec hex filename
506 0 0 506 1fa optiboot.o
BillW-MacOSX-2<10072> avr-gcc -g -Wall -Os -fno-split-wide-types -mshort-calls -mmcu=atmega328p -DF_CPU=16000000L '-DLED_START_FLASHES=3' '-DBAUD_RATE=115200' -c -o optiboot.o optiboot.c
BillW-MacOSX-2<10073> avr-size optiboot.o
text data bss dec hex filename
924 0 0 924 39c optiboot.o
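If only a few functions are the problem, GCC's per-function __attribute__((noinline)) is an alternative to the global -fno-inline-small-functions switch used above. The sketch below runs on a host; uart_getc() here is a hypothetical stand-in that reads from a buffer rather than optiboot's real UART routine, and read_word() is likewise made up to give it more than one call site.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical receive buffer standing in for the UART hardware, so this
 * sketch runs on a host instead of an AVR. */
static const uint8_t fake_rx[] = { 'S', 'T', 'K' };
static unsigned fake_pos = 0;

/* noinline tells GCC never to inline this function, so every caller makes
 * a short call to one shared copy instead of getting its own copy of the
 * body. */
static __attribute__((noinline)) uint8_t uart_getc(void)
{
    assert(fake_pos < sizeof fake_rx);
    uint8_t c = fake_rx[fake_pos];
    fake_pos = (fake_pos + 1) % sizeof fake_rx;  /* wrap around */
    return c;
}

/* Hypothetical caller with two call sites, low byte first. */
uint16_t read_word(void)
{
    uint16_t lo = uart_getc();
    uint16_t hi = uart_getc();
    return (uint16_t)((hi << 8) | lo);
}
```

With the attribute, each call site compiles to a call to the single copy of uart_getc(); without it, -Os may paste the body into every caller, which is the kind of growth the 506 vs. 924 byte comparison shows.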