Topic: Function() vs Speed

SPlatten

#30
Mar 11, 2012, 09:45 am Last Edit: Mar 11, 2012, 09:48 am by SPlatten Reason: 1
Having an understanding of how things work at the assembly level is always useful.  Compilers can introduce problems, especially when optimizations are enabled.  

Also, when talking about the compiler, remember that its job is to produce native machine code that the processor understands.  The way a function is called and the way parameters are pushed onto and popped off the stack are not compiler issues; they come down to the way the processor works.
Kind Regards,
Sy

westfw

Quote
as I said would imply the parameter delay is pushed onto the stack along with the program counter.

But you're wrong, and Nick is right.  MANY "RISC" processors (which have lots of registers, and generally "slow" access to memory) have a calling convention that places the first several arguments in registers, rather than pushing them on the stack.  (If the function then calls other functions, or recurses, it will end up saving those on the stack, if necessary.)

Interestingly (?), this information is hard to find.  http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_reg_usage mentions it, but a FAQ is hardly a specification!

This does mean that if the function you are calling is relatively simple, the overhead is pretty low.  Register allocation has gotten smart.  Usually there isn't even any overhead of moving intermediate results into the proper "argument" registers.  So, for example, "delay(1000);" does NOT result in (ldi32 tmp32,1000; mov32 args32,tmp32; call delay), just (ldi32 args32,1000; call delay), where xxx32 means whatever is necessary for 32 bits: usually four 8-bit moves into four registers.
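To make the "four 8-bit moves" concrete, here is a minimal host-side sketch in plain C. The helper name split_u32 is hypothetical, and the register names in the comments are my reading of the avr-gcc convention for a single 32-bit first argument (r22 low byte through r25 high byte), not something taken from a disassembly in this thread.

```c
#include <stdint.h>

/* Hypothetical helper: split a 32-bit value into the four bytes that
 * four 8-bit "ldi" instructions would load, least significant first.
 * Assumption: for a lone 32-bit first argument avr-gcc would place
 * out[0] in r22, out[1] in r23, out[2] in r24 and out[3] in r25. */
void split_u32(uint32_t value, uint8_t out[4])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(value >> (8 * i));
    }
}
```

For delay(1000), 1000 is 0x000003E8, so the four immediates would be 0xE8, 0x03, 0x00 and 0x00. Dumping the .elf for a real sketch is still the only way to confirm what the compiler actually emitted.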

Quote
It puzzles me why -g is used together with -Os...

Why?  -g controls debugging info generated; it doesn't turn off optimization or add code.  Optimized code can sometimes get re-ordered, with local variables eliminated or reused, making debugging a bit more "exciting" than usual, but it's not awful.  I like the quote on the page you reference:
Quote
Nevertheless it proves possible to debug optimized output. This makes it reasonable to use the optimizer for programs that might have bugs.

SPlatten

Registers are only used when available, which depends entirely on what else the CPU is doing at the time; at all other times the stack is used.  So you should not rely on the use of registers, but you should factor the stack into your operation, as this is the worst-case scenario.

Kind Regards,
Sy

tuxduino



(...snip...)

Quote
It puzzles me why -g is used together with -Os...

Why?  -g controls debugging info generated; it doesn't turn off optimization or add code.  Optimized code can sometimes get re-ordered, with local variables eliminated or reused, making debugging a bit more "exciting" than usual, but it's not awful.  I like the quote on the page you reference:
Quote
Nevertheless it proves possible to debug optimized output. This makes it reasonable to use the optimizer for programs that might have bugs.



My (I guess wrong) assumption was that debug information has to be stored into the final executable, thus making it bigger. But we are optimizing for size as we are on small devices... What am I missing here ?

Nick Gammon


Registers are only used when available, which depends entirely on what else the CPU is doing at the time; at all other times the stack is used.  So you should not rely on the use of registers, but you should factor the stack into your operation, as this is the worst-case scenario.


Yes, but if you dump the .elf file and you find that registers are being used, then that won't change. Which, it appears, happens for simple functions. Which I believe the original question was about.


My (I guess wrong) assumption was that debug information has to be stored into the final executable, thus making it bigger. But we are optimizing for size as we are on small devices... What am I missing here ?


I'm not sure what debug information would be stuck in the executable. Some optimisation options just make the object easier to follow, that's all (by not moving instructions around, like out of loops).
Please post technical questions on the forum, not by personal message. Thanks!

More info:
http://www.gammon.com.au/electronics

westfw

Quote
assumption was that debug information has to be stored into the final executable, thus making it bigger.

The .elf file becomes significantly swollen with debugging information, but it is all in separate linker sections that are easily stripped out when making the .hex file that actually loads onto the Arduino HW.

Runaway Pancake


call --> 4 cycles
ret ---> 4 cycles
(4 + 4) * 5 / 16 = 2.5 microseconds.
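The arithmetic quoted above can be written out as a small C function. This is only a sketch: the name call_overhead_us is made up, and it assumes the 5 is a number of calls and the 16 is the clock in MHz (which matches the 16 MHz F_CPU used elsewhere in this thread).

```c
/* Hypothetical helper: time spent purely on call/ret overhead,
 * in microseconds. clock_mhz is cycles per microsecond. */
double call_overhead_us(unsigned call_cycles, unsigned ret_cycles,
                        unsigned calls, double clock_mhz)
{
    return (double)(call_cycles + ret_cycles) * calls / clock_mhz;
}
```

With call_overhead_us(4, 4, 5, 16.0) this gives the 2.5 microseconds quoted. It counts only the call and ret instructions themselves, not argument setup or any register saves inside the function.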



[Copious data redacted.]
My suggestion still is: don't try to outsmart the compiler. Write simple, readable code.


There's some elucidation.
I was figuring to snap things up a bit, but it looks like the returns for the effort would be negligible.
Thanks.
"Don't Try to Outsmart The Compiler" - that's long for a bumper sticker, but it has potential.
"Hello, I must be going..."
"You gotta fight -- for your right -- to party!"
Don't react - Read.
"Who is like unto the beast? who is able to make war with him?"

JimEli


Registers are only used when available, which depends entirely on what else the CPU is doing at the time; at all other times the stack is used.  So you should not rely on the use of registers, but you should factor the stack into your operation, as this is the worst-case scenario.


Quote from: avr-libc FAQ

Function call conventions:

Arguments - allocated left to right, r25 to r8. All arguments are aligned to start in even-numbered registers (odd-sized arguments, including char, have one free register above them). This allows making better use of the movw instruction on the enhanced core.

If too many, those that don't fit are passed on the stack.


My experience and the FAQ state the opposite of what you state. Can you provide evidence to support your statement?
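The allocation rule in the FAQ quote can be sketched as a small host-side simulation. This is my reading of the rule, not code from avr-gcc: the function name and the ON_STACK marker are hypothetical, and it assumes that once one argument spills to the stack, all later ones do too.

```c
#include <stddef.h>

#define ON_STACK (-1)   /* hypothetical marker: argument passed on the stack */

/* For each argument size (in bytes), store the lowest-numbered register
 * holding it, or ON_STACK. Sizes are rounded up to an even byte count so
 * every argument starts in an even register, working down from r25. */
void assign_arg_registers(const size_t *sizes, size_t count, int *first_reg)
{
    int reg = 26;       /* one past r25; allocation moves downward */
    int spilled = 0;    /* assumption: after one spill, the rest spill too */

    for (size_t i = 0; i < count; i++) {
        size_t rounded = (sizes[i] + 1) & ~(size_t)1;

        if (!spilled && sizes[i] != 0 && rounded <= (size_t)(reg - 8)) {
            reg -= (int)rounded;
            first_reg[i] = reg;   /* low byte lives here, rest above it */
        } else {
            spilled = 1;
            first_reg[i] = ON_STACK;
        }
    }
}
```

Under these assumptions a char, an int and a long passed in that order start at r24, r22 and r18, and a single long (as in delay(1000)) starts at r22. It takes ten 2-byte arguments before anything lands on the stack, which is consistent with the point that a typical sketch rarely gets there.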

SPlatten

#38
Mar 11, 2012, 06:44 pm Last Edit: Mar 11, 2012, 06:50 pm by SPlatten Reason: 1
The evidence is in numerous books.  My experience working with assembler dates back to the early 80's with 6502 and Z80.  

You only have a limited number of registers.  When you run out of registers, this doesn't mean you can't call any functions; it simply switches to the stack.  Go read some books or search online.

Likewise, if you nest too many function calls you will run out of stack space and encounter a stack overflow.

Quote

If too many, those that don't fit are passed on the stack.


This might be useful: http://en.wikipedia.org/wiki/Calling_convention
Kind Regards,
Sy

JimEli


The evidence is in numerous books.  My experience working with assembler dates back to the early 80's with 6502 and Z80.  

You only have a limited number of registers.  When you run out of registers, this doesn't mean you can't call any functions; it simply switches to the stack.  Go read some books or search online.

Likewise, if you nest too many function calls you will run out of stack space and encounter a stack overflow.

Quote

If too many, those that don't fit are passed on the stack.



As suspected, your statements are generalized comments about compiler functionality and not specific to gcc, avr and arduino. I would suggest using the disassembler in AVRStudio to evaluate the specifics of a particular function call.

My experience also dates back to the 1980's and the 6502. A copy of Principles of Compiler Design is in my library.

SPlatten

Kind Regards,
Sy

tuxduino


Quote
assumption was that debug information has to be stored into the final executable, thus making it bigger.

The .elf file becomes significantly swollen with debugging information, but it is all in separate linker sections that are easily stripped out when making the .hex file that actually loads onto the Arduino HW.



Thanks.

Then that debug information would be useful only for an arduino simulator that would use the elf instead of the hex. Am I right ? Because I still haven't read about gdb-ing arduino live code... (I hope someone can contradict me on this :-)

AWOL

Quote
A copy of Principles of Compiler Design is in my library.

Gries is The Word
"Pete, it's a fool looks for logic in the chambers of the human heart." Ulysses Everett McGill.
Do not send technical questions via personal messaging - they will be ignored.

westfw


The evidence is in numerous books.

That's not "evidence", that's "theory."

Quote
My experience working with assembler dates back to the early 80's with 6502 and Z80.

Obsolete architectures with few registers, and "fast" memory.

Quote
when you run out of registers ... it simply switches to the stack.
This might be useful: http://en.wikipedia.org/wiki/Calling_convention

Yes, but that doesn't mean that a function call always, or even usually, pushes arguments on the stack.
Four of the Five CPU calling conventions at your reference are described as passing function arguments in registers.  The fifth as "sometimes" doing so...  You would be hard-pressed to find a function call that passes arguments on the stack in a typical Arduino sketch.

Now, if you have a non-leaf function which is using lots of registers, avr-gcc will still push them on the stack to preserve their values.  But the paradigm has definitely shifted from "push arguments on the stack, save registers in the subroutine" to "push argument registers on stack if needed, put new arguments in the registers" in many architectures.

This does point out that this is something that should be investigated when performance is critical.  For critical timing, you should ALWAYS be able to look at the assembly code produced, and understand it well enough to detect whether the compiler has done something stupid.  The "don't try to outsmart the compiler" dictum still applies, though; it is generally safe to assume that the overhead of function calls is "somewhat minimized" compared to everything else.  To be otherwise would be to discourage "good programming practices" ("subroutines are good.")

avr-gcc is in fact rather aggressive about inlining small functions, even with -Os.  If you're writing something like a bootloader, you have to go and use special switches (-fno-inline-small-functions, i.e. "never inline unless you've been told to"), or it will swell your program by inlining calls to uart_getc() and similar.  Compare 506 bytes of text with the switch against 924 bytes without it:
Quote
BillW-MacOSX-2<10070> avr-gcc -g -Wall -Os -fno-inline-small-functions -fno-split-wide-types -mshort-calls -mmcu=atmega328p -DF_CPU=16000000L  '-DLED_START_FLASHES=3' '-DBAUD_RATE=115200'   -c -o optiboot.o optiboot.c
BillW-MacOSX-2<10071> avr-size optiboot.o
  text    data     bss     dec     hex filename
   506       0       0     506     1fa optiboot.o
BillW-MacOSX-2<10072> avr-gcc -g -Wall -Os -fno-split-wide-types -mshort-calls -mmcu=atmega328p -DF_CPU=16000000L  '-DLED_START_FLASHES=3' '-DBAUD_RATE=115200'   -c -o optiboot.o optiboot.c
BillW-MacOSX-2<10073> avr-size optiboot.o
  text    data     bss     dec     hex filename
   924       0       0     924     39c optiboot.o
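The command lines above turn inlining off for the whole file. As a hedged alternative sketch, GCC also accepts a per-function noinline attribute; the names below (fake_uart_data, uart_getc_sketch, read_three) are hypothetical stand-ins so the example compiles on a PC, and this is not how optiboot itself is written.

```c
#include <stdint.h>

/* Hypothetical stand-in for a UART data register, so this builds on a host. */
static volatile uint8_t fake_uart_data = 'A';

/* noinline asks GCC to keep one copy of the body and emit real calls,
 * instead of pasting the body into every call site. */
__attribute__((noinline))
uint8_t uart_getc_sketch(void)
{
    return fake_uart_data;
}

/* Three call sites: with inlining they could become three copies of the
 * body; with noinline they stay three calls to a single copy. */
uint16_t read_three(void)
{
    uint16_t sum = uart_getc_sketch();
    sum += uart_getc_sketch();
    sum += uart_getc_sketch();
    return sum;
}
```

Whether this saves or costs space depends on the function body and the number of call sites, so comparing avr-size output both ways, as in the session above, is the real test.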

westfw

Quote
debug information would be useful only for an arduino simulator that would use the elf instead of the hex. Am I right ?

Theoretically, a "live" debugger can get both binary and debugging info from the .elf file, or assume that the .hex file and .elf file match.  I'm pretty sure that the Atmel debuggers actually do that...