Topic: Function() vs Speed
United kingdom
Full Member
Karma: 0
Posts: 108
just think how much free time you would have if everything worked first time!!

Having an understanding of how things work at the assembly level is always useful.  Compilers can introduce problems, especially when optimizations are enabled.  

Also, when talking about the compiler, remember that the job of the compiler is to produce native machine code that the processor understands. The way a function is called, and the way parameters are pushed onto and popped off the stack, are not compiler issues; they come down to the way the processor works.
« Last Edit: March 11, 2012, 03:48:08 am by SPlatten »

Kind Regards,
Sy

SF Bay Area (USA)
Tesla Member
Karma: 106
Posts: 6378
Strongly opinionated, but not official!

Quote
as I said would imply the parameter delay is pushed onto the stack along with the program counter.
But you're wrong, and Nick is right.  MANY "RISC" processors (which have lots of registers, and generally "slow" access to memory) have a calling convention that places the first several arguments in registers, rather than pushing them on the stack.  (If the function then calls other functions, or recurses, it will end up saving those on the stack, if necessary.)

Interestingly (?), this information is hard to find. http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_reg_usage mentions it, but a FAQ is hardly a specification!

This does mean that if the function you are calling is relatively simple, the overhead is pretty low. Register allocation has gotten smart. Usually there isn't even any overhead for moving intermediate results into the proper "argument" registers. (So, for example, "delay(1000);" does NOT result in (ldi32 tmp32,1000; mov32 args32,tmp32; call delay); it is just (ldi32 args,1000; call delay), where xxx32 means whatever is necessary for 32 bits, usually four 8-bit loads into four registers.)
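A minimal sketch of the byte layout being described, assuming avr-gcc's convention that a 32-bit first argument (such as the `unsigned long` taken by Arduino's `delay()`) occupies r22..r25 with the least significant byte in r22. The helper name `first_arg_byte_in_reg` is hypothetical; it only shows which constant each of the four 8-bit loads would put in which register, not the exact instructions the compiler chooses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the byte of a 32-bit FIRST argument that avr-gcc
 * places in register r<reg>, assuming the r22 (LSB) .. r25 (MSB) layout. */
static uint8_t first_arg_byte_in_reg(uint32_t value, int reg)
{
    assert(reg >= 22 && reg <= 25);  /* only r22..r25 hold this argument */
    int byte_index = reg - 22;       /* r22 -> byte 0, ..., r25 -> byte 3 */
    return (uint8_t)(value >> (8 * byte_index));
}
```

For delay(1000), 1000 is 0x000003E8, so the four loads put 0xE8 in r22, 0x03 in r23 and 0x00 in r24 and r25, and the call follows directly.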

Quote
It puzzles me why -g is used together with -Os...
Why?  -g controls debugging info generated; it doesn't turn off optimization or add code.  Optimized code can sometimes get re-ordered, with local variables eliminated or reused, making debugging a bit more "exciting" than usual, but it's not awful.  I like the quote on the page you reference:
Quote
Nevertheless it proves possible to debug optimized output. This makes it reasonable to use the optimizer for programs that might have bugs.

United kingdom
Full Member
Karma: 0
Posts: 108
just think how much free time you would have if everything worked first time!!

Registers are only used when available, which depends entirely on what else the CPU is doing at the time; at all other times the stack is used. So you should not rely on the use of registers, but you should factor the stack into your operation, as this is the worst-case scenario.

Kind Regards,
Sy

Edison Member
Karma: 26
Posts: 1339
You do some programming to solve a problem, and some to solve it in a particular language. (CC2)


(...snip...)

Quote
It puzzles me why -g is used together with -Os...
Why?  -g controls debugging info generated; it doesn't turn off optimization or add code.  Optimized code can sometimes get re-ordered, with local variables eliminated or reused, making debugging a bit more "exciting" than usual, but it's not awful.  I like the quote on the page you reference:
Quote
Nevertheless it proves possible to debug optimized output. This makes it reasonable to use the optimizer for programs that might have bugs.

My (I guess wrong) assumption was that debug information has to be stored in the final executable, thus making it bigger. But we are optimizing for size because we are on small devices... What am I missing here?

Global Moderator
Brattain Member
Karma: 452
Posts: 18694

Quote
Registers are only used when available, which depends entirely on what else the CPU is doing at the time; at all other times the stack is used. So you should not rely on the use of registers, but you should factor the stack into your operation, as this is the worst-case scenario.

Yes, but if you dump the .elf file and find that registers are being used, then that won't change. It appears that this is what happens for simple functions, which I believe is what the original question was about.

Quote
My (I guess wrong) assumption was that debug information has to be stored in the final executable, thus making it bigger. But we are optimizing for size because we are on small devices... What am I missing here?

I'm not sure what debug information would be stuck in the executable. Some optimisation options just make the object code easier to follow (by not moving instructions around, for example out of loops), that's all.

SF Bay Area (USA)
Tesla Member
Karma: 106
Posts: 6378
Strongly opinionated, but not official!

Quote
assumption was that debug information has to be stored into the final executable, thus making it bigger.
The .elf file becomes significantly swollen with debugging information, but it is all in separate linker sections that are easily stripped out when making the .hex files that actually load onto the Arduino hardware.

Edison Member
Karma: 57
Posts: 2193
Now, More Than Ever

Quote
call --> 4 cycles
ret ---> 4 cycles
(4 + 4) cycles * 5 calls / 16 MHz = 2.5 microseconds.

[Copious data redacted.]
My suggestion still is: don't try to outsmart the compiler. Write simple, readable code.

There's some elucidation.
I was figuring to snap things up a bit, but it looks like the returns for the effort would be negligible.
Thanks.
"Don't Try to Outsmart The Compiler" - that's long for a bumper sticker, but it has potential.
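The quoted arithmetic can be checked with a small sketch. It assumes, as the quote does, 4 cycles each for CALL and RET and a 16 MHz clock (one cycle = 62.5 ns), and it ignores argument setup and any registers the callee saves. The function name `call_overhead_ns` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time spent purely in CALL/RET instructions, in ns.
 * Argument loading and callee register saves are NOT included. */
static uint64_t call_overhead_ns(uint32_t call_cycles, uint32_t ret_cycles,
                                 uint32_t n_calls, uint32_t f_cpu_hz)
{
    assert(f_cpu_hz != 0);
    uint64_t cycles = (uint64_t)(call_cycles + ret_cycles) * n_calls;
    /* cycles / f_cpu seconds, scaled to nanoseconds */
    return cycles * 1000000000ULL / f_cpu_hz;
}
```

With the quoted figures, `call_overhead_ns(4, 4, 5, 16000000)` gives 2500 ns, i.e. 2.5 microseconds; a single call/return pair costs 500 ns.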

"Hello, I must be going..."
"You gotta fight -- for your right -- to party!"
Don't react - Read.
"Who is like unto the beast? who is able to make war with him?"

USA
Jr. Member
Karma: 2
Posts: 86
If you can't fix it with a hammer, it must be an electrical problem.

Quote
Registers are only used when available, which depends entirely on what else the CPU is doing at the time; at all other times the stack is used. So you should not rely on the use of registers, but you should factor the stack into your operation, as this is the worst-case scenario.

Quote from: avr-libc FAQ
Function call conventions:

Arguments - allocated left to right, r25 to r8. All arguments are aligned to start in even-numbered registers (odd-sized arguments, including char, have one free register above them). This allows making better use of the movw instruction on the enhanced core.

If too many, those that don't fit are passed on the stack.

My experience and the FAQ states the opposite of what you state. Can you provide evidence to support your statement?
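The FAQ rule can be sketched in C to predict where each argument lands. This is a simplified model, not avr-gcc's actual code: it assumes registers are handed out downward from r25, each argument's size is rounded up to an even number of registers, r8 is the lowest register used, and (a simplifying assumption) once one argument spills to the stack the remaining ones are treated as spilled too. The function name `avr_arg_registers` is hypothetical; the disassembly or the avr-gcc ABI documentation is the authority for the edge cases.

```c
#include <assert.h>

/* Hypothetical model of the avr-libc FAQ rule. sizes[i] is the byte size
 * of argument i; first_reg[i] receives the lowest register number used by
 * that argument, or -1 if it does not fit and goes on the stack. */
static void avr_arg_registers(const int sizes[], int count, int first_reg[])
{
    int next = 26;     /* one above r25, the first register handed out */
    int on_stack = 0;
    for (int i = 0; i < count; i++) {
        assert(sizes[i] > 0);
        int rounded = (sizes[i] + 1) & ~1;  /* odd sizes keep one spare register above */
        if (on_stack || next - rounded < 8) {
            on_stack = 1;  /* simplifying assumption: later args spill as well */
            first_reg[i] = -1;
        } else {
            next -= rounded;
            first_reg[i] = next;
        }
    }
}
```

For example, a `uint8_t` followed by an `int` lands in r24 (r25 left free) and r22:r23, while five `uint32_t` arguments occupy r22, r18, r14 and r10 before the fifth would need r6..r9 and goes on the stack.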

United kingdom
Full Member
Karma: 0
Posts: 108
just think how much free time you would have if everything worked first time!!

The evidence is in numerous books. My experience working with assembler dates back to the early '80s with 6502 and Z80.

You only have a limited number of registers. When you run out of registers, this doesn't mean you can't call any functions; it simply switches to the stack. Go read some books or search online.

Likewise, if you nest too many function calls you will run out of stack space and encounter a stack overflow.
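The stack cost of nesting can be estimated in the same spirit. A rough sketch, assuming a 2-byte return address per CALL (the case for AVRs with a 16-bit program counter, such as the ATmega328P) plus a per-call frame size for saved registers and locals that you would read from your own disassembly; interrupt handlers, which also use the stack, are not counted. The name `nested_stack_bytes` and the example frame sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical estimate of stack bytes used by `depth` nested calls.
 * frame_bytes[i] = bytes the i-th callee pushes for saved registers
 * and locals (taken from the disassembly, not computed here). */
static size_t nested_stack_bytes(const size_t frame_bytes[], size_t depth)
{
    assert(frame_bytes != NULL || depth == 0);
    size_t total = 0;
    for (size_t i = 0; i < depth; i++) {
        total += 2 + frame_bytes[i];  /* 2-byte return address + frame */
    }
    return total;
}
```

Three nested calls with frames of 0, 4 and 10 bytes would need 20 bytes, to be compared against whatever is left of the ATmega328P's 2 KB of SRAM after globals and the heap.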

Quote
If too many, those that don't fit are passed on the stack.

This might be useful: http://en.wikipedia.org/wiki/Calling_convention
« Last Edit: March 11, 2012, 12:50:23 pm by SPlatten »

Kind Regards,
Sy

USA
Jr. Member
Karma: 2
Posts: 86
If you can't fix it with a hammer, it must be an electrical problem.

Quote
The evidence is in numerous books. My experience working with assembler dates back to the early '80s with 6502 and Z80.

You only have a limited number of registers. When you run out of registers, this doesn't mean you can't call any functions; it simply switches to the stack. Go read some books or search online.

Likewise, if you nest too many function calls you will run out of stack space and encounter a stack overflow.

Quote
If too many, those that don't fit are passed on the stack.

As suspected, your statements are generalized comments about compiler functionality and not specific to gcc, avr and arduino. I would suggest using the disassembler in AVRStudio to evaluate the specifics of a particular function call.

My experience also dates back to the 1980's and the 6502. A copy of Principles of Compiler Design is in my library.

United kingdom
Full Member
Karma: 0
Posts: 108
just think how much free time you would have if everything worked first time!!

I give up....

Kind Regards,
Sy

Edison Member
Karma: 26
Posts: 1339
You do some programming to solve a problem, and some to solve it in a particular language. (CC2)

Quote
assumption was that debug information has to be stored into the final executable, thus making it bigger.
The .elf file becomes significantly swollen with debugging information, but it is all in separate linker sections that are easily stripped out when making the .hex files that actually load onto the Arduino hardware.


Thanks.

Then that debug information would be useful only for an Arduino simulator that uses the .elf file instead of the .hex. Am I right? I ask because I still haven't read about gdb-ing live Arduino code... (I hope someone can contradict me on this :-)

Global Moderator
UK
Brattain Member
Karma: 240
Posts: 24449
I don't think you connected the grounds, Dave.

Quote
A copy of Principles of Compiler Design is in my library.
Gries is The Word

"Pete, it's a fool looks for logic in the chambers of the human heart." Ulysses Everett McGill.
Do not send technical questions via personal messaging - they will be ignored.

SF Bay Area (USA)
Tesla Member
Karma: 106
Posts: 6378
Strongly opinionated, but not official!

Quote
The evidence is in numerous books.
That's not "evidence", that's "theory."

Quote
My experience working with assembler dates back to the early 80's with 6502 and Z80.
Obsolete architectures with few registers, and "fast" memory.

Quote
when you run out of registers ... it simply switches to the stack.
This might be useful: http://en.wikipedia.org/wiki/Calling_convention
Yes, but that doesn't mean that a function call always, or even usually, pushes arguments on the stack.
Four of the five CPU calling conventions at your reference are described as passing function arguments in registers; the fifth as "sometimes" doing so. You would be hard-pressed to find a function call that passes arguments on the stack in a typical Arduino sketch.

Now, if you have a non-leaf function which is using lots of registers, avr-gcc will still push them on the stack to preserve their values.  But the paradigm has definitely shifted from "push arguments on the stack, save registers in the subroutine" to "push argument registers on stack if needed, put new arguments in the registers" in many architectures.

This does point out that this is something that should be investigated when performance is critical. For critical timing, you should ALWAYS be able to look at the assembly code produced, and understand it well enough to detect whether the compiler has done something stupid. That said, the "don't try to outsmart the compiler" dictum still applies: it is generally safe to assume that the overhead of function calls is "somewhat minimized" compared to everything else. To be otherwise would be to discourage "good programming practices" ("subroutines are good.")

avr-gcc is in fact rather aggressive about inlining small functions, even with -Os.  If you're writing something like a bootloader, you have to go and use special switches "never inline unless you've been told to", or it will swell your program by inlining calls to uart_getc() and similar:
Quote
BillW-MacOSX-2<10070> avr-gcc -g -Wall -Os -fno-inline-small-functions -fno-split-wide-types -mshort-calls -mmcu=atmega328p -DF_CPU=16000000L  '-DLED_START_FLASHES=3' '-DBAUD_RATE=115200'   -c -o optiboot.o optiboot.c
BillW-MacOSX-2<10071> avr-size optiboot.o
   text    data     bss     dec     hex filename
    506       0       0     506     1fa optiboot.o
BillW-MacOSX-2<10072> avr-gcc -g -Wall -Os -fno-split-wide-types -mshort-calls -mmcu=atmega328p -DF_CPU=16000000L  '-DLED_START_FLASHES=3' '-DBAUD_RATE=115200'   -c -o optiboot.o optiboot.c
BillW-MacOSX-2<10073> avr-size optiboot.o
   text    data     bss     dec     hex filename
    924       0       0     924     39c optiboot.o
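Besides the global -fno-inline-small-functions switch used above, GCC lets you make the choice per function with the `noinline` and `always_inline` function attributes. A minimal sketch; the functions are hypothetical, and the size effect on a real bootloader would have to be measured with avr-size as above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example. noinline: keep ONE out-of-line copy; every use
 * becomes a CALL, which is what -fno-inline-small-functions favors. */
static __attribute__((noinline)) uint8_t checksum_step(uint8_t sum, uint8_t b)
{
    return (uint8_t)(sum + b);  /* wraps modulo 256 */
}

/* always_inline: GCC copies this body into every caller. */
static inline __attribute__((always_inline)) uint8_t low_nibble(uint8_t b)
{
    return (uint8_t)(b & 0x0F);
}

static uint8_t checksum(const uint8_t *buf, uint8_t len)
{
    assert(buf != NULL || len == 0);
    uint8_t sum = 0;
    for (uint8_t i = 0; i < len; i++) {
        sum = checksum_step(sum, low_nibble(buf[i]));
    }
    return sum;
}
```

Here `checksum_step()` stays a single out-of-line copy reached by a call, while `low_nibble()` is copied into its caller.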

SF Bay Area (USA)
Tesla Member
Karma: 106
Posts: 6378
Strongly opinionated, but not official!

Quote
debug information would be useful only for an arduino simulator that would use the elf instead of the hex. Am I right ?
Theoretically, a "live" debugger can get both the binary and the debugging info from the .elf file, or assume that the .hex file and .elf file match. I'm pretty sure that the Atmel debuggers actually do that...
