roybrus:
I can almost guarantee you that you will see no appreciable performance increase when switching from returning a temporary value to using a global buffer. I don't know what evidence you have to suggest that this is more efficient, but if it is, it's a negligible amount of improvement. Using returned values is the standard convention.
roybrus,
This, along with the examples provided, is misleading.
The examples you show are all using pointers and not directly using globals.
Even C++ call by reference is going to use pointers under the hood,
since the compiler can't guarantee that the arguments will always be particular globals.
Using globals can be significantly faster than more standard approaches when using the AVR.
The reason is that the AVR has pretty weak pointer capabilities.
As a result it has to jump through quite a few hoops to handle pointer references.
When using a global, the final address of the variable is calculated at link time, whereas
a pointer's target address has to be calculated at run time. Anytime you can shift work
away from being done at runtime, you pick up performance.
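To make that concrete, here is a minimal sketch (hypothetical names) of the two approaches.
With the global, the compiler can emit direct load/store instructions to an address that is
fixed at link time; with the pointer, it first has to get the pointer into an index register
and then access memory indirectly through it:

uint8_t status;                  // global: address fixed at link time

void updateStatusGlobal(void) {
    status = 42;                 // direct store to a known address
}

void updateStatusViaPtr(uint8_t *p) {
    *p = 42;                     // pointer loaded into an index register, then an indirect store
}

With optimization enabled, the global version typically compiles down to a couple of direct
store instructions, while the pointer version has the extra pointer handling on top of that.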
As majenko said it is a matter of what is "better".
And as far as "better" goes, "it depends".
"better" can vary depending on what is preferred (performance, code size, maintainability) and
the specific situation.
To some "better" may mean easier to maintain and add new features,
but to others, balls to the wall speed is important.
When it comes to speed/performance,
depending on the specific function, using a global can even be faster
than passing down a local/automatic and returning a value.
Normally this would be quite efficient on the AVR as the values would remain in registers;
however, if the function has many local variables,
it may have to save & restore those locals before calling the subfunction, and the called
function may also have to save & restore registers as well. Using a global can
potentially avoid that scenario.
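As a rough sketch of that (hypothetical names again), compare passing a value down and
returning the result with simply updating a global:

uint16_t total;                            // global accumulator

uint16_t addSampleByValue(uint16_t t, uint16_t sample) {
    return t + sample;                     // caller has to keep 't' live across the call
}

void addSampleToGlobal(uint16_t sample) {
    total += sample;                       // load/modify/store at a fixed address; nothing passed or returned
}

In a caller that already has a lot of locals in registers, the value-passing version is the
one that can trigger the save & restore traffic described above.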
Pointers and array indexing can create significant overhead on the AVR
vs using a global, particularly if there are multiple references to the elements.
foo->bar = 1;
foo->bar2 = 2;
foo->bar3 = 3;
will generate more code than using a global:
foo.bar = 1;
foo.bar2 = 2;
foo.bar3 = 3;
which will tend to generate more code than individual globals:
foobar = 1;
foobar2 = 2;
foobar3 = 3;
All that said, when it comes to performance tuning, profiling is the name of the game.
You must actually profile the code to see where the overheads really are.
Without doing this, it is simply guessing, and that can lead to wasting tons of time
optimizing the wrong areas.
Another thing to keep in mind is that when profiling you must take into
consideration the system as a whole. It does no good to profile individual components
until you know where the biggest overheads are.
i.e. in order to make a difference you must attack the largest overheads.
This may seem obvious, but it is the single biggest mistake I see people make
when trying to optimize code/performance.
There are many ways to profile a system, depending on what is desired
(code size vs speed).
When looking for improvements, the first thing that must
be understood is just where the overhead is.
For code size improvements, you will have to look at the actual assembly code
to see which routines are generating the most code.
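For instance, with an avr-gcc based toolchain (which is what the Arduino IDE uses under the
hood), running avr-nm --size-sort and avr-objdump -d against the .elf file the build produces
will show which functions are the largest and what assembly they turned into.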
For speed improvements, you will need to profile the live system using
additional tools. A scope and logic analyzer are often quite useful when used
with specially inserted strobing code. This allows you to see the actual timing
of the running code.
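For example, a minimal sketch of the strobing approach (the pin number and the
doTheWorkBeingMeasured() function are just placeholders):

const uint8_t STROBE_PIN = 7;          // any spare pin

void doTheWorkBeingMeasured() {
    // ... the code you want to time goes here ...
}

void setup() {
    pinMode(STROBE_PIN, OUTPUT);
}

void loop() {
    digitalWrite(STROBE_PIN, HIGH);    // rising edge marks the start of the section of interest
    doTheWorkBeingMeasured();
    digitalWrite(STROBE_PIN, LOW);     // pulse width on the scope/analyzer = execution time
}

Keep in mind that digitalWrite() itself has some overhead; for very short sections a direct
port write gives a cleaner measurement.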
Another thing to keep in mind is what I call "the algorithms".
In other words, the actual approach of how the code/system works.
In many cases simply optimizing the way the code works can yield better
results than optimizing the underlying code itself.
My favorite saying, which I have stressed to other developers over the decades,
is "better code beats faster code every time".
i.e. if you can do things more efficiently, it often picks up more performance
than pulling out your hair speeding up a less efficient way of doing things.
Again this seems obvious, but it is another thing that is often overlooked.
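As a trivial (made up) illustration: recomputing an average over a sample buffer every time a
new sample arrives, versus maintaining a running sum, is an algorithm change rather than a
code-tuning change, and it dwarfs anything you could do to speed up the summing loop itself:

uint16_t samples[64];
uint8_t  head;
uint32_t runningSum;

// Brute force: 64 additions every time the average is needed.
uint16_t averageSlow(void) {
    uint32_t sum = 0;
    for (uint8_t i = 0; i < 64; i++)
        sum += samples[i];
    return sum / 64;
}

// Better algorithm: constant work per new sample.
void recordSample(uint16_t s) {
    runningSum -= samples[head];       // drop the sample being replaced
    runningSum += s;                   // add the new one
    samples[head] = s;
    head = (head + 1) & 63;            // 64 entries, so wrap with a mask
}

uint16_t averageFast(void) {
    return runningSum / 64;
}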
Sometimes little things can make a big difference.
Like shifting overhead in time. By that I mean, there are often ways of shifting
certain overheads to different places in the code.
For example, sometimes you can pre-calculate things.
If those pre-calculations can be done during initialization, then
that overhead is removed from the critical run-time path, which
speeds up the normal run-time path of the code.
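A hypothetical example: a brightness/gamma curve used in a tight loop can be computed once
into a table in setup(), so the run-time path is just an array lookup (the pin choices and
the 2.2 exponent here are arbitrary):

uint8_t gammaTable[256];                 // filled once at startup

uint8_t readLevel() {
    return analogRead(A0) >> 2;          // scale 0..1023 down to 0..255
}

void setup() {
    for (uint16_t i = 0; i < 256; i++) {
        // pay for the floating point math once, during initialization
        gammaTable[i] = (uint8_t)(pow(i / 255.0, 2.2) * 255.0 + 0.5);
    }
}

void loop() {
    analogWrite(9, gammaTable[readLevel()]);   // critical path is now just a table lookup
}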
So my recommendations for optimization are:
- Understand where the overheads are
- See if there is a different approach that could be better
- Attack the largest overheads first
These steps work for any type of optimization
(size or speed).
My observation in this specific case would be that there may be some
unwarranted initial concern, or at least concern about the wrong areas.
The system is using i2c, and i2c at its default 100 kHz clock rate is pretty slow.
So my guess (and I HATE guessing) would be that i2c and how the i2c bus
is handled will be the bottleneck, not globals vs locals, etc.
Sometimes, the way you use the i2c bus & wire lib can dramatically affect
performance. For example, I was able to double the speed of an i2c LCD library
by slightly modifying the way the library talked to the PCF8574 chip.
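I won't reproduce that library's code here, but as a rough (hypothetical) sketch of the kind
of change that helps, compare sending a burst of bytes as one Wire transaction per byte versus
one transaction for the whole burst:

#include <Wire.h>

const uint8_t PCF8574_ADDR = 0x27;     // typical backpack address; yours may differ

// One transaction per byte: start/address/stop overhead paid for every byte.
void sendSlow(const uint8_t *data, uint8_t len) {
    for (uint8_t i = 0; i < len; i++) {
        Wire.beginTransmission(PCF8574_ADDR);
        Wire.write(data[i]);
        Wire.endTransmission();
    }
}

// One transaction for the whole burst: the start/stop overhead is paid once.
// (keep len within the Wire library's internal buffer, 32 bytes on classic AVR cores)
void sendFast(const uint8_t *data, uint8_t len) {
    Wire.beginTransmission(PCF8574_ADDR);
    Wire.write(data, len);
    Wire.endTransmission();
}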
But my first step would be going back to the steps I showed above
and profiling the system to see where the actual overheads are, and
then seeing if there is some other way to do things better, and then
finally optimizing the code.
I think it was Knuth who said:
"Make it work. Then make it fast".
--- bill