I was testing the speed of pin toggling and discovered that relying on void loop() to repeat is actually slower than placing a 'goto back to start' inside it to repeat the loop.
Here are the two versions:
In hardware->arduino->cores->arduino lies the 'main.cpp' file:
#define ARDUINO_MAIN
#include <Arduino.h>

int main(void)
{
    init();
    setup();
    for (;;)
        loop();
    return 0;
}
So after the function 'void loop()' has finished, control returns to main() in the above file, which then calls loop() again, and so on. This takes time.
I suspect that there is internal "housekeeping" to handle things like updating timers, running counters, pulsing PWM outputs, etc. Arduino must have SOME of the "time slot" to handle those things.
There is no "time slot" - updating "micros" and "millis" is handled in interrupts or on-the-fly, and PWM outputs are handled by hardware.
To answer the original question, a simple "goto" will nearly always be faster than a "return from function" + "call to (same) function", but the latter won't get you despised by half the users who think that people who write "goto" in a C program should be drowned at birth.
@KE7GKP: Indeed there is a sensor to determine speed. I'm trying to get the maximum refresh rate as a math exercise, which in turn would tell me how many LEDs can be considered at what speed. An eventual POV calculator, if you will. In the end project there will probably be no need to speed up loop(), except perhaps to get the data read from the EEPROM a tad faster.
@AWOL: I was surprised to find goto in the reference, actually. Is it a more recent addition, or have I perhaps been in denial?
The goto statement has been part of C from the beginning, added to satisfy those people that barely got beyond BASIC programming, in my opinion. In 25 years of C/C++ coding, I've used goto exactly once, and that could have been avoided if I'd been thinking.
I see there is a performance hit for using functions though, since they are not compiled inline and all that.
I wonder if goto can give a speed advantage there?
I have had some interesting results posted on the POV math thread, but there may have been some other factors involved. I'll quickly rewrite the code for both scenarios and see what the scope says.
I was recently working on EtherCard::packetLoop, see tcpip.cpp, line 516. Note the vast number of return statements. Now I needed to add something in just before the method returned. Doh! So there are a few options at this point:
Put the code in before every return.
Refactor the whole thing to be a giant tangled mess of if/elses, even more than it already is.
Refactor even further to separate out into smaller methods.
Use a 'goto' in place of the returns, jumping to the new code just before the single return.
Generally, a goto is a good solution when there are a lot of error cases that need to halt further execution and you don't have exceptions available.
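As a rough illustration of that last option, here is a minimal sketch of the "goto to a single exit" pattern. Everything in it is made up for the example: handle_packet, the header byte 0x45, and the g_calls_finished counter are hypothetical stand-ins, and this is not EtherCard's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical counter standing in for "the code that must run before returning".
static int g_calls_finished = 0;

// Hypothetical packet handler: returns the payload length, or 0 if rejected.
std::size_t handle_packet(const std::uint8_t* buf, std::size_t len) {
    // Declared before any goto so that no jump skips an initialisation.
    std::size_t result = 0;

    if (buf == nullptr) goto done;                           // was: return 0;
    if (len < 2) goto done;                                  // was: return 0;
    if (buf[0] != 0x45) goto done;                           // made-up header check
    if (static_cast<std::size_t>(buf[1]) > len - 2) goto done;  // length too large

    result = buf[1];

done:
    // The newly added code lives here once and runs on every path.
    assert(result <= len);
    ++g_calls_finished;
    return result;
}
```

Every early exit lands on the same label, so the added code is written once instead of being copied in front of each return.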
All of the typical infinite loop constructs ("while (1)", "for (;;)", goto, etc) end up producing a single branch instruction.
The delay "at the end of loop" in the original posting is the function return and call overhead.
Thanks westfw, I have bookmarked that thread; it looks like excellent reading!
What I am really looking for, however, is the fastest way to latch 595 registers. Any ideas there? My current implementation is in the code mentioned in the thread above. What you are seeing in the scope pictures is 8 bytes being sent via SPI and the associated latchings.
I got to asking this question because these factors influence my readings and calculations.
westfw:
All of the typical infinite loop constructs ("while (1)", "for (;;)", goto, etc) end up producing a single branch instruction.
The delay "at the end of loop" in the original posting is the function return and call overhead.
To corroborate this, I did a simple test: I printed out main.cpp and the Blink sketch after the Arduino preprocessing step but before compilation.
There's nothing at the end of loop() or in main(), so it must be call overhead. I expect several registers need to be changed (stack and instruction pointers, etc.).
The LATCH_ON and LATCH_OFF all end up as single (2-cycle) instructions. The end/resumption of loop is three instructions (return, jmp, call) and both return and call take 4 cycles. So I'd expect the gap between the last bitset in the loop and the first one after the loop resumes to be about 5 times longer than the gap between consecutive bitsets inside the loop, which is just about what the scope trace shows.
I wouldn't call 10 cpu cycles a "delay"; when you optimize your code down to single instructions, you have to start being aware that EVERYTHING takes at least a little bit of time!
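The arithmetic behind that estimate can be written out as a small sketch. The cycle counts are the ones quoted above; the 2 cycles for the jump are inferred from the 10-cycle total, and the 16 MHz clock used in the usage note is an assumption (typical for an Uno-class board), not something stated in the thread.

```cpp
#include <cassert>

// Cycle counts as quoted in the post above.
constexpr int kBitWriteCycles = 2;  // LATCH_ON / LATCH_OFF, one instruction each
constexpr int kReturnCycles   = 4;  // return from loop()
constexpr int kJumpCycles     = 2;  // jump back to the top of for(;;), inferred
constexpr int kCallCycles     = 4;  // call loop() again

// Extra cycles spent between leaving loop() and re-entering it.
constexpr int loop_overhead_cycles() {
    return kReturnCycles + kJumpCycles + kCallCycles;
}

// Convert a cycle count to nanoseconds for a given CPU clock in Hz.
double cycles_to_ns(int cycles, double clock_hz) {
    assert(clock_hz > 0.0);
    return cycles * 1e9 / clock_hz;
}
```

With these numbers loop_overhead_cycles() is 10, five times the 2 cycles of a single bit write, and cycles_to_ns(10, 16e6) comes to 625 ns on the assumed 16 MHz clock.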
AWOL:
To answer the original question, a simple "goto" will nearly always be faster than a "return from function" + "call to (same) function", but the latter won't get you despised by half the users who think that people who write "goto" in a C program should be drowned at birth.
If you are trying to generate an exact square wave at an exact frequency, I suggest the 555 chip (or is it the 666 chip? I can never remember).
As for "despise", it's simply a case of using the right tool for the job. The goto statement has its uses, in possibly 0.01% of cases. In the example given:
... there is still going to be a slight discrepancy between the end of the first OFF and the start of the second ON, and the next one. The goto just makes it smaller (the extra instruction, whatever it does). The timer interrupts firing will also delay the code slightly. It will never be a perfect square wave.
Agreed. Goto is unduly demonized when in fact it's the author who should be on the receiving end of the ire. There is nothing wrong with goto in general. Having said that, it's very frequently abused and misused. It's the classic case of a poor carpenter blaming his tools.
I completely agree with your comment. But I do want to offer that the error can be further marginalized by unrolling the loop by hand. This is a little-used optimization technique. With the above, the error is 1 out of every 2 pulses. Not so good.
So on and so on... unroll it until your error becomes acceptable, if possible. With the above, the error is now 1 out of 64 pulses. Still not great, but considerably better, being 32x more precise. It's the classic size vs. speed trade-off.
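To put numbers on that trade-off, here is a small host-side model of the unrolled loop. It is only a sketch under assumed costs (2 cycles per pin write, 2 more for the jump back to the top); the function names are hypothetical and nothing here was measured on hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed costs, not measurements.
constexpr int kWriteCycles = 2;  // one ON or OFF pin write
constexpr int kJumpCycles  = 2;  // jump back to the top of the loop

// Length in cycles of every half-period when the loop body holds
// `pairs` hand-unrolled ON/OFF pairs and runs `iterations` times.
std::vector<int> half_periods(int pairs, int iterations) {
    assert(pairs > 0 && iterations > 0);
    std::vector<int> out;
    for (int it = 0; it < iterations; ++it) {
        for (int w = 0; w < 2 * pairs; ++w) {
            const bool last_write = (w == 2 * pairs - 1);
            // The level set by the last write also lasts through the jump.
            out.push_back(kWriteCycles + (last_write ? kJumpCycles : 0));
        }
    }
    return out;
}

// Fraction of half-periods that are stretched by the jump.
double stretched_fraction(int pairs, int iterations) {
    const std::vector<int> hp = half_periods(pairs, iterations);
    std::size_t stretched = 0;
    for (int cycles : hp) {
        if (cycles != kWriteCycles) ++stretched;
    }
    return static_cast<double>(stretched) / static_cast<double>(hp.size());
}
```

In this model one ON/OFF pair per pass stretches 1 half-period in 2, while 32 unrolled pairs stretch 1 in 64, which matches the 32x figure above. The stretched half-period itself is no shorter; it just occurs less often, at the cost of a larger loop body.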