__inline__ setup and loop?

Inlining them by default would save a couple of bytes and probably give a decent speed increase (depending on the application).

Either that, or the ability to write your own main() manually, would kick ass. :)

Hi cheater,
Could you explain what you mean? This sounds like something I want to understand, but I don't. What do you mean by inline, and how would inlining make things faster?

OK, excuse me for asking stupid questions :-)... I did a quick bit of reading here: Inline (Using the GNU Compiler Collection (GCC))

[edit]
(i'm still reading but found this in the meantime)

Note that certain usages in a function definition can make it unsuitable for inline substitution. 
Among these usages are: use of varargs, use of alloca, use of variable sized data types (see Variable Length), use of computed goto (see Labels as Values), use of nonlocal goto, and nested functions (see Nested Functions).

[/edit]

Well, with functions that are only called in one place, such as setup() and loop(), inlining removes the function call and basically copy-pastes the function's code into the place where it was called.
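To make that concrete, here is a minimal host-side sketch. It assumes only what the thread describes: the core's main() calls setup() once and then calls loop() over and over. The run_sketch() driver and the counter variable are hypothetical, and the bounded for loop stands in for the core's endless one so the example terminates on a PC. Declaring the functions static inline asks GCC to paste their bodies into the caller. It is only a hint, so GCC may still emit a call.

```c
/* Hypothetical stand-ins for a sketch's setup() and loop(). */
static unsigned long counter;

/* static inline: GCC may substitute the body at the call site instead of
   emitting a call/return pair (inline is a hint, not a guarantee). */
static inline void setup(void) { counter = 0; }
static inline void loop(void)  { counter++; }

/* Hypothetical stand-in for the core's main(). The real one never returns;
   this version takes an iteration count so it can finish on a PC. */
unsigned long run_sketch(unsigned long iterations)
{
    setup();                                  /* called once */
    for (unsigned long i = 0; i < iterations; i++)
        loop();                               /* called every pass */
    return counter;
}
```

With inlining, each pass of the loop in run_sketch() should reduce to the increment plus the loop's own branch, with no call and return; comparing avr-objdump -S output with and without inline is the way to confirm that on a real build.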

Straight away it shrinks the program by a few bytes, but the real impact comes from loop().

Function calls cost a few cycles. loop() is called in an infinite loop, so, especially in smaller programs, inlining it can give a little speed boost.

It's basically just a more efficient way of doing things.

We have a rule at work: "never make functions inline unless you can demonstrate a real and useful boost in performance." Yeah, making loop() and setup() inline will save you a whopping 8 bytes or so of code and the several cycles of overhead of a call instruction. Do you have an application where that difference is "real and useful"? I mean, you can save 2KB by omitting the bootloader and programming your sketch directly into the AVR, and I don't even see anyone yelling and screaming to reduce that 2K to 1K by streamlining the bootloader.

My assembly-language programmer's soul frequently cries at some of the inefficiencies in the Arduino libraries, but I keep reminding myself that the Arduino environment is not about making the most efficient possible code (in either the space or the time domain). It's a LOT more about not confusing learning programmers with complexities like "inline" when their use is really questionable anyway.

If you have a program or application where the 8 bytes and 4 cycles "lost" by not inlining these functions matter, then you probably should not be using the Arduino environment or hardware. Program in assembly, bump your clock up to 20MHz, and/or switch to an ARM with 64k of memory and 60MHz...
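For a rough sense of scale, the arithmetic is simple enough to sketch. The helpers below are hypothetical; the call overhead is a parameter so you can plug in the 4 cycles mentioned above or your own measurement, and a 16 MHz clock is assumed in the example values.

```c
/* Share of one loop() pass spent on call overhead, in percent,
   given the cycles the body itself takes and the overhead cycles. */
double overhead_percent(unsigned long body_cycles, unsigned long overhead_cycles)
{
    return 100.0 * overhead_cycles / (double)(body_cycles + overhead_cycles);
}

/* Time lost to the call per pass, in nanoseconds, at clock f_cpu (Hz). */
double overhead_ns(unsigned long overhead_cycles, unsigned long f_cpu)
{
    return overhead_cycles * 1e9 / f_cpu;
}
```

For example, overhead_percent(1596, 4) gives 0.25 (percent), and overhead_ns(4, 16000000) gives 250 ns per pass, so the saving only becomes noticeable when loop() itself does almost nothing.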

It's an annoying habit of mine. :)

I like things to be done the best way possible.
Consequently, few people can understand my 'cooler' code. It always works brilliantly, though.

and I don't even see yelling and screaming to reduce that 2K to 1K by streamlining the bootloader.

Well, now that you mention it... ;D

Ah, youthful optimism. Your code is NOT "done the best way possible" if "few people can understand it." Period.

My pet peeve is millis(), which ends up importing many bytes of 32-bit divide code so that delay() will be closer to actual milliseconds than the 2.5% off you'd get using the native clock tick of 1.024ms. Now, 2.5% off is HUGE if you're doing a clock, but the average user of delay(1000) (i.e. LEDBLINK) would have much more efficient and compact code just sleeping for 1.024 seconds.
(and of course, the thing I have to remind myself is that delay(1000) is something that needs to be efficient the way a fish needs a bicycle.)
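Where the 1.024 ms figure comes from can be checked with a little arithmetic. This sketch assumes a 16 MHz clock and an 8-bit timer (256 counts per overflow) running from a divide-by-64 prescaler; tick_period_us() and tick_error_permille() are hypothetical helper names, not core functions.

```c
#include <stdint.h>

/* Period of one overflow of an 8-bit timer (256 counts) clocked at
   f_cpu / prescaler, in microseconds. */
uint32_t tick_period_us(uint32_t f_cpu, uint32_t prescaler)
{
    /* 64 * 256 * 1,000,000 does not fit in 32 bits, so widen first. */
    return (uint32_t)((uint64_t)prescaler * 256u * 1000000u / f_cpu);
}

/* Error of treating one tick as exactly 1 ms, in parts per thousand.
   Only meaningful when period_us >= 1000. */
uint32_t tick_error_permille(uint32_t period_us)
{
    return period_us - 1000u;
}
```

tick_period_us(16000000, 64) returns 1024, and the error is 24 per mille, i.e. 2.4% (the post rounds this to 2.5%), so delay(1000) counted in raw ticks would last about 1.024 seconds.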

Ah, youthful optimism. Your code is NOT "done the best way possible" if "few people can understand it." Period.

OK, an example:

  tmp1 = 0xFF - (1 << led);
  tmp2 = 0xFF - (1 << (led + 3 > 8 ? led - 6 : led + 3));
  tmp3 = 0xFF - (1 << (led + 6 > 8 ? led - 3 : led + 6));
  writeoutput(d);
  led++;
  if (led == 9)
  {
    led = 0;
  }

writeoutput() dumps tmp1, tmp2 and tmp3, in that order, out to three 595s. Ignore the d parameter.

Guess what it does. :)
Most people wouldn't have a clue, yet it works, and I'm pretty sure it's the fastest way to do it.

I haven't checked the assembly it generates yet, so I haven't fully optimized it.
I think it's pretty good, though.

My pet peeve is millis(), which ends up importing many bytes of 32bit divide function so that delay() will be closer to actual milliseconds than the 2.5% off you'd get using the native clock tick of 1.024ms.

IMHO, for things like that there should be two functions: one for imprecise measurements and one for precise ones, with imprecise being the default.
That way the extra code is left out unless it's specifically required.
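One way the two-function idea could look, as a sketch only: overflow_count stands for the number of 1.024 ms timer overflows, and rough_millis() and precise_millis() are hypothetical names, not the Arduino core's millis().

```c
#include <stdint.h>

/* Imprecise (default): treat each 1.024 ms overflow as one millisecond.
   No division, so no 32-bit divide routine is needed. */
uint32_t rough_millis(uint32_t overflow_count)
{
    return overflow_count;
}

/* Precise: scale by 1024/1000 = 128/125. Splitting the count into quotient
   and remainder avoids overflowing 32 bits while the result still fits,
   but it needs division, which is exactly the cost the rough version skips. */
uint32_t precise_millis(uint32_t overflow_count)
{
    uint32_t q = overflow_count / 125u;
    uint32_t r = overflow_count % 125u;
    return q * 128u + (r * 128u) / 125u;
}
```

rough_millis(1000) returns 1000 while precise_millis(1000) returns 1024. A sketch that never calls precise_millis() would not pull in its division, assuming the linker discards unused functions (for example with -ffunction-sections and --gc-sections).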

I give up. What's it do? I don't recognize the pattern:

101111111111011111111110
011111111110111111111101
111111111101111111111011
111111101011111111110111
111111010111111111101111
111110111111111111011111
111101111111111010111111
111011111111110101111111
110111111111101111111111

It's an RGB LED chaser. Each 595 handles one colour:
a red, a green and a blue LED chasing each other.

Not entirely sure why turning on an LED requires a 0, but meh.
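To check the pattern without hardware, the snippet's arithmetic can be run on a PC. This sketch assumes tmp1, tmp2 and tmp3 are 8-bit variables (their declarations aren't shown); chaser_frame() and print_chaser() are hypothetical helpers, and the bytes are packed tmp3, tmp2, tmp1 so the printout lines up with the table above.

```c
#include <stdint.h>
#include <stdio.h>

/* One frame of the chaser for step led (0..8), packed as tmp3:tmp2:tmp1.
   Each byte is active low: a 0 bit is a lit LED. */
uint32_t chaser_frame(int led)
{
    /* Same arithmetic as the original snippet. When a shift amount is 8,
       0xFF - 256 is -1 and the uint8_t cast turns it into 0xFF: that
       colour is on the hidden ninth position, so nothing lights. */
    uint8_t tmp1 = (uint8_t)(0xFF - (1 << led));
    uint8_t tmp2 = (uint8_t)(0xFF - (1 << (led + 3 > 8 ? led - 6 : led + 3)));
    uint8_t tmp3 = (uint8_t)(0xFF - (1 << (led + 6 > 8 ? led - 3 : led + 6)));
    return ((uint32_t)tmp3 << 16) | ((uint32_t)tmp2 << 8) | tmp1;
}

/* Print all nine frames as 24-character bit strings, MSB first. */
void print_chaser(void)
{
    for (int led = 0; led <= 8; led++) {
        uint32_t frame = chaser_frame(led);
        for (int bit = 23; bit >= 0; bit--)
            putchar(((frame >> bit) & 1u) ? '1' : '0');
        putchar('\n');
    }
}
```

print_chaser() prints the same nine rows as the table above, including the three rows where one byte is all ones because that colour's LED is on the hidden position.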

Hmm. The pattern doesn't look right to me, but perhaps one of us copied it wrong.
How about this (you'll need to change the output function to output the low 24 bits of "leds" to the three 595s):

unsigned long leds =  ~(0x040201);  /* One red, one green, and one blue LED on. */

void loop() {
  /* Rotate 24 bits */
  leds += leds;  /* Left shift one bit */
  if (leds & 0x1000000) {
      /* rotate 24th bit back into bit 0 */
      leds |= 1;
  }
}

The compiler IS smart enough to optimize the single-bit tests of the long into tests within a single byte (yeah!), so the code isn't too bad. It's not quite as smart as it could be about moving the long to and from memory (it does much better if leds is a local variable; is there a way in gcc to get global variables stored in registers?):

> arduino-0008/tools/avr/bin/avr-objdump -S RGBfollow.elf 
     :
void  loop ()
{
  /* Rotate 24 bits */
  leds += leds;  /* Left shift one bit */
  a4:   80 91 00 01     lds     r24, 0x0100
  a8:   90 91 01 01     lds     r25, 0x0101
  ac:   a0 91 02 01     lds     r26, 0x0102
  b0:   b0 91 03 01     lds     r27, 0x0103
  b4:   88 0f           add     r24, r24
  b6:   99 1f           adc     r25, r25
  b8:   aa 1f           adc     r26, r26
  ba:   bb 1f           adc     r27, r27
  bc:   80 93 00 01     sts     0x0100, r24
  c0:   90 93 01 01     sts     0x0101, r25
  c4:   a0 93 02 01     sts     0x0102, r26
  c8:   b0 93 03 01     sts     0x0103, r27
  if (leds & 0x1000000) {
  cc:   b0 ff           sbrs    r27, 0
  ce:   09 c0           rjmp    .+18            ; 0xe2 <loop+0x3e>
    /* rotate 24th bit back into bit 0 */
    leds |= 1;
  d0:   81 60           ori     r24, 0x01       ; 1
  d2:   80 93 00 01     sts     0x0100, r24
  d6:   90 93 01 01     sts     0x0101, r25
  da:   a0 93 02 01     sts     0x0102, r26
  de:   b0 93 03 01     sts     0x0103, r27
  e2:   08 95           ret
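For comparison, the same 24-bit rotate can be written with explicit shifts and a mask. This is a host-side sketch with hypothetical names: rotl24_add() reproduces the add-and-test approach above, and rotl24() is the shift-and-mask form, which also keeps bits 24 to 31 clear. Whether avr-gcc generates better code for either form would need checking with avr-objdump.

```c
#include <stdint.h>

/* Shift-and-mask form: rotate the low 24 bits left by one. */
uint32_t rotl24(uint32_t x)
{
    x &= 0xFFFFFFu;                              /* ignore bits 24..31 */
    return ((x << 1) | (x >> 23)) & 0xFFFFFFu;   /* bit 23 wraps to bit 0 */
}

/* Add-and-test form from the loop() above: double the value, then copy the
   bit that moved into position 24 back into bit 0. Bits 24..31 end up as
   garbage; only the low 24 bits are meaningful. */
uint32_t rotl24_add(uint32_t x)
{
    x += x;
    if (x & 0x1000000u)
        x |= 1;
    return x;
}
```

Starting from ~0x040201, both functions agree on the low 24 bits; the first step gives 0xF7FBFD.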

Hmm. The pattern doesn't look right to me, but perhaps one of us copied it wrong.
How about this (you'll need to change the output function to output the low 24 bits of "leds" to the three 595s):

One LED of each colour is lit, so there is a blue LED following a green LED following a red LED.
Hence the three temp variables.

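The apparent 'mismatch' in the pattern is because there is a hidden 9th 'LED' position (when the shift amount is 8, the bit falls outside the 595's eight outputs), so the chase wraps properly.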

(it does much better if leds is a local variable (is there a way in gcc to get global variables to be stored in registers?)):

Hmm, I don't think it's possible to make it local.
I've got a nice system where the animation changes every 5 seconds, and it wouldn't work without code in loop() calling the (inlined) animation functions.
Plus, each animation function reuses the variables.
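A structure along those lines might look like the sketch below. Everything in it is hypothetical (the three animations, the frame[] and step variables they share, and the function names); now_ms would come from millis() on a real board, and the 5-second period follows the post.

```c
#include <stdint.h>

#define NUM_ANIMATIONS      3u
#define ANIMATION_PERIOD_MS 5000UL   /* switch animation every 5 seconds */

/* State shared by all animations; must survive between loop() passes,
   which is why it is global rather than local. */
static uint8_t frame[3];   /* one byte per 595, active low (0 = lit) */
static uint8_t step;

static inline void anim_all_off(void) { frame[0] = frame[1] = frame[2] = 0xFF; }
static inline void anim_all_on(void)  { frame[0] = frame[1] = frame[2] = 0x00; }

/* Walk a single lit bit across every byte, one position per call. */
static inline void anim_walk(void)
{
    uint8_t mask = (uint8_t)~(1u << (step % 8));
    frame[0] = frame[1] = frame[2] = mask;
    step++;
}

/* Which animation is active at time now_ms? */
unsigned select_animation(unsigned long now_ms)
{
    return (unsigned)((now_ms / ANIMATION_PERIOD_MS) % NUM_ANIMATIONS);
}

/* Body of loop(): pick the current animation and run one step of it. */
void run_animation(unsigned long now_ms)
{
    switch (select_animation(now_ms)) {
    case 0:  anim_walk();    break;
    case 1:  anim_all_on();  break;
    default: anim_all_off(); break;
    }
}
```

Because the animations are static inline and called only from run_animation(), GCC can fold them into the switch, while the globals carry the state from one pass of loop() to the next.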

Actually, I wonder whether inlining would basically achieve the same thing anyway.