Function() vs Speed

Much as I hate to contradict people, the evidence doesn't support this claim.

This sketch:

void setup ()
  {
  }
  
void loop ()
  {
  delay (1000);
  }

generates the following code for loop:

000000a8 <loop>:
  
void loop ()
  {
  delay (1000);
  a8:	68 ee       	ldi	r22, 0xE8	; 232
  aa:	73 e0       	ldi	r23, 0x03	; 3
  ac:	80 e0       	ldi	r24, 0x00	; 0
  ae:	90 e0       	ldi	r25, 0x00	; 0
  b0:	0e 94 a3 00 	call	0x146	; 0x146 <delay>
  }
  b4:	08 95       	ret

No arguments are being pushed onto the stack there. Certainly, the number 1000 (an unsigned long, which is 0x000003E8) is loaded into four registers, r22 to r25, low byte first. But nothing is pushed and nothing is popped, apart from the return address that call and ret handle themselves. The compiler is doing the minimum (and therefore the fastest thing) it needs to do.
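To make the register contents easier to check against the listing, here is a small sketch in plain C++ (it runs on a PC, not on the Arduino) that splits a 32-bit value into four bytes, low byte first. The names ArgRegisters, split_argument and verify_round_trip are made up for illustration; the only thing taken from the disassembly is which byte lands in which register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct: one field per register used in the listing above.
struct ArgRegisters {
  std::uint8_t r22;  // bits 0..7   (lowest byte)
  std::uint8_t r23;  // bits 8..15
  std::uint8_t r24;  // bits 16..23
  std::uint8_t r25;  // bits 24..31 (highest byte)
};

// Split a 32-bit argument into bytes, low byte first, mirroring the
// four ldi instructions in the disassembly.
constexpr ArgRegisters split_argument(std::uint32_t value) {
  return ArgRegisters{
      static_cast<std::uint8_t>(value & 0xFF),
      static_cast<std::uint8_t>((value >> 8) & 0xFF),
      static_cast<std::uint8_t>((value >> 16) & 0xFF),
      static_cast<std::uint8_t>((value >> 24) & 0xFF)};
}

// Compile-time checks against the listing: 1000 == 0x000003E8.
static_assert(split_argument(1000).r22 == 0xE8, "ldi r22, 0xE8");
static_assert(split_argument(1000).r23 == 0x03, "ldi r23, 0x03");
static_assert(split_argument(1000).r24 == 0x00, "ldi r24, 0x00");
static_assert(split_argument(1000).r25 == 0x00, "ldi r25, 0x00");

// Run-time check that the four bytes reassemble into the original value.
void verify_round_trip(std::uint32_t value) {
  const ArgRegisters regs = split_argument(value);
  const std::uint32_t rebuilt =
      static_cast<std::uint32_t>(regs.r22) |
      (static_cast<std::uint32_t>(regs.r23) << 8) |
      (static_cast<std::uint32_t>(regs.r24) << 16) |
      (static_cast<std::uint32_t>(regs.r25) << 24);
  assert(rebuilt == value);
}
```

If the static_assert lines compile, the four bytes agree with the four ldi instructions: one load per byte, and that is the entire cost of setting up the argument.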

I really don't see how you could pass (unsigned long) 1000 to a function in any faster or more efficient way.

Once again, don't try to outsmart the compiler by writing obscure code.