Much as I hate to contradict people, the evidence doesn't support this claim.
This sketch:
void setup ()
{
}

void loop ()
{
  delay (1000);
}
generates this code for loop():
000000a8 <loop>:
void loop ()
{
delay (1000);
a8: 68 ee ldi r22, 0xE8 ; 232
aa: 73 e0 ldi r23, 0x03 ; 3
ac: 80 e0 ldi r24, 0x00 ; 0
ae: 90 e0 ldi r25, 0x00 ; 0
b0: 0e 94 a3 00 call 0x146 ; 0x146 <delay>
}
b4: 08 95 ret
Nothing is being pushed onto the stack there. Certainly, the number 1000 (as an unsigned long, 0x000003E8) is loaded into four registers, r22 to r25, with the least significant byte in r22. But nothing is pushed, and nothing is popped. The compiler is doing the minimum it needs to do, which is also the fastest.
I really don't see how you could pass (unsigned long) 1000 to a function any faster or more efficiently.
Once again, don't try to outsmart the compiler by writing obscure code.