millis() doesn't roll over cleanly, so I am not sure what your test is doing, but the code I ran isn't being optimized out. Here is the actual code that the compiler produced for the test that I ran:
unsigned long now() // return the elapsed seconds since system start
1626: cf 93 push r28
1628: df 93 push r29
162a: c0 e0 ldi r28, 0x00 ; 0
162c: d0 e0 ldi r29, 0x00 ; 0
{
for(int i = 0; i < 3600; i++){
millis();
162e: 0e 94 e2 0c call 0x19c4 ; 0x19c4 <millis>
sysTime++;
1632: 20 91 17 01 lds r18, 0x0117
1636: 30 91 18 01 lds r19, 0x0118
163a: 40 91 19 01 lds r20, 0x0119
163e: 50 91 1a 01 lds r21, 0x011A
1642: 2f 5f subi r18, 0xFF ; 255
1644: 3f 4f sbci r19, 0xFF ; 255
1646: 4f 4f sbci r20, 0xFF ; 255
1648: 5f 4f sbci r21, 0xFF ; 255
164a: 20 93 17 01 sts 0x0117, r18
164e: 30 93 18 01 sts 0x0118, r19
1652: 40 93 19 01 sts 0x0119, r20
1656: 50 93 1a 01 sts 0x011A, r21
prevMillis += 1000;
165a: 80 91 1b 01 lds r24, 0x011B
165e: 90 91 1c 01 lds r25, 0x011C
1662: a0 91 1d 01 lds r26, 0x011D
1666: b0 91 1e 01 lds r27, 0x011E
166a: 88 51 subi r24, 0x18 ; 24
166c: 9c 4f sbci r25, 0xFC ; 252
166e: af 4f sbci r26, 0xFF ; 255
1670: bf 4f sbci r27, 0xFF ; 255
1672: 80 93 1b 01 sts 0x011B, r24
1676: 90 93 1c 01 sts 0x011C, r25
167a: a0 93 1d 01 sts 0x011D, r26
167e: b0 93 1e 01 sts 0x011E, r27
1682: 21 96 adiw r28, 0x01 ; 1
1684: 8e e0 ldi r24, 0x0E ; 14
1686: c0 31 cpi r28, 0x10 ; 16
1688: d8 07 cpc r29, r24
168a: 89 f6 brne .-94 ; 0x162e <_Z3nowv+0x8>
}
return sysTime;
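A note on reading the listing, since the subi/sbci runs can look odd: AVR has no add-immediate instruction for general registers, so the compiler adds a constant by subtracting its two's-complement negation across the four bytes. The small check below is only a sketch to confirm that reading. The helper names subtrahend and effective_add are mine, not anything from the Arduino core, and it is written for a desktop compiler, not the board.

```cpp
#include <cstdint>

// Rebuild the 32-bit immediate that a subi/sbci chain subtracts,
// given its four bytes from least to most significant.
// (Hypothetical helper, for illustration only.)
constexpr std::uint32_t subtrahend(std::uint8_t b0, std::uint8_t b1,
                                   std::uint8_t b2, std::uint8_t b3) {
    return static_cast<std::uint32_t>(b0)
         | (static_cast<std::uint32_t>(b1) << 8)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b3) << 24);
}

// Subtracting k modulo 2^32 is the same as adding (2^32 - k),
// so the value effectively added is the unsigned negation of k.
constexpr std::uint32_t effective_add(std::uint32_t k) {
    return static_cast<std::uint32_t>(0) - k;
}

// sysTime++ : subi 0xFF, sbci 0xFF, sbci 0xFF, sbci 0xFF
static_assert(effective_add(subtrahend(0xFF, 0xFF, 0xFF, 0xFF)) == 1,
              "the first chain adds 1");

// prevMillis += 1000 : subi 0x18, sbci 0xFC, sbci 0xFF, sbci 0xFF
static_assert(effective_add(subtrahend(0x18, 0xFC, 0xFF, 0xFF)) == 1000,
              "the second chain adds 1000");

// Loop bound: cpi r28, 0x10 / cpc r29 against 0x0E, i.e. 0x0E10
static_assert(((0x0E << 8) | 0x10) == 3600,
              "the compare is against 3600");
```

If this compiles, the three constants in the listing match the source lines: the increment, the += 1000, and the 3600 loop bound are all there in the generated code.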
There is a significant performance hit on making a function call, so your call to Serial.println() probably takes much more time than all the other code in the function (my guess is that 130 ms of the 150 ms you measured is spent in the Serial.println() call).
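To put a rough number on the serial part, here is a back-of-the-envelope sketch. It assumes 8N1 framing (10 bits on the wire per character) and that the print blocks until the bytes have been sent, which depends on the core version. The baud rate and message length in the usage note are made-up values, since neither is stated in this thread, and uart_tx_micros is a hypothetical helper name.

```cpp
#include <cstdint>

// Microseconds needed to shift `chars` characters out of a UART,
// assuming 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per character.
// The result is truncated toward zero. (Hypothetical helper, illustration only.)
constexpr std::uint32_t uart_tx_micros(std::uint32_t chars, std::uint32_t baud) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(chars) * 10u * 1000000u) / baud);
}
```

For example, uart_tx_micros(12, 9600) gives 12500, so a 12-character line at 9600 baud would cost about 12.5 ms of wire time on its own. Plug in your own baud rate and line length to see how much of your measurement that could explain.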
Anyway, although we are getting quite different measurements, I don't think we are really disagreeing on the principles being discussed here.
Have fun!