No. Reading a 32-bit microsecond count over a pin would typically be MUCH slower than the current micros() function...
Look, your test code demonstrates that micros() is much slower than millis(), but that's been explained: millis() is a very simple function that doesn't do much, so it is VERY fast. micros() does more work: it combines two separate counters (the timer-overflow count kept by the roughly-once-a-millisecond interrupt, and the individual timer ticks) with some moderate math. That's easily several times more complicated, and millis() may get inlined as well.
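To make the "combines two counters" part concrete, here's a rough host-side model of that arithmetic in plain C. It's only a sketch: model_micros() and its parameters are names I made up for illustration (they are not part of the Arduino core), and it assumes a 16 MHz clock with a /64 timer prescaler, so one timer tick is 4 microseconds.

```c
#include <stdint.h>

/* Hypothetical model of the math inside micros(); not the Arduino core code.
 * Assumption: 16 MHz clock, timer prescaler of 64, so 64 / 16 = 4 us per tick. */
#define MICROS_PER_TICK 4u

/* overflow_count   : number of 8-bit timer overflows counted by the interrupt
 * tick             : current value of the 8-bit timer (0..255)
 * overflow_pending : nonzero if the timer has overflowed but the interrupt
 *                    has not yet had a chance to bump overflow_count */
uint32_t model_micros(uint32_t overflow_count, uint8_t tick, int overflow_pending)
{
    /* Same correction as the real function: an overflow that has not been
     * counted yet is added here, unless the tick was read as 255. */
    if (overflow_pending && tick < 255) {
        overflow_count++;
    }
    /* Each overflow is worth 256 ticks; add the live ticks, then scale to us. */
    return ((overflow_count << 8) + tick) * MICROS_PER_TICK;
}
```

With those assumptions, one overflow and zero ticks comes out as 1024 microseconds, which is why the overflow interrupt is only "roughly" a millisecond.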
But that doesn't mean that micros() is actually SLOW! Here's the object code generated for micros() on an Uno:
unsigned long micros() {
unsigned long m;
uint8_t oldSREG = SREG, t;
3da: 3f b7 in r19, 0x3f ; 63
cli();
3dc: f8 94 cli
m = timer0_overflow_count;
3de: 80 91 ad 01 lds r24, 0x01AD
3e2: 90 91 ae 01 lds r25, 0x01AE
3e6: a0 91 af 01 lds r26, 0x01AF
3ea: b0 91 b0 01 lds r27, 0x01B0 ;timer0_overflow_count
t = TCNT0;
3ee: 26 b5 in r18, 0x26 ; 38
if ((TIFR0 & _BV(TOV0)) && (t < 255))
3f0: a8 9b sbis 0x15, 0 ; 21
3f2: 05 c0 rjmp .+10 ; 0x3fe <micros+0x24>
3f4: 2f 3f cpi r18, 0xFF ; 255
3f6: 19 f0 breq .+6 ; 0x3fe <micros+0x24>
m++;
3f8: 01 96 adiw r24, 0x01 ; 1
3fa: a1 1d adc r26, r1
3fc: b1 1d adc r27, r1
SREG = oldSREG;
3fe: 3f bf out 0x3f, r19 ; 63
return ((m << 8) + t) * (64 / clockCyclesPerMicrosecond());
400: ba 2f mov r27, r26
402: a9 2f mov r26, r25
404: 98 2f mov r25, r24
406: 88 27 eor r24, r24
408: bc 01 movw r22, r24
40a: cd 01 movw r24, r26
40c: 62 0f add r22, r18
40e: 71 1d adc r23, r1
410: 81 1d adc r24, r1
412: 91 1d adc r25, r1
414: 42 e0 ldi r20, 0x02 ; 2
416: 66 0f add r22, r22
418: 77 1f adc r23, r23
41a: 88 1f adc r24, r24
41c: 99 1f adc r25, r25
41e: 4a 95 dec r20
420: d1 f7 brne .-12
}
422: 08 95 ret
(and yes, if you're going to be this nit-picky about the speed of code, you DO need to learn how to interpret the object code!)
I count about 50 cycles for the whole thing, or about 3 microseconds of execution time at the Uno's 16 MHz. Reading 32 bits via a pin would take at least 128 cycles (read a bit, merge it into the count, loop: about 4 cycles, times 32), and that doesn't include clocking or waiting for a slower serial protocol. I2C, used by many clock chips, runs at ~400 kHz, so that would be 80 microseconds (1280 cycles) just waiting for the bits to arrive.
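If you want to check that arithmetic yourself, here's a small sketch in plain C that you can run on a PC. The helper names are hypothetical, and the inputs are the same assumptions as in the estimate above: a 16 MHz CPU clock, about 4 cycles per bit-banged bit, and a 400 kHz I2C bit rate.

```c
#include <stdint.h>

/* Assumption for illustration: Uno running at 16 MHz. */
#define CPU_HZ 16000000UL

/* Convert a CPU cycle count to nanoseconds at CPU_HZ. */
uint32_t cycles_to_ns(uint32_t cycles)
{
    return (uint32_t)((uint64_t)cycles * 1000000000ULL / CPU_HZ);
}

/* Minimum CPU cycles to shift in `bits` bits when each bit costs
 * `cycles_per_bit` cycles (read, merge, loop). */
uint32_t bitbang_cycles(uint32_t bits, uint32_t cycles_per_bit)
{
    return bits * cycles_per_bit;
}

/* CPU cycles that go by while `bits` bits arrive on a bus clocked at bus_hz. */
uint32_t bus_wait_cycles(uint32_t bits, uint32_t bus_hz)
{
    return (uint32_t)((uint64_t)bits * CPU_HZ / bus_hz);
}
```

For example, cycles_to_ns(50) gives 3125 ns for micros(), bitbang_cycles(32, 4) gives 128 cycles, and bus_wait_cycles(32, 400000) gives 1280 cycles, i.e. 80 microseconds.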