Does using micros() in a sketch cause the processor to work extra hard (slow down) trying to increment a counter so frequently, or is that happening anyway?
It's doing it anyway. Every 1.024 ms an interrupt occurs (when Timer 0 overflows), and that is when the number returned by millis() is updated. Basically, the interrupt adds one to the counter, and sometimes adds another one to allow for the fact that the interval between interrupts is slightly longer than 1 ms.
To calculate micros, the micros() function reads the timer hardware to find exactly how far the timer has counted since the last overflow "tick". It then adds that to the Timer 0 overflow count (scaled to timer counts) to get a higher-resolution result.
As michinyon said, the result will always be a multiple of 4 (microseconds), because Timer 0 runs with a prescaler of 64: at 16 MHz each clock cycle is 62.5 ns, so each timer count is 64 * 62.5 ns = 4 µs.
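The timing arithmetic, worked out assuming a 16 MHz clock as above. Here $N$ (the overflow count) and $c$ (the current 8-bit counter value) are symbols introduced just for this derivation:

```latex
\begin{aligned}
t_{\text{clk}} &= \frac{1}{16\ \text{MHz}} = 62.5\ \text{ns} \\
t_{\text{count}} &= 64 \times 62.5\ \text{ns} = 4\ \mu\text{s} \\
t_{\text{overflow}} &= 256 \times 4\ \mu\text{s} = 1024\ \mu\text{s} = 1.024\ \text{ms} \\
\texttt{micros()} &= 4\,(256\,N + c)\ \mu\text{s}, \qquad 0 \le c \le 255
\end{aligned}
```

Whatever $N$ and $c$ are, the result is 4 times an integer, hence always a multiple of 4 µs.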
This calculation is reasonably quick (it only has to read data from the timer register), but it takes a little longer than millis() because of the extra add. There is also a multiply in the code (by 4), which looks to me like it is done with a couple of adds, so that is fairly fast too.
It's interesting to note that, because micros() uses the millis() figure as part of its calculation, micros() must skip by a whole millisecond (1000 µs) every now and then (every 42 ms or so).
(Edit) That last statement was wrong. micros() uses a different figure (simply the Timer 0 overflow count) to calculate its return value, so it is not affected by the "jumping" or inaccuracy that affects millis().