Why is micros() so much slower than millis()?

The difference is about 36 instruction cycles. I think most of that is spent inside the micros() function itself, but a little of it comes from the comparison:
if(timer >= 1000){
vs.
if(timer >= 1000000){

In the millis() case you compare a 32-bit integer against a 16-bit constant, while in the micros() case you compare a 32-bit integer against a full 32-bit constant. I suspect the full 32-bit comparison takes more cycles.
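
For context, on an 8-bit AVR an int is 16 bits, so the two literals really do differ in width: 1000 fits in an int, while 1000000 is forced up to a 32-bit long. A minimal sketch spelling that out in comments:

void setup() {}

void loop() {
  unsigned long timer = micros();

  // On an 8-bit AVR, 1000 is a 16-bit int. It is widened to 32 bits
  // for the comparison, but the compiler knows the top bytes are zero.
  if (timer >= 1000) {
    // ...
  }

  // 1000000 does not fit in a 16-bit int, so it is already a 32-bit
  // long literal; all four bytes take part in the comparison.
  if (timer >= 1000000) {
    // ...
  }
}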
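
If you want to check the figure on your own board, one rough approach is to time a large batch of calls to each function and compare. A sketch along those lines (the loop count and the volatile temporaries are my own choices, just to keep the compiler from optimizing the calls away):

void setup() {
  Serial.begin(9600);
  const unsigned long N = 10000;

  // Time N calls to millis().
  unsigned long start = micros();
  for (unsigned long i = 0; i < N; i++) {
    volatile unsigned long t = millis();  // volatile keeps the call's result live
    (void)t;
  }
  unsigned long millisCost = micros() - start;

  // Time N calls to micros().
  start = micros();
  for (unsigned long i = 0; i < N; i++) {
    volatile unsigned long t = micros();
    (void)t;
  }
  unsigned long microsCost = micros() - start;

  Serial.print(F("millis() x10000: "));
  Serial.print(millisCost);
  Serial.println(F(" us"));
  Serial.print(F("micros() x10000: "));
  Serial.print(microsCost);
  Serial.println(F(" us"));
}

void loop() {}

The loop overhead cancels when you subtract the two totals, so on a 16 MHz board (microsCost - millisCost) * 16 / 10000 gives the per-call difference in instruction cycles.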