The delayMicroseconds() implementation ends with
    // busy wait
    __asm__ __volatile__ (
        "1: sbiw %0,1" "\n\t"            // 2 cycles
        "brne 1b" : "=w" (us) : "0" (us) // 2 cycles
    );
which is a tight loop decrementing the variable us. Each pass takes 4 cycles (2 for sbiw, 2 for brne while the branch is taken), so at 16 MHz, i.e. 16 cycles per usec, the loop runs 4 times per usec. So I think the timing is quite precise for values > 2.
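To make the arithmetic concrete, here is a rough sketch in plain C (runs on a PC, not the actual Arduino source). The function name loop_iterations and the constant are made up for illustration, and it ignores the function call overhead that the real delayMicroseconds() compensates for.

```c
/* Cycles used by one pass of the sbiw/brne loop: 2 + 2. */
#define CYCLES_PER_ITERATION 4UL

/* Hypothetical helper: number of loop passes needed to burn `us`
   microseconds at a CPU clock of f_cpu_hz. Call overhead is ignored. */
unsigned long loop_iterations(unsigned long f_cpu_hz, unsigned long us)
{
    unsigned long cycles_per_us = f_cpu_hz / 1000000UL; /* 16 at 16 MHz */
    return us * cycles_per_us / CYCLES_PER_ITERATION;
}
```

With a 16 MHz clock, loop_iterations(16000000UL, 1) gives 4 and loop_iterations(16000000UL, 10) gives 40, matching the "4 times per usec" figure.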
The step size of 4 usec is the resolution of the micros() function (again at 16 MHz), so if you want to measure times shorter than 4 usec you need a HW timer solution. Can't find the link now.
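As a sanity check on the 4 usec figure, here is a small sketch in plain C. It assumes (this is my assumption, not something stated above) that micros() counts ticks of a timer running with a prescaler of 64, which is how I believe the standard AVR core sets up Timer0. The function name tick_us is made up for illustration.

```c
/* Hypothetical helper: length of one timer tick in whole microseconds.
   A prescaler of 64 is an assumption about the Arduino core's timer setup.
   Integer division, so fractional microseconds are truncated. */
unsigned long tick_us(unsigned long f_cpu_hz, unsigned long prescaler)
{
    unsigned long cycles_per_us = f_cpu_hz / 1000000UL; /* 16 at 16 MHz */
    return prescaler / cycles_per_us;
}
```

Under that assumption tick_us(16000000UL, 64) gives 4, and on an 8 MHz board the same prescaler would give 8.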