A great deal of the Arduino interface exists to make it easy to write code:
x = digitalRead(pin); instead of lower-level code.
People expect certain behavior from these functions, and the behavior atetervak expects from delayMicroseconds() is, in my opinion, 100% reasonable. I also agree with his statement:
Nevertheless, it is natural to expect (from the website statement) that the actual delay would be below or somewhat close to delayMicroseconds(3), this would be fine.
On the other hand, his "calculated delay" example is not a strong one, partly because it uses float math, which is "slow" compared to micros (as AWOL states). But one could run into a similar problem when using integer math, which is in the same order of speed.
But whether you call it a bug or not, it is unexpected behavior that can easily be fixed by changing a few lines in the code of delayMicroseconds():
(original code)
if (--us == 0)
return;
==>
(proposed code)
if (us < 2)
return;
us--;
The first form returns early only for us = { 1 }, because for us = 0 the unsigned decrement wraps around instead of reaching zero; the second returns early for both us = { 0, 1 }.
Not measured, as I don't have an Arduino at hand, but I expect the accuracy and speed will be similar.
Footprint increases by 2 bytes (for the 328; I do have a compiler).
Can't report it as a bug/fix at the moment as the site -
http://code.google.com/p/arduino/issues/list - is not available ...