Hmm. This was asked over on AVRFreaks, and it's a very frequently asked question about CPUs in general, though I don't recall ever seeing it asked here. Since I actually did the experiment, I'll post the answer anyway!
    while (1) {
        digitalWrite(3, 1);  // pin 3 high (pin assumed already set to OUTPUT)
        digitalWrite(3, 0);  // pin 3 low
    }
produces a 106.8kHz square wave on digital pin 3 in Arduino 0010, 0011, and 0012, though it would probably be foolish to count on exactly that speed, since library functions are subject to change.
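    cli();                 // interrupts off, so nothing interrupts the loop
    while (1) {
        PORTD |= 0x8;      // set bit 3 of PORTD (digital pin 3)
        PORTD &= ~0x8;     // clear bit 3 of PORTD
    }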
on the same board runs at 2.667MHz. (This does produce the minimal sbi/cbi/rjmp loop that you'd expect, BTW.)
(So that's about a 25x penalty for the Arduino library code, which sounds about right. The overhead of abstracting IO to a "pin number" is pretty substantial: a subroutine call, a lookup table to get the port, another lookup table to get the bit, a third to check whether analogWrite is in use, and then less efficient instructions to access the port "indirectly".)