asm volatile(" ldi r16,8 \n\t out 0x0a,r16 \n\t .rept 10000 \n\t out 0x09,r16 \n\t .endr \n\t" ::: "r16");
Each out takes one cycle and toggles the pin, so assuming a 16MHz clock this should be 8MHz, minus a bit for interrupts and loop overhead. Unfortunately I don't have a scope or frequency meter to test it.
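For anyone who wants to check the arithmetic, here is a rough sketch in plain C. The function name toggle_frequency_hz and the 16MHz clock are my assumptions, not something measured; it only encodes the idea that one full period of the square wave needs two toggles.

```c
#include <assert.h>
#include <stdint.h>

/* Frequency of the square wave you get when a pin is toggled once every
 * `cycles_per_toggle` CPU cycles. A full period is two toggles (high, then
 * low), hence the factor of 2. Integer division truncates the result.
 * Hypothetical helper for illustration only. */
static uint32_t toggle_frequency_hz(uint32_t f_cpu_hz, uint32_t cycles_per_toggle)
{
    assert(cycles_per_toggle > 0);
    return f_cpu_hz / (2u * cycles_per_toggle);
}
```

With an assumed 16MHz clock, toggle_frequency_hz(16000000, 1) gives 8000000, which is where the 8MHz figure comes from. By the same arithmetic, 2.63MHz would work out to about three cycles per toggle, if the clock really is 16MHz.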
I'm quite impressed by the 2.63MHz just from plain C code.