You can use inline assembler, but the closest I have come to it is asm volatile("nop\n"::);
which gives me a delay guaranteed to be greater than a few nanoseconds.
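For what it's worth, here is a minimal sketch of how that nop could be wrapped up for reuse. It assumes GCC or Clang (the extended-asm syntax is theirs), and the names nop_delay and nop_delay_n are made up for illustration. How long each nop actually takes depends on your clock and on the loop overhead, so treat it as a lower bound rather than a calibrated delay.

```c
/* Assumes GCC or Clang; other compilers spell inline assembly differently. */

/* Hypothetical helper: execute a single no-op instruction. */
static inline void nop_delay(void)
{
    /* volatile stops the compiler from removing the statement. */
    __asm__ __volatile__("nop\n" ::);
}

/* Hypothetical helper: execute `count` nops in a loop.
 * Returns the number of nops issued. The loop itself adds
 * a few cycles per iteration on top of each nop. */
static inline unsigned nop_delay_n(unsigned count)
{
    unsigned i;
    for (i = 0; i < count; i++) {
        nop_delay();
    }
    return i;
}
```

Calling nop_delay_n(4) would give at least four instruction times of delay, plus whatever the loop costs.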
You may be focusing on the wrong thing to optimize, though. If your "do blah" takes microseconds or even milliseconds to execute, then agonizing over the few instructions required to shift and test a bit is a waste of time.
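To put that in perspective, here is roughly what the shift-and-test part looks like in plain C. This is only a sketch: do_blah, blah_calls and process_bits are placeholder names I made up, and do_blah just counts calls where your real work would go. The test and the shift are a handful of instructions per bit, so any real work inside do_blah will dominate the run time.

```c
#include <stdint.h>

/* Hypothetical stand-in for the "do blah" work; it only counts calls. */
static unsigned blah_calls = 0;

static void do_blah(void)
{
    blah_calls++;
}

/* Walk the 8 bits of value from the least significant bit up,
 * calling do_blah() once for every bit that is set. */
static void process_bits(uint8_t value)
{
    uint8_t i;
    for (i = 0; i < 8; i++) {
        if (value & 1u) {   /* test the lowest bit */
            do_blah();
        }
        value >>= 1;        /* shift the next bit into place */
    }
}
```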
Pete