Ok I think I found what I was looking for.
Here is the data sheet for the 328 pro I am looking at, which has the ASM and the timings:
If I use the following
ROR - Rotate right through Carry - 1 clock cycle
BRCS - Branch if Carry Set - 1/2 clock cycle
compared with if (data & (1<<i)), where the compiler probably does a
LSR - logical shift right - 1 clock
AND - and with reg - 1 clock
BRSH - branch if same or higher - 1/2 clock
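In C the two forms might look roughly like this. This is only a sketch: the function names are made up for illustration, and whether the compiler really emits ROR/BRCS for one and LSR/AND for the other is an assumption you would want to confirm in the generated assembly listing.

```c
#include <stdint.h>

/* Mask form: build a mask for bit i and test it against data. */
static uint8_t count_bits_mask(uint8_t data)
{
    uint8_t count = 0;
    for (uint8_t i = 0; i < 8; i++) {
        if (data & (1 << i)) {
            count++;
        }
    }
    return count;
}

/* Shift form: test the low bit, then shift it out.
 * On AVR this is the shape a compiler could map to a
 * shift/rotate into carry followed by a branch on carry
 * (an assumption, not something verified here). */
static uint8_t count_bits_shift(uint8_t data)
{
    uint8_t count = 0;
    for (uint8_t i = 0; i < 8; i++) {
        if (data & 1) {
            count++;
        }
        data >>= 1;
    }
    return count;
}
```

Both functions visit the bits in the same order (bit 0 first) and return the same count, so either can be swapped in for the other when only the per-bit check cost matters.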
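So basically, reading "1/2" as 1 cycle when the branch is not taken and 2 when it is, it's 3 to 4 clock cycles versus 2 to 3 clock cycles
One cycle saved, so roughly 25-33% better for the check alone...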
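To make the arithmetic explicit, here is a small sketch that adds up the cycle counts listed above. It assumes the usual AVR convention that a "1/2" branch costs 1 cycle when not taken and 2 when taken; the constant and function names are hypothetical, chosen just for this example.

```c
/* Per-instruction cycle counts taken from the list above.
 * Assumption: "1/2" means 1 cycle if the branch is not taken,
 * 2 cycles if it is taken. */
enum {
    ROR_CYCLES = 1,
    LSR_CYCLES = 1,
    AND_CYCLES = 1,
    BRANCH_NOT_TAKEN_CYCLES = 1,
    BRANCH_TAKEN_CYCLES = 2
};

/* Cost of a conditional branch for the given outcome. */
static int branch_cycles(int taken)
{
    return taken ? BRANCH_TAKEN_CYCLES : BRANCH_NOT_TAKEN_CYCLES;
}

/* ROR + BRCS */
static int rotate_check_cycles(int taken)
{
    return ROR_CYCLES + branch_cycles(taken);
}

/* LSR + AND + conditional branch */
static int mask_check_cycles(int taken)
{
    return LSR_CYCLES + AND_CYCLES + branch_cycles(taken);
}

/* Percentage of cycles saved going from the slow to the fast sequence. */
static double percent_saved(int slow, int fast)
{
    return 100.0 * (double)(slow - fast) / (double)slow;
}
```

With the branch not taken this gives 2 versus 3 cycles (about 33% saved), and with the branch taken 3 versus 4 cycles (25% saved).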
Probably splitting hairs here but there is the answer in case someone else needs it critically...
Thanks for all the help!