#define versus const and other software performance issues

While we're on the subject of bit twiddling, I've seen the compiler turn this:

X = (A << 4) | B

into

swap A
X = A & 0xf0
X = X | B

The AVR documentation says that a logical shift left (LSL) takes one clock cycle, so I would have expected something like LSL A, 4 rather than swapping the nibbles and masking off the bottom nibble. Any idea why the compiler did it that way? Does LSL take one cycle per bit shifted? The only other reason I could think of was to preserve the flags.