Thats interesting because sleep_enable() is defined as this:
#define _SLEEP_CONTROL_REG MCUCR
#define _SLEEP_ENABLE_MASK _BV(SE)
#define sleep_enable() \
do { \
_SLEEP_CONTROL_REG |= (uint8_t)_SLEEP_ENABLE_MASK; \
} while(0)
So that essentially optimises to:
MCUCR |= _BV(SE)
in other words:
MCUCR = MCUCR | (1<<SE);
EDIT:
I see what happened, you were not just overwriting the SE bit, but also the sleep mode select bits essentially putting it into idle mode sleep.
Just for pretty codes sake, your code can be written equally as:
set_sleep_mode(SLEEP_MODE_PWR_DOWN); // sleep mode is set here
sleep_enable();
MCUCR |= bit(BODSE) | bit(BODS); // timed sequence
MCUCR = (MCUCR & ~bit(BODSE)) | bit(BODS);
sleep_cpu();
But that is just personal preference as it better shows what is happening. Both are perfectly valid.
Out of curiosity, does this work:
set_sleep_mode(SLEEP_MODE_PWR_DOWN); // sleep mode is set here
sleep_enable();
MCUCR |= bit(BODSE) | bit(BODS); // timed sequence
MCUCR &= ~bit(BODSE); //BODS is already high.
sleep_cpu();
It may not do as the second modification of MCUCR will optimise to use the cbi() which means both are not being written at the same time. It would be interesting to know if it does work though.