There appear to be substantial savings from cutting the clock speed and the supply voltage, because on every clock cycle a tiny charge has to be added to or removed from each of the millions of CMOS gates.
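The usual first-order model of CMOS switching power shows why both knobs help. It is a textbook approximation, not a figure for any particular chip. Each gate that toggles moves a charge of roughly C·V, so switching power scales with the square of the supply voltage and linearly with the clock:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
\qquad\Rightarrow\qquad
\frac{P_{\text{dyn}}(3.3\,\text{V})}{P_{\text{dyn}}(5\,\text{V})} = \left(\frac{3.3}{5}\right)^{2} \approx 0.44
```

Here α is the fraction of the capacitance C that switches per cycle, V_DD is the supply voltage and f is the clock frequency. The 5 V and 3.3 V values are only an example. At a fixed clock, that drop would cut switching power to about 44%, and halving f would halve it again. Static leakage is not part of this term.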
The following activities are as intensive as I plan to get with my sketch (worst-case scenario):
You can estimate the load from that:
20-30 LEDs (hooked up to an SPI-based shift register) blinked every 50 ms
Say 100 cycles per 8-bit transfer, 20x a second = 2k cycles.
Digital temperature sensor measured every 100 ms
Not sure which sensor, but say 200 cycles per read x 10 reads a second = 2k cycles.
SD card written to every 100 ms
No clue, but let's say 10k cycles per write x 10 writes a second = 100k cycles.
Xbee transmission every 100 ms
Say 1k cycles per transmission x 10 transmissions a second = 10k cycles.
Pushbuttons connected to hardware interrupts
50 cycles each, and let's say you furiously push the buttons 20x a second = 1k cycles.
That adds up to about 115k cycles per second; call it 120k.
So a 1 MHz clock with an 8:1 divider (125 kHz) is probably OK, but 1 MHz with a 1:1 divider gives you far more room for error.
Not sure if you have considered this, but with a pin or two you could have run a variable-speed RC oscillator, which would let you change the clock speed on the fly.