I've made a few changes to Nick's files.
I've added osccal.c and osccal.h, which wiring.c uses. They contain the OSCCAL_Calibrate() function I mentioned earlier. I've modified it by removing the CLKPR reset that it used.
In wiring.c I've added CLKPR_Calibrate(), which checks F_CPU and then sets CLKPR to the prescaler division that gets closest to it, starting from the 8 MHz RC oscillator.
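To illustrate the "closest scaling" idea, here is a minimal sketch of how the prescaler choice could be made. It is not the code from wiring.c: the helper name clkpr_closest_shift() and the RC_OSC_HZ constant are made up for this example, and it only computes the CLKPS value (0 to 8, meaning divide by 1 to 256) without touching the register.

```c
#include <stdint.h>

/* Nominal internal RC oscillator frequency, as stated in the post. */
#define RC_OSC_HZ 8000000UL

/* Hypothetical helper (not from wiring.c): return the CLKPS value 0..8
 * (divide by 1, 2, 4, ... 256) whose output frequency is closest to
 * target_hz. On ties the smaller division wins. */
static uint8_t clkpr_closest_shift(uint32_t target_hz)
{
    uint8_t  best     = 0;
    uint32_t best_err = UINT32_MAX;
    uint8_t  shift;

    for (shift = 0; shift <= 8; shift++) {
        uint32_t f   = RC_OSC_HZ >> shift;   /* 8 MHz / 2^shift */
        uint32_t err = (f > target_hz) ? (f - target_hz) : (target_hz - f);
        if (err < best_err) {
            best_err = err;
            best     = shift;
        }
    }
    /* On the AVR the result would then be written to CLKPR using the
     * timed CLKPCE enable sequence; that part is omitted here. */
    return best;
}
```

With this sketch, 7,372,800 picks divide-by-1 (the remaining error is left for the OSCCAL calibration to take out), while 1,000,000 picks divide-by-8.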
In wiring.c Init() I call CLKPR_Calibrate() to set the clock prescaler, then OSCCAL_Calibrate(), which uses the 32 kHz crystal as a reference to try to get the clock within a few percent of the target rate.
I tested on my Butterfly at 1,000,000, 4,000,000, 7,372,800, and 8,000,000. I tried 500,000 too, but that didn't work. Everything else worked well enough that serial communication works. At 1 MHz I was using 9600 baud, and at 7.3728 MHz 115200 baud worked.
The delays that delayMicroseconds() produces are wrong when running with F_CPU < 8MHz, but delay() usually comes in pretty close. For very slow clocks I think the timer0 configuration for millis is going to have to be changed.
In wiring.c Init() I updated the setting of the A2D prescale in ADCSRA. I look for the first prescale value that brings the ADC clock under 200 kHz and use that.
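As a rough sketch of that search (again, not the actual wiring.c code): the ADPS bits select division factors of 2, 4, 8, ... 128, so one way to do it is to walk them from smallest to largest and stop at the first one that lands at or below 200 kHz. The function name adc_prescale_bits() is hypothetical, and whether exactly 200 kHz counts as acceptable is an assumption here.

```c
#include <stdint.h>

/* Upper bound for the ADC clock used in this sketch. */
#define ADC_MAX_HZ 200000UL

/* Hypothetical helper (not from wiring.c): return the ADPS2:0 value
 * (1..7, i.e. divide by 2..128) for the smallest division factor that
 * gives an ADC clock of at most ADC_MAX_HZ. Falls back to 7 (divide by
 * 128) if even that is too fast. */
static uint8_t adc_prescale_bits(uint32_t f_cpu)
{
    uint8_t adps;

    for (adps = 1; adps <= 7; adps++) {
        if ((f_cpu >> adps) <= ADC_MAX_HZ) {  /* f_cpu / 2^adps */
            return adps;
        }
    }
    return 7;
}
```

For example, 8 MHz gives divide-by-64 (125 kHz) and 1 MHz gives divide-by-8 (also 125 kHz). The returned value would go into the ADPS2:0 bits of ADCSRA.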
In wiring_serial.c I updated beginSerial() to set U2X when F_CPU is <= 1MHz, and to use the appropriate calculation for UBRR.
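For anyone wondering what the two calculations look like, here is a small sketch. The divisor is 16 in normal mode and 8 with U2X set, and UBRR is the clock divided by (divisor * baud), minus one. The struct and function names are made up for the example, and rounding to the nearest value is my assumption; the real beginSerial() may simply truncate.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    uint16_t ubrr;     /* value for the UBRR register pair */
    uint8_t  use_u2x;  /* nonzero if the U2X (double speed) bit should be set */
} uart_cfg_t;

/* Hypothetical helper (not from wiring_serial.c): pick U2X for clocks of
 * 1 MHz and below, as described in the post, and compute UBRR with
 * round-to-nearest (an assumption). */
static uart_cfg_t uart_config(uint32_t f_cpu, uint32_t baud)
{
    uart_cfg_t cfg;
    uint32_t   divisor;

    cfg.use_u2x = (f_cpu <= 1000000UL) ? 1 : 0;
    divisor     = (cfg.use_u2x ? 8UL : 16UL) * baud;

    /* Adding divisor/2 before dividing rounds to the nearest integer. */
    cfg.ubrr = (uint16_t)((f_cpu + divisor / 2) / divisor - 1);
    return cfg;
}
```

At 1 MHz and 9600 baud this gives UBRR = 12 with U2X set, which works out to about 9615 baud. At 7,372,800 and 115200 baud it gives UBRR = 3 with no error at all, which is why that clock rate is popular for serial work.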
I think that's it. The result is that you can set your clock frequency in boards.txt (1, 2, 4, and 8 MHz should all work, as well as fairly wide deviations from those), serial communication should behave like it is supposed to, and the calibration routine should get the clock within 2 or 3% of the requested frequency. The calibration routine does make startup take a little longer, but it's under a second.
It could probably be better, but at least it seems to be working here.
You can download the files