I could not find a bootloader for operation at 8MHz, so with the original bootloader the device would run at half speed, with PWM and timer functions slowed by half.
The bootloader speed has nothing to do with the PWM and TIMER functions of the sketch. It ONLY affects the UART speed, and ONLY during bootloading.
Optiboot runs fine at 8MHz and 57600bps ("half speed"). Also, it uses the WDT for its timeout, which runs from a separate clock, not the system clock.
The hardware uart needs at least 8 clocks per bit, and always samples near the "middle" of a bit. A software uart could be tuned to go a bit faster than that, and would have the freedom to pick the sampling times "more conveniently." For example, it was recently pointed out that you can't do 9600bps on a 1MHz AVR, because the needed divisor falls nearly exactly between two integers (5.5?), and picking either one gives you more than the maximum allowable error. I doubt that a SW uart implementation would have any problem with it...
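To put numbers on that divisor problem, here is a rough sketch in plain C (it runs on a PC, not on the AVR). It assumes the usual AVR datasheet formula UBRR = f_cpu / (samples * baud) - 1, with 16 samples per bit in normal mode and 8 in double-speed (U2X) mode. The function names are mine for illustration; they are not from optiboot or avr-libc.

```c
#include <assert.h>
#include <stdint.h>

/* Ideal (fractional) UBRR value times 100, from the AVR datasheet formula
 * UBRR = f_cpu / (samples * baud) - 1.
 * samples is 16 in normal mode, 8 in double-speed (U2X) mode. */
int64_t ubrr_ideal_x100(int64_t f_cpu, int64_t baud, int samples)
{
    assert(f_cpu > 0 && baud > 0 && (samples == 8 || samples == 16));
    return (100 * f_cpu) / (samples * baud) - 100;
}

/* Baud rate error in tenths of a percent for a given integer UBRR.
 * The actual rate is truncated to whole bits per second first. */
int64_t baud_error_permille(int64_t f_cpu, int64_t baud,
                            int samples, int ubrr)
{
    assert(f_cpu > 0 && baud > 0 && ubrr >= 0);
    int64_t actual = f_cpu / ((int64_t)samples * (ubrr + 1));
    return (actual - baud) * 1000 / baud;
}
```

With f_cpu = 1000000, baud = 9600 and samples = 16, ubrr_ideal_x100 returns 551 (a divisor of about 5.51), and baud_error_permille returns 85 for UBRR = 5 and -70 for UBRR = 6, i.e. roughly +8.5% and -7.0%. Neither is anywhere near the couple of percent a UART link can normally tolerate.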
This comes at a cost, of course. The HW uart gives you about 20 bit-times of time in between REQUIRED port reads where you can do other stuff (like compare the last character you read to your command characters.)
This is for a 64-byte version of optiboot.
I'll believe THAT when I see it!
My minimal HW uart code (including the watchdog reset) is:
getchl: ldd ARG, Y+0 ;character ready?
sbrs ARG, 7
ldd ARG, Y+6
That's about half the size of your "small" SW uart, and I still couldn't get optiboot down even to 256 bytes. You can do a smaller bootloader, of course, but I don't believe that you can duplicate what optiboot does, enough to work with "-carduino"