Searching through some core code I came across the tone lib and especially the function:
void tone(uint8_t _pin, unsigned int frequency, unsigned long duration)
This function tries to find an optimal prescaler for the timer. To do that, it evaluates OCR = F_CPU / frequency / 2 / SomePrescaler - 1 nine times. Because division is expensive, I added an extra variable, uint32_t ocrRaw = F_CPU / frequency / 2; which holds the repeated part of the formula, and used it in place of that expression in the rest of the function.
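To illustrate the idea, here is a minimal sketch of the change, not the actual Tone.cpp code. The names kCpuHz, kPrescalers, ocrRepeated, ocrFromRaw, pickPrescaler and checkEquivalent are made up for this example, and the 16 MHz clock and the prescaler list 1/8/64/256/1024 are assumptions. Integer division is evaluated left to right, so computing F_CPU / frequency / 2 once and dividing the result by each prescaler gives exactly the same value as repeating the whole chain.

```cpp
#include <assert.h>
#include <stdint.h>

// Assumption: 16 MHz clock. On a real board F_CPU comes from the build system.
static const uint32_t kCpuHz = 16000000UL;

// Assumption: prescaler choices of an 8-bit timer (hypothetical list for this sketch).
static const uint16_t kPrescalers[] = {1, 8, 64, 256, 1024};
static const uint8_t kNumPrescalers = sizeof(kPrescalers) / sizeof(kPrescalers[0]);

// Original style: the full division chain is repeated for every prescaler tried.
uint32_t ocrRepeated(uint32_t frequency, uint16_t prescaler) {
  return kCpuHz / frequency / 2 / prescaler - 1;
}

// Patched style: the caller computes ocrRaw once; only one division remains here.
uint32_t ocrFromRaw(uint32_t ocrRaw, uint16_t prescaler) {
  return ocrRaw / prescaler - 1;
}

// Returns the index of the first prescaler whose OCR value fits in 8 bits,
// or -1 if none fits. frequency must be non-zero.
// If ocrRaw / prescaler is 0, the "- 1" wraps around to a huge unsigned value,
// which fails the <= 255 check, so that prescaler is simply skipped.
int pickPrescaler(uint32_t frequency, uint32_t *ocrOut) {
  uint32_t ocrRaw = kCpuHz / frequency / 2;  // the repeating part, computed once
  for (uint8_t i = 0; i < kNumPrescalers; i++) {
    uint32_t ocr = ocrFromRaw(ocrRaw, kPrescalers[i]);
    if (ocr <= 255) {
      *ocrOut = ocr;
      return i;
    }
  }
  return -1;
}

// Sanity check: both styles agree for every prescaler in the list.
void checkEquivalent(uint32_t frequency) {
  uint32_t ocrRaw = kCpuHz / frequency / 2;
  for (uint8_t i = 0; i < kNumPrescalers; i++) {
    assert(ocrRepeated(frequency, kPrescalers[i]) == ocrFromRaw(ocrRaw, kPrescalers[i]));
  }
}
```

Under these assumptions, a 440 Hz tone gives ocrRaw = 18181, and the first prescaler that fits in 8 bits is 256 with OCR = 70.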
I tested with the ToneKeyboard.pde sample sketch to see the impact on size: before the patch 3614 bytes, after the patch 3526 bytes, an improvement of 88 bytes!
I did not test speed, but I expect it to be slightly faster (no free Arduino nearby to test).