You can probably use the timer/counter hardware to get a lot closer to the frequency you want.
Changing the crystal can help too, at the possible expense of stuffing up other things like serial baud rates and, of course, actually programming the chip.

As Don said, you are working with integer division, and at the high frequencies there aren't many options because each step of the divider makes a big jump in the output frequency.
It may be better to go back to the idea of a software loop padded with NOPs, but I guess that was a problem because you want to run other code at the same time.
Maybe an external clock generator of some kind is needed.
______
Rob