As an example, there is a project where some clever individuals have implemented a software USB emulator on 8-bit AVRs. This was made possible by calculating the exact execution time of individual MCU assembly instructions and squeezing that into a critical timing loop. A few kHz off and it would not have been possible.
Another project uses a similar approach to generate NTSC video timing signals.
The millis timer in Arduino can be maintained using a simple shift and increment. This helps minimize execution overhead and reduces code size. Other frequencies may allow the same, but not just any frequency, and so requirements may be in conflict.
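To illustrate the shift-and-increment idea, here is a rough sketch loosely modeled on the kind of bookkeeping an 8-bit timer overflow handler can do. It assumes an 8-bit timer with a prescaler of 64, which at 16 MHz overflows every 1024 us; all names (MillisState, on_timer_overflow, whole_us_per_overflow) are made up for illustration and are not taken from any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed timer setup: 8-bit counter (256 ticks) behind a /64 prescaler. */
#define TIMER_PRESCALER 64u
#define TICKS_PER_OVERFLOW 256u

/* At 16 MHz one overflow lasts 64 * 256 / 16 = 1024 us: 1 ms plus 24 us.
 * The 24 us remainder and the 1000 us wrap point are both shifted right
 * by 3 so the running fraction fits in a single byte. */
#define MILLIS_INC 1u
#define FRACT_INC (24u >> 3)   /* 3   */
#define FRACT_MAX (1000u >> 3) /* 125 */

typedef struct {
    uint32_t millis; /* whole milliseconds elapsed */
    uint8_t fract;   /* leftover time in units of 8 us */
} MillisState;

/* Work done once per timer overflow: two additions and one compare. */
void on_timer_overflow(MillisState *s)
{
    s->millis += MILLIS_INC;
    s->fract += FRACT_INC;
    if (s->fract >= FRACT_MAX) {
        s->fract -= FRACT_MAX;
        s->millis += 1;
    }
}

/* True if one overflow is a whole number of microseconds at this clock. */
bool whole_us_per_overflow(uint32_t f_cpu_hz)
{
    assert(f_cpu_hz > 0);
    uint64_t scaled = (uint64_t)TIMER_PRESCALER * TICKS_PER_OVERFLOW * 1000000u;
    return scaled % f_cpu_hz == 0;
}
```

With these assumptions, 8 MHz and 16 MHz give a whole number of microseconds per overflow, while 12 MHz and 20 MHz do not, so the cheap byte-sized bookkeeping above would need different constants or would drift.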
The delayMicroseconds function is based on a busy loop timed so that each microsecond of delay takes exactly 16 clock cycles (16 times 62.5 ns @ 16 MHz). Since you cannot delay for a fraction of a cycle, other frequencies may not allow for this.
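The arithmetic behind this can be sketched as follows. This is only an illustration: it assumes a busy loop whose single iteration costs 4 clock cycles and ignores call overhead, and the names (CYCLES_PER_LOOP, whole_cycles_per_us, loop_iterations) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cost of one busy-loop iteration, in clock cycles. */
#define CYCLES_PER_LOOP 4u

/* True if one microsecond is a whole number of clock cycles. */
bool whole_cycles_per_us(uint32_t f_cpu_hz)
{
    return f_cpu_hz % 1000000u == 0;
}

/* Number of loop iterations that fit into the requested delay.
 * Any fractional cycle or fractional iteration is simply lost. */
uint32_t loop_iterations(uint32_t f_cpu_hz, uint32_t us)
{
    assert(f_cpu_hz >= 1000000u); /* sketch assumes at least 1 cycle per us */
    uint64_t cycles = (uint64_t)f_cpu_hz * us / 1000000u;
    return (uint32_t)(cycles / CYCLES_PER_LOOP);
}
```

At 16 MHz a 10 us delay works out to exactly 40 iterations. At 11.0592 MHz a 1 us delay rounds down to 2 iterations (8 cycles, roughly 0.72 us), so the delay comes out short.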
For some projects, clock requirements may be absolute, while for others they may simply reflect your subjective preference (e.g. faster or simpler code). Like I said, if your priority is an exact USART baud rate, that may govern your choice more than any other issue. If you google for special-purpose devices, I think you will see the full range of crystal frequencies in use. For a general-purpose device such as Arduino, however, 16 MHz is a pretty good choice.
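To make the baud rate trade-off concrete, here is a small sketch that uses the normal-speed asynchronous formula from the AVR datasheets, baud = f_cpu / (16 * (UBRR + 1)). The helper names (ubrr_for, actual_baud, baud_error_permille) are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Nearest UBRR divisor setting for the requested baud rate. */
uint32_t ubrr_for(uint32_t f_cpu_hz, uint32_t baud)
{
    assert(baud > 0);
    uint32_t div = (uint32_t)16 * baud;
    return (f_cpu_hz + div / 2) / div - 1;
}

/* Baud rate the hardware actually produces for a given UBRR. */
uint32_t actual_baud(uint32_t f_cpu_hz, uint32_t ubrr)
{
    return f_cpu_hz / ((uint32_t)16 * (ubrr + 1));
}

/* Deviation from the requested rate in tenths of a percent,
 * truncated toward zero. */
int32_t baud_error_permille(uint32_t f_cpu_hz, uint32_t baud)
{
    int64_t actual = actual_baud(f_cpu_hz, ubrr_for(f_cpu_hz, baud));
    return (int32_t)((actual - (int64_t)baud) * 1000 / (int64_t)baud);
}
```

For 115200 baud, a 16 MHz clock ends up about 3.5% slow, whereas a 14.7456 MHz crystal divides down exactly. That is the kind of requirement that can push a design away from 16 MHz.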