realtime clock, microseconds, etc.

Doesn't it also rely on F_CPU dividing evenly into 64 million? I bring this up only because there has been some discussion of supporting processors at 20MHz, and it seems like this might cause a problem if F_CPU were 20000000 (64000000 / 20000000 = 3.2, which is not an integer).

Quite right. I overlooked that aspect of it. In my implementation of a higher precision timing function (see the link in my previous post), the function returns Timer0 ticks. This mitigates the 20MHz problem or, at least, defers it until the conversion to microseconds is done later.

The Arduino-like device that I'm testing runs at 20MHz and my hpticks() function works correctly on it (the second attempt, at least). One issue to consider is that the suggested implementation of a microseconds function has a resolution of one Timer0 tick, i.e. 64 / F_CPU seconds (4 microseconds at 16MHz, 3.2 microseconds at 20MHz), since it is based on Timer0 ticks and Timer0 is clocked at 1/64th of the CPU frequency. Although a microseconds() function may be more aesthetically pleasing, the implementation essentially "wastes" a portion of the 32-bit value range, because consecutive readings are several microseconds apart and the values in between never occur. A function that returns Timer0 ticks will have the same resolution as a microseconds() function but will have a larger useful range. Moreover, the range will be constant irrespective of the CPU speed.
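To put rough numbers on the range argument, here is a small sketch of my own (not taken from hpticks()). It assumes both counters are 32 bits wide and that Timer0 uses the /64 prescaler; the function names are made up for illustration. It uses double arithmetic, so it is meant to be compiled on a PC rather than on the AVR.

```c
/* Assumption: Timer0 is clocked at F_CPU / 64, as described above. */
#define TIMER0_PRESCALE 64.0

/* Assumption: the counter is 32 bits wide, so it has 2^32 distinct values. */
#define COUNTER_VALUES 4294967296.0

/* Seconds until a 32-bit Timer0 tick counter wraps, for a given CPU clock. */
static double tick_counter_wrap_seconds(double f_cpu_hz)
{
    return COUNTER_VALUES * TIMER0_PRESCALE / f_cpu_hz;
}

/* Seconds until a 32-bit microsecond counter wraps. */
static double micros_counter_wrap_seconds(void)
{
    return COUNTER_VALUES / 1000000.0;
}
```

A 32-bit microsecond count wraps after about 4295 seconds (roughly 71.6 minutes). A 32-bit tick count uses every one of its 2^32 values and wraps after about 17180 seconds (roughly 286 minutes) at 16MHz, or about 13744 seconds (roughly 229 minutes) at 20MHz.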

For measuring elapsed time, you can still think in terms of microseconds but convert the desired number of microseconds to Timer0 ticks (with either rounding or truncation as needed) before comparing it to the difference between two readings.
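A minimal sketch of that conversion, assuming a /64 Timer0 prescaler and a 32-bit tick count. The helper names (us_to_ticks_trunc, us_to_ticks_round, ticks_elapsed) are hypothetical and are not part of hpticks() or the Arduino core. F_CPU defaults to 20MHz here only so that the snippet compiles on its own.

```c
#include <stdint.h>

#ifndef F_CPU
#define F_CPU 20000000UL          /* assumption: 20MHz board */
#endif

/* Assumption: Timer0 runs at F_CPU / 64, so ticks = us * F_CPU / 64000000. */
#define TICK_DIVISOR 64000000ULL  /* prescaler (64) * microseconds per second */

/* Convert microseconds to Timer0 ticks, discarding any fractional tick. */
static uint32_t us_to_ticks_trunc(uint32_t us)
{
    /* 64-bit intermediate so us * F_CPU cannot overflow */
    return (uint32_t)(((uint64_t)us * F_CPU) / TICK_DIVISOR);
}

/* Convert microseconds to Timer0 ticks, rounding to the nearest tick. */
static uint32_t us_to_ticks_round(uint32_t us)
{
    return (uint32_t)(((uint64_t)us * F_CPU + TICK_DIVISOR / 2) / TICK_DIVISOR);
}

/* Nonzero once at least `interval` ticks separate the two readings.
 * Unsigned subtraction keeps this correct across one counter wrap. */
static int ticks_elapsed(uint32_t start, uint32_t now, uint32_t interval)
{
    return (uint32_t)(now - start) >= interval;
}
```

At 20MHz, 1000 microseconds is 312.5 ticks, so truncation gives 312 and rounding gives 313. If the interval is a constant, the conversion can be done once up front, so the comparison itself needs only a 32-bit subtraction.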