How many clock cycles does digitalRead/Write take?

However, in practice it is easier to be satisfied with a rough estimate of the time the function will take.

And therein lies the problem. Due to the aggressive optimizations that the compiler and linker can make, the time for calls like digitalRead()/digitalWrite() can now vary by nearly 100%.

I completely agree with this.
Unfortunately, these days getting that worst-case number isn't as easy as it used to be.

I've pulled my hair out doing USBasp firmware updates because the compiler/linker would make certain choices that caused the code to grow by hundreds of bytes, which not only affects timing but can also make the code no longer fit into certain parts.
While USBasp code does not use any Arduino code, the issue is the same.
When the compiler and linker decide to do these aggressive optimizations is very unpredictable. I've seen them make drastic changes when I simply moved or tweaked some small fragments of unrelated code, even code in a separate compilation unit.

With the new link-time optimizations, the final generated code can become nearly unrecognisable. Functions are often removed, and code from external functions can be inlined in some places and left as function calls in others, so the timing of calls to the same function within the same sketch can vary dramatically.

Another problem I've run into is that a small sketch written to time Arduino functions like digitalWrite()/digitalRead() is not representative of the larger code in a typical project. The code the tools generate for it, and hence the timings you measure, are not what you will see in a real Arduino project.
The timing of the functions in larger code could be longer or shorter depending on which optimizations were done.
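
If you still want a rough number, one option is to time a batch of calls inside the real project rather than in a stand-alone test sketch, and then convert the elapsed time to cycles. Below is a minimal sketch of just that arithmetic. The helper names cyclesPerCall and relativeSpread are hypothetical (not part of any Arduino API), and the 16 MHz clock in the comments is an assumption for a typical AVR board.

```cpp
#include <stdint.h>

// Hypothetical helper: average CPU cycles per call for a timed batch of calls.
//   elapsedMicros  - time for the loop containing the call, e.g. from micros()
//   overheadMicros - time for the same loop with the call removed
//   calls          - number of loop iterations
//   cpuHz          - CPU clock in Hz (assumed 16000000 on a 16 MHz AVR board)
double cyclesPerCall(uint32_t elapsedMicros, uint32_t overheadMicros,
                     uint32_t calls, uint32_t cpuHz) {
    // Reject inputs that cannot give a meaningful result.
    if (calls == 0 || elapsedMicros < overheadMicros) return 0.0;
    double netMicros = (double)(elapsedMicros - overheadMicros);
    double cyclesPerMicro = cpuHz / 1e6;
    return netMicros * cyclesPerMicro / calls;
}

// Hypothetical helper: spread between the fastest and slowest result,
// as a fraction of the fastest (1.0 means the slowest took twice as long).
double relativeSpread(double fastest, double slowest) {
    if (fastest <= 0.0) return 0.0;
    return (slowest - fastest) / fastest;
}

// Possible use inside an Arduino sketch (not compiled here):
//   unsigned long t0 = micros();
//   for (uint32_t i = 0; i < 4000; i++) digitalWrite(13, HIGH);
//   unsigned long t1 = micros();
//   double cycles = cyclesPerCall(t1 - t0, loopOverheadMicros, 4000, F_CPU);
```

Keep in mind that micros() only has 4 microsecond resolution on 16 MHz boards, so the batch needs to be large, and that the result only describes that one call site in that one build. Moving the measurement, or changing unrelated code, can change what the tools generate.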

I've seen the low level timings for my hd44780 library updating the LCD display vary depending on what is in the main line sketch code.
Think about that for a moment: library code is altered and optimized differently, which changes its timing, depending on what is in the main line sketch code.

In 35 years of C programming I've never experienced the kind of somewhat unpredictable optimization "issues" I've seen in the past couple of years due to the new types of optimizations being done by the gcc tools.
It used to be you could recognize your code even when source level debugging optimized gcc code.
That is no longer always the case.

--- bill