A simpler solution: run a free-running timer and have its overflow ISR increment a variable that serves as the upper (MSB) word of the count. With a 16-bit timer plus a 16-bit MSB counter, you have effectively created a 32-bit timer.
Route the outside signals to pins that trigger external interrupts. In the ISR for each interrupt, save the timer and MSB values. This approach essentially creates a software-based capture function. The resolution is one period of the clock fed to the timer, so with a 16 MHz timer clock it can be 1/16 us (62.5 ns).
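The scheme above can be sketched in plain C. This is only an illustration under stated assumptions: the timer register is simulated with an ordinary variable so the code runs on a PC, and every name here (timer_count, timer_msb, ext_int_isr, sim_advance) is hypothetical rather than taken from any particular MCU or vendor header.

```c
#include <stdint.h>

/* Simulated hardware state. On a real MCU, timer_count would be the
   timer's count register; these names are hypothetical. */
static volatile uint16_t timer_count;   /* free-running 16-bit timer      */
static volatile uint16_t timer_msb;     /* software upper 16 bits         */
static volatile uint32_t last_capture;  /* most recent 32-bit timestamp   */

/* Timer overflow ISR: runs each time the 16-bit timer wraps to 0. */
void timer_overflow_isr(void)
{
    timer_msb++;
}

/* External interrupt ISR: the software "capture".
   Saves timer + MSB counter as one 32-bit value. */
void ext_int_isr(void)
{
    uint16_t lo = timer_count;
    uint16_t hi = timer_msb;
    last_capture = ((uint32_t)hi << 16) | lo;
}

/* Test helper only: advance the simulated timer by `ticks` counts,
   calling the overflow ISR on each wrap, as the hardware would. */
void sim_advance(uint32_t ticks)
{
    while (ticks--) {
        timer_count++;
        if (timer_count == 0) {
            timer_overflow_isr();
        }
    }
}
```

One thing this sketch does not handle: on real hardware the timer can wrap just before the external ISR reads it, while the overflow ISR has not yet run. Real code would probably need to check the timer's overflow flag inside the external ISR and adjust the MSB value accordingly.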
Depending on your application, this may have a few issues. For example, the latency in entering the ISR for the INTx can be 10 - 20 ticks. The good news is that this latency is consistent from invocation to invocation, so it largely cancels out when you take the difference between two captures.
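To illustrate why a consistent latency is mostly harmless for interval measurements, here is a minimal sketch. It assumes a 16 MHz timer clock (TIMER_HZ is an assumed value matching the 1/16 us figure) and uses hypothetical helper names; unsigned subtraction also takes care of the 32-bit count wrapping around between two captures.

```c
#include <stdint.h>

#define TIMER_HZ 16000000u  /* assumed timer clock: 16 MHz, 62.5 ns per tick */

/* Ticks elapsed between two 32-bit captures. If both captures carry the
   same ISR latency, it drops out of the difference. Unsigned arithmetic
   gives the right answer across one wrap of the 32-bit count. */
uint32_t interval_ticks(uint32_t first, uint32_t second)
{
    return second - first;
}

/* Convert a tick count to nanoseconds at TIMER_HZ, using 64-bit math
   so the intermediate product cannot overflow. */
uint64_t ticks_to_ns(uint32_t ticks)
{
    return (uint64_t)ticks * 1000000000u / TIMER_HZ;
}
```

For example, if the real edges occur at ticks 1000 and 17000 and each capture is 15 ticks late, the saved values are 1015 and 17015, and the difference is still 16000 ticks, i.e. 1 ms at 16 MHz.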