It appears the GPS PPS output is really the way to go on this, as you suggested. I'd like to outline what I have in mind, and see if you think it would work.
So I have a DS3231 set to output a 1Hz square wave, which would interrupt the Arduino on D2. And I will have the DPS module PPS output interrupting the Arduino on D3. I don't want the millis() interrupt to complicate things, and besides, I need Timer0 elsewhere, so during the measurement process there wouldn't be any millis() activity.
I need to measure as accurately as possible how many Arduino clock cycles take place from the D2 interrupt to the D3 interrupt. So I would cascade Timer1 and Timer0 into a 24-bit timer by feeding the output of Timer1 into the external clock input of Timer0 (on D4).
When the D2 interrupt takes place, the ISR would simply clear both timers. Then the D3 ISR would read the count values of both timers. The measurement would be repeated every 15 minutes, and the Aging register adjusted until the reading is the same time after time. I'll have to come up with an adjustment formula that hopefully would get me to the final setting within a few hours.
Then after each reading I would temporarily set all the timers back to whatever is needed to do Serial and Wire activity, then go back into measurement mode. Note that it doesn't matter whether the RTC is set to the correct time. It only matters whether the RTC is getting faster or slower, and by how much, relative to the PPS tick. It also doesn't matter what the Arduino's clock speed is so long as it's consistent.
I've never tried cascading the two AVR timers, and am always a bit overwhelmed by the timers sections of the datasheet for the 328P, but it does seem like this should work. And 24 bits would essentially equal the resolution of a 16MHz crystal - although, if memory serves, Timer1 won't actually run at 16Mhz, but only at 8MHz. But that's still 8000 times better resolution than the nearest millis(). And depending on the jitter of the two interrupt sources, it may actually be more than is needed.
This should also work with the WWVB receiver, but it seems GPS is just a less variable source of comparison because of WWVB's propagation variability. But I hope to try both and see if there's any material difference.
And finally, does this GPS module look ok?
https://www.amazon.com/Microcontroller-Compatible-Sensitivity-Navigation-Positioning/dp/B07P8YMVNT