Arduino Randomly Freezing During Long Jobs

Your persistence and tenacity are commendable.

Suppose a RAM corruption issue (a wild pointer or a stack overflow) is smashing the 'OK' string in RAM. A well-placed zero written over the first character would make the string zero-length, which would reproduce the "OK stops coming" part of the bug.

You could test this by using the OK string as a canary for RAM corruption:

const char *canary = "OK";

// …check in command loop…
if (canary[0] != 'O') {
    // the canary is dead: something overwrote the string
}

It should never change, but it would be interesting if it did, right? It would mean the last command did something to corrupt RAM.

If you can pin it down to a single command, that would be progress...
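Here is a rough sketch of how the check could report which command killed the canary. It is plain C so it can be tried on a PC first. The names canary_alive and check_canary_after are made up for illustration, and it uses its own writable copy of the string rather than your firmware's actual "OK", so treat it as a starting point only. On the board you would swap printf for Serial.print.

```c
#include <stdio.h>

/* Writable copy of the string in RAM, so a stray write can actually hit it.
   (Assumption: this is a separate canary, not the firmware's own "OK".) */
static char canary[] = "OK";

/* Returns 1 while the canary still reads "OK", 0 once any byte has changed.
   Reading through a volatile pointer stops the compiler from assuming
   the bytes can never change. */
static int canary_alive(void)
{
    const volatile char *p = canary;
    return p[0] == 'O' && p[1] == 'K' && p[2] == '\0';
}

/* Hypothetical hook: call once after each command has been executed.
   Returns 1 if the canary is intact; otherwise reports the command
   that had just run and returns 0. */
static int check_canary_after(const char *last_command)
{
    if (canary_alive()) {
        return 1;
    }
    printf("canary dead after: %s\n", last_command);
    return 0;
}
```

Call check_canary_after() with the text of the command you just processed. The first time it returns 0, that command (or something that ran during it, such as an interrupt) is your prime suspect.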

-br