For now I've been using GuyA's code. I stripped out the extra registers to see whether it was an overhead problem (too much data to process), but that didn't seem to change anything.
I found another post of his elsewhere and realized he's using an ESU controller, whereas mine is a Marklin. I'm not sure whether that really makes a difference, or why it would.
I'm not sure what speed it's all running at. The way it malfunctions seems weird: if the Arduino couldn't communicate fast enough, I'd have expected the results to be chaotic. I don't know. I don't really have the tools to analyze what's going on behind the scenes. I thought about looking for a sketch that turns an Arduino into a logic analyzer; maybe that would give me a little insight.
The code is simple enough, and being the noob that I am, that's good. I've got my head wrapped around most (or at least part) of what's happening. I've been thinking about reading up on how to manipulate the pins directly so I can pump the data out faster. I don't know if that would solve the problem, though. I've read that digitalWrite carries some overhead compared with direct port manipulation.
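For reference, here is a rough sketch of what direct manipulation looks like next to digitalWrite. It assumes an ATmega328P-based board (Uno or Nano), where the PORTD register is generally documented as covering digital pins 0-7. The names `fakePort`, `pinHigh` and `pinLow` are made up for illustration and are not from GuyA's code; `fakePort` just stands in for the real register so the idea can be tried off-board.

```cpp
#include <stdint.h>

// Hypothetical stand-in for a hardware port register. On an
// ATmega328P-based board the real register would be e.g. PORTD.
// This variable is an assumption so the example compiles off-board.
volatile uint8_t fakePort = 0;

// Set one bit of the port: that pin goes HIGH, the others are untouched.
inline void pinHigh(volatile uint8_t &port, uint8_t bit) {
  port = port | static_cast<uint8_t>(1u << bit);
}

// Clear one bit of the port: that pin goes LOW, the others are untouched.
inline void pinLow(volatile uint8_t &port, uint8_t bit) {
  port = port & static_cast<uint8_t>(~(1u << bit));
}
```

On a real board the calls would presumably be `pinHigh(PORTD, 2)` and `pinLow(PORTD, 2)` after `pinMode(2, OUTPUT)`, in place of `digitalWrite(2, HIGH)` and `digitalWrite(2, LOW)`. This skips the pin lookup that digitalWrite does on every call, but it ties the code to one chip's pin mapping, and it only helps if write speed is actually the bottleneck.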
I don't know, maybe I'm in over my head here. Even though I understand parts of the code, there are other aspects I don't fully follow, and because of that I don't really know how to break the problem down.