Looking for variant approach/sketch

Mmm, almost any algorithm can be optimized, but as you said: is it worth the effort? If the code works and is fast enough, why optimize?

I recall we had a lot of fun several years ago optimizing divmod10, a function that does an integer division by 10 and a modulo 10 in one go. The gain was huge (about 4x faster IIRC), and even more with assembly. The function is meant e.g. for displaying integers faster. Saving time on I/O gives extra cycles for other tasks.
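
For anyone who has not seen the idea, here is a minimal sketch of how such a function can look. It is not the exact code from that old thread (I don't have it at hand), just the well-known shift-and-add approximation of n/10 followed by a single correction step. The signature and the parameter names are my own assumptions, chosen for illustration.

```c
#include <stdint.h>

/* Sketch only: computes in / 10 and in % 10 without a division instruction.
   The name matches the function discussed above, but the signature and
   parameter names are assumptions for this example. */
void divmod10(uint32_t in, uint32_t *div, uint32_t *mod)
{
    /* Approximate in * 0.8 with shifts and adds:
       0.75, then refine towards 0.7999... */
    uint32_t q = (in >> 1) + (in >> 2);
    q = q + (q >> 4);
    q = q + (q >> 8);
    q = q + (q >> 16);

    /* Divide by 8, so q is now close to in / 10 (never too large). */
    q = q >> 3;

    /* Remainder relative to the estimate: in - q * 10. */
    uint32_t r = in - ((q << 3) + (q << 1));

    /* The estimate can be one too low; fix it up once. */
    if (r > 9) {
        q = q + 1;
        r = r - 10;
    }

    *div = q;
    *mod = r;
}
```

A typical use is peeling off decimal digits when printing a number: call divmod10 repeatedly, emit '0' + mod each time, and continue with div until it reaches zero. On an 8-bit MCU without a hardware divider this avoids the library division routine, which is where most of the gain should come from.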