Why not design a proper kernel up front instead of growing one by software sprawl? First yield(), then wait(), then maybe Time-Triggered Cooperative (TTC) scheduling, a mutex, a way to wake a thread from an ISR. Eventually you end up with the functionality of a true OS anyway.
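To make the TTC idea concrete, here is a minimal sketch of a time-triggered cooperative scheduler in portable C++. It is only an illustration, not any existing Arduino API: the names `Task`, `TtcScheduler`, `add()`, `tick()` and `kMaxTasks` are all made up for this example. It assumes something else (for instance a timer ISR setting a flag that the main loop polls) calls `tick()` once per time slot, and that every task returns quickly, since nothing preempts it.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names throughout; this is a sketch, not an Arduino API.
struct Task {
    void (*run)();       // task body; must return quickly (cooperative)
    uint32_t period;     // ticks between releases
    uint32_t countdown;  // ticks remaining until the next release
};

constexpr size_t kMaxTasks = 4;  // fixed table, no dynamic allocation

class TtcScheduler {
public:
    // Register a task that first runs `offset` ticks from now and then
    // every `period` ticks. Returns false if the table is full or the
    // period is zero.
    bool add(void (*run)(), uint32_t period, uint32_t offset) {
        if (count_ >= kMaxTasks || period == 0) return false;
        tasks_[count_++] = Task{run, period, offset};
        return true;
    }

    // Call exactly once per timer tick, from the main loop rather than
    // from inside the ISR itself (assumption: the ISR only sets a flag).
    void tick() {
        for (size_t i = 0; i < count_; ++i) {
            Task &t = tasks_[i];
            if (t.countdown == 0) {
                t.run();                 // runs to completion, no preemption
                t.countdown = t.period;  // schedule the next release
            }
            --t.countdown;
        }
    }

private:
    Task tasks_[kMaxTasks] = {};
    size_t count_ = 0;
};
```

With this sketch, a task added with period 2 and offset 0 runs on ticks 0, 2, 4, and so on, while one added with period 5 and offset 1 runs on ticks 1, 6, 11. Staggering the offsets is the usual way to keep two tasks from landing in the same slot.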
I started my career in the late 1960s developing an OS for early multiprocessor supercomputers, the CDC 6600 and others. I have watched this haphazard growth of the OS happen at every level of computer: mainframes, minicomputers, PCs, single-board computers, single-chip processors, and so on.
Now Arduino is doing the same drill.
Why?