Thank you Mike 
We're talking about the same hardware. So all the architecture and pipelining talk doesn't matter, provided the compilers you use for the RTOS and for bare metal (but higher level) optimize equally well.
Deous:
You haven't worked with other OSes but Arduino - haven't you? Lol
Yes, I've used some other stuff, and yeah, the ease with which you can make it do things is great. But it comes at a price: overhead. And I'm still fond of tiny, cheap and simple uCs, so I like bare metal more. And with the Arduino framework it's pretty quick as well. But yeah, even that includes some overhead.
Deous:
This is ridiculous what you are saying here - sounds like you are repeating something from Wiki and have no idea about the point of this conversation.
Do you understand the difference between Peripheral and CPU ?
Is it? Others don't seem to agree with you.
And are you saying everything on Wiki is incorrect? As far as I've read (and I'm still in the process of reading all the technical articles on Wiki), it agrees with my books and my education. A bit blunt at times, but correct.
And yeah, I know the difference between CPU and peripheral, but what does it have to do with this all? Your program runs on the CPU, nowhere else...
Deous:
If I do processing on the thread or interrupt - the speed always depends on the CPU hardware speed limit.
The clock speed does. The speed of the application also depends heavily on WHAT you do. The whole RTOS layer isn't free and doesn't run on thin air. It needs computation time before it even starts doing what you programmed it to do. Aka, overhead. Or where do you think it runs?
I think you misunderstand async. Yes, with an RTOS you can run tasks asynchronously at the program level. But at the hardware level your single-core micro still has to execute everything sequentially, one thing at a time.
And yeah, you can now get multi-core uCs for cheap. But you can use those bare metal as well. And that still isn't a core per task, so you still end up with several tasks taking turns on a single core.
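To make that concrete, here's a rough sketch in plain C++ that you can run on a PC. It's a toy round-robin loop, not how any real RTOS is implemented, and all the names in it (`Task`, `Trace`, `run_single_core`) are made up for illustration. It shows the two points above: on one core the "parallel" tasks end up in a single sequential order, and every scheduler decision is an extra step that does none of the tasks' own work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical task: just an id and a number of work units left to do.
struct Task {
    int id;
    int remaining;
};

// What the single "core" actually did.
struct Trace {
    std::vector<int> order;    // id of the task that ran at each work step
    std::size_t switches = 0;  // scheduler decisions made (pure overhead)
};

// Toy round-robin scheduler for ONE core: each loop iteration makes one
// scheduling decision and runs at most one work unit of one task.
Trace run_single_core(std::vector<Task> tasks) {
    Trace trace;
    std::size_t pending = 0;
    for (const Task& t : tasks) {
        assert(t.remaining >= 0);
        if (t.remaining > 0) ++pending;
    }

    std::size_t i = 0;
    while (pending > 0) {
        Task& t = tasks[i];
        ++trace.switches;              // the scheduler itself costs a step
        if (t.remaining > 0) {
            trace.order.push_back(t.id);  // only one task runs at a time
            if (--t.remaining == 0) --pending;
        }
        i = (i + 1) % tasks.size();
    }
    return trace;
}
```

Feed it a task with 1 unit of work and a task with 3 units, and you get 4 work steps in a strictly sequential order, but 6 scheduler decisions, because the loop keeps checking the task that already finished. A real RTOS is smarter than this toy loop, but its bookkeeping still runs on the same CPU as your code.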