Low-level programming using C++

Then, if that doesn't give you the power to do what you want, it's assembler (Thumb-2).

Thumb isn't about speed or power efficiency; it's about space efficiency (code density), and I can't imagine why anyone would want to write it by hand.

I work on systems running to many millions of lines of C and C++, and about the only time we ever see ARM assembler is in a very few performance-critical memory fill or copy operations.
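To give a rough idea of how far plain C++ gets you before assembler is even worth considering, here is a minimal sketch of a word-at-a-time memory fill. It is not taken from any real codebase: the names `fill_bytes` and `check_fill` are made up for illustration, and the 32-bit word size is an assumption that happens to suit a typical 32-bit ARM core.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: set `count` bytes starting at `dst` to `value`,
// writing 32-bit words wherever the pointer is suitably aligned.
void fill_bytes(void* dst, std::uint8_t value, std::size_t count) {
    auto* p = static_cast<unsigned char*>(dst);

    // Head: single bytes until p reaches a 4-byte boundary.
    while (count > 0 &&
           reinterpret_cast<std::uintptr_t>(p) % sizeof(std::uint32_t) != 0) {
        *p++ = value;
        --count;
    }

    // Replicate the byte into all four lanes of a word (0xAB -> 0xABABABAB).
    const std::uint32_t word = 0x01010101u * value;

    // Body: one word per iteration. memcpy sidesteps aliasing problems;
    // optimising compilers usually turn a fixed-size copy like this
    // into a single store instruction.
    while (count >= sizeof(word)) {
        std::memcpy(p, &word, sizeof(word));
        p += sizeof(word);
        count -= sizeof(word);
    }

    // Tail: whatever bytes are left over.
    while (count > 0) {
        *p++ = value;
        --count;
    }
}

// Small self-check: fill an unaligned 17-byte window and make sure
// the bytes on either side are untouched.
void check_fill() {
    unsigned char buf[19] = {};
    fill_bytes(buf + 1, 0xAB, 17);
    assert(buf[0] == 0 && buf[18] == 0);
    for (std::size_t i = 1; i <= 17; ++i) {
        assert(buf[i] == 0xAB);
    }
}
```

In practice you would reach for `std::memset` or `std::fill` first. Something like the above only becomes a candidate for hand-tuning once profiling shows that the fill itself is the bottleneck.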

Disclaimer: I can (just about) read ARM assembler (very, very occasionally, I have to follow code down to debug it), but I've certainly never written any.