VM/Interpreter - stack or register based?

Well, I've now managed to get it down to no more than 24x slower than native code. No idea how to make it any faster :confused:

My opcodes are the same as AVR assembly, all variables are local and byte-sized, the class structure is gone, and the switch over the bytecodes is replaced by gotos. It is also register-based, which is generally considered faster than stack-based. The result is in the attachment (not all bytecodes are implemented yet). Compiling with GCC's -O3 optimization did not give any benefit.
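For anyone who doesn't want to open the attachment, here is roughly what "register-based with goto dispatch" means. This is only a minimal sketch, not the attached code: the opcode names, the byte encoding, the `run` function and the `demo` program are all made up for illustration, and `goto *` relies on GCC's labels-as-values extension, so it is not standard C++.

```cpp
#include <stdint.h>

// Hypothetical opcodes for illustration only; not the encoding of the attached sketch.
enum : uint8_t { OP_LDI, OP_ADD, OP_DEC, OP_BRNE, OP_HALT };

// Runs bytecode on a tiny 4-register machine and returns the value of r0.
// Operands are trusted: register indices are assumed to be 0..3.
uint8_t run(const uint8_t *code) {
    // One label address per opcode (GCC "labels as values" extension).
    static const void *const dispatch[] = {
        &&op_ldi, &&op_add, &&op_dec, &&op_brne, &&op_halt
    };
    uint8_t r[4] = {0, 0, 0, 0};   // register file, all locals
    const uint8_t *pc = code;      // program counter
    bool zero = false;             // zero flag, set by ADD and DEC

// Fetch the next opcode and jump straight to its handler, without a switch.
#define NEXT() goto *dispatch[*pc++]

    NEXT();

op_ldi:  { uint8_t d = *pc++; r[d] = *pc++; NEXT(); }                 // r[d] = immediate
op_add:  { uint8_t d = *pc++; uint8_t s = *pc++;
           r[d] = (uint8_t)(r[d] + r[s]); zero = (r[d] == 0); NEXT(); }
op_dec:  { uint8_t d = *pc++; r[d]--; zero = (r[d] == 0); NEXT(); }
op_brne: { int8_t off = (int8_t)*pc++; if (!zero) pc += off; NEXT(); } // relative branch
op_halt: return r[0];

#undef NEXT
}

// Sums 5 + 4 + 3 + 2 + 1 into r0 using a countdown loop in r1.
static const uint8_t demo[] = {
    OP_LDI, 0, 0,          // r0 = 0
    OP_LDI, 1, 5,          // r1 = 5
    OP_ADD, 0, 1,          // loop: r0 += r1
    OP_DEC, 1,             // r1--
    OP_BRNE, (uint8_t)-7,  // if r1 != 0, jump back to loop
    OP_HALT
};
```

Calling `run(demo)` returns 15. Each handler ends with its own indirect jump instead of returning to a central switch, which is the whole point of the goto dispatch.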

Ideas are more than welcome!

Interpreter_RegBased.ino (13.1 KB)