32-bit Microcontroller

Well, the ZPUino Premium core (the one published right now) mostly takes 2 clock cycles per instruction, and ZPUino Extreme will mostly take 1 cycle per instruction.

See the post "ZPUino: ZPU core comparison" for an overview of the cores.

Also, don't forget that it can run at significantly higher clock speeds, even on a low-end FPGA (96 MHz right now).
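
To put those numbers together, here is a rough back-of-the-envelope sketch in Python. It assumes every instruction takes exactly the quoted cycle count (the "mostly" above means real code will be somewhat slower), and it assumes the same 96 MHz clock for both cores, which is my simplification and not a measured figure for ZPUino Extreme. The function name `peak_mips` is made up for illustration.

```python
def peak_mips(clock_hz: float, cycles_per_instruction: float) -> float:
    """Upper-bound instruction rate, in millions of instructions per second.

    Assumes a fixed cycle count per instruction, so this is a ceiling,
    not a benchmark result.
    """
    if cycles_per_instruction <= 0:
        raise ValueError("cycles_per_instruction must be positive")
    return clock_hz / cycles_per_instruction / 1e6


CLOCK_HZ = 96_000_000  # 96 MHz, the clock quoted in this post

# Assumption: both cores are clocked at the same 96 MHz.
premium_mips = peak_mips(CLOCK_HZ, 2)  # Premium: mostly 2 cycles/instruction
extreme_mips = peak_mips(CLOCK_HZ, 1)  # Extreme: mostly 1 cycle/instruction

print(f"Premium core ceiling: {premium_mips:.0f} MIPS")
print(f"Extreme core ceiling: {extreme_mips:.0f} MIPS")
```

So under these assumptions the ceiling is about 48 MIPS for the Premium core and about 96 MIPS for the Extreme core at the same clock.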

And yes, power consumption varies a lot, and you have to take special care with it. Gated clocks are one solution (i.e., disabling parts of the design when you don't use them). I'm not paying attention to that yet, but power optimizations are on my to-do list.

You can also put dedicated hardware inside the FPGA, which saves on external components (and in some scenarios even on power).

Álvaro