Well, the ZPUino Premium Core (which is already published) takes mostly 2 clock cycles per instruction, and ZPUino Extreme will take mostly 1 cycle per instruction.
See the post "ZPUino: ZPU core comparison" for an overview of the cores.
Also, don't forget that it can run at significantly higher clock speeds, even on low-end FPGAs (currently 96 MHz).
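As a rough illustration of how cycles per instruction (CPI) and clock speed combine, peak throughput is approximately the clock frequency divided by the average CPI. The sketch below assumes an average CPI of exactly 2.0 for Premium and 1.0 for Extreme, and a 96 MHz clock for both. The post only says "mostly", and it does not state the Extreme core's clock, so treat the results as upper-bound estimates, not measurements. The `mips` helper is hypothetical.

```python
# Rough peak-throughput estimate: instructions/second = clock frequency / average CPI.
# Assumptions (not measured): CPI of 2.0 stands in for "mostly 2 cycles",
# CPI of 1.0 for "mostly 1 cycle", and both cores are clocked at 96 MHz.


def mips(clock_hz: float, avg_cpi: float) -> float:
    """Return millions of instructions per second for a clock rate and average CPI."""
    if avg_cpi <= 0:
        raise ValueError("average CPI must be positive")
    return clock_hz / avg_cpi / 1e6


CLOCK_HZ = 96e6  # 96 MHz, the clock speed mentioned in the post

if __name__ == "__main__":
    for core, cpi in [("Premium (~2 CPI)", 2.0), ("Extreme (~1 CPI)", 1.0)]:
        print(f"{core}: ~{mips(CLOCK_HZ, cpi):.0f} MIPS")
```

Under these assumptions it prints about 48 MIPS for Premium and about 96 MIPS for Extreme; memory wait states and multi-cycle instructions would lower both figures in practice.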
And yes, power consumption varies a lot, and you have to take special care with it. Gated clocks are one solution (i.e., disabling parts of the design when you are not using them). I'm not paying attention to that yet, but power optimizations are on my to-do list.
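The reason clock gating helps can be seen from the usual first-order model of dynamic (switching) power in CMOS logic, where alpha is the switching activity factor, C the switched capacitance, V_DD the supply voltage and f the clock frequency. This is a general textbook approximation, not a ZPUino measurement: stopping the clock to an idle block drives its activity factor toward zero, which removes most of that block's dynamic contribution, while static (leakage) power is not affected.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
\qquad\Longrightarrow\qquad
\alpha_{\text{idle block}} \to 0 \;\Rightarrow\; P_{\mathrm{dyn,\,idle\ block}} \to 0
```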
Also, you can put dedicated hardware inside the FPGA, thus saving on external components (and, in some scenarios, even on power).
Álvaro