Optimisation depends on the implementation of the microprocessor's ALU (Arithmetic and Logic Unit) and on the effectiveness of the compiler's optimiser.
Certain optimisations work better with certain cores. For example, where multiplication and division take many clock cycles (which is typical), using shift operations for the rescaling needed to prevent underflow and overflow can, under certain circumstances, significantly speed up the process.
So instead of:
word i = 25000;
i /= 256;
You would use:
i >>= 8;   // same result as i /= 256 when i is unsigned
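A minimal sketch of the idea, assuming 16-bit values as on the AVR (the function names are hypothetical, for illustration only). It shows that the shift and the division agree for unsigned values, that they do not agree for negative signed values (C division truncates towards zero, whereas GCC and Clang implement `>>` on a negative value as an arithmetic shift, which rounds towards minus infinity; the C standard leaves this implementation-defined), and how a widened intermediate plus a shift avoids overflow when rescaling. Note that for unsigned operands a modern compiler will usually make the `/ 256` to shift substitution for you.

```c
#include <stdint.h>

/* Unsigned divide by 256 using a shift. */
uint16_t div256_shift(uint16_t x)
{
    return (uint16_t)(x >> 8);
}

/* Unsigned divide by 256 using the division operator; same result. */
uint16_t div256_divide(uint16_t x)
{
    return (uint16_t)(x / 256u);
}

/* Signed division truncates towards zero, e.g. -300 / 256 == -1. */
int16_t sdiv256_divide(int16_t x)
{
    return (int16_t)(x / 256);
}

/* Signed shift: implementation-defined for negative values in C; GCC and
 * Clang shift arithmetically, rounding towards minus infinity (-300 >> 8 == -2). */
int16_t sdiv256_shift(int16_t x)
{
    return (int16_t)(x >> 8);
}

/* Rescale x by 3/4. With a 16-bit int (as on AVR) x * 3 would overflow for
 * large x, so widen to 32 bits, multiply, then shift right by 2 instead of
 * dividing by 4. */
uint16_t scale_3_4(uint16_t x)
{
    return (uint16_t)(((uint32_t)x * 3u) >> 2);
}
```

With the values from the example above, `div256_shift(25000)` and `div256_divide(25000)` both give 97.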
There is a secondary benefit on these types of microcontroller: the number of clock cycles the ALU takes to complete the operation is constant, which is very useful in the real-time control systems in which these devices are sometimes used.
On PC systems, it is true that code that is easier to understand and maintain is preferable.
In embedded systems, and more specifically real-time systems, performance is everything: if you cannot match the required frequency response of the system, then it doesn't work. I'm not saying bad code is acceptable, but the first priority is for the product to work!
RAM and flash requirements are also considerations.
There are other options, such as increasing the processor's clock rate or changing the processor itself, but these are not always possible, and software optimisation is typically your only route.
Secondly, readability of code is subjective: those who are familiar with a specific algorithm will not misunderstand how it operates, so there is certainly an element of prior knowledge involved.
Handling datatypes wider than the ALU's native width is another area for optimisation.
The 8-bit core of the Atmel can only add, subtract, multiply etc. two 8-bit values at a time, so longer instruction sequences are needed to support operations on values wider than 8 bits.
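To illustrate, here is a sketch in portable C of how a 16-bit addition is built from two 8-bit additions with a carry between them. On AVR the compiler typically emits this as an `ADD` of the low bytes followed by an `ADC` (add with carry) of the high bytes; the explicit carry handling below is only to make those steps visible, and `add16_via_8bit` is a hypothetical name. Wider types (32-bit, 64-bit) need correspondingly more steps, and multiplication grows faster still.

```c
#include <stdint.h>

/* Add two 16-bit values using only 8-bit additions plus a carry,
 * mirroring what an 8-bit ALU has to do. */
uint16_t add16_via_8bit(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFFu);
    uint8_t a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFFu);
    uint8_t b_hi = (uint8_t)(b >> 8);

    /* Low byte: plain 8-bit add. The sum wrapped (carry out) iff it is
     * smaller than one of the operands. */
    uint8_t lo = (uint8_t)(a_lo + b_lo);
    uint8_t carry = (lo < a_lo) ? 1u : 0u;

    /* High byte: add with the carry from the low byte. */
    uint8_t hi = (uint8_t)(a_hi + b_hi + carry);

    return (uint16_t)(((uint16_t)hi << 8) | lo);
}
```

For example, `add16_via_8bit(0x1234, 0x0FCD)` gives `0x2201`: the low bytes overflow (`0x34 + 0xCD = 0x101`) and the carry is folded into the high byte.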
Reducing the use of these wider types can improve run-time performance and reduce RAM usage.
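As a small sketch of choosing the narrowest type that is still wide enough, assuming a hypothetical routine that averages 8 readings from an 8-bit ADC: the worst-case sum is 8 × 255 = 2040, so a 16-bit accumulator suffices, and an 8-bit loop counter is enough. A 32-bit accumulator or a `float` would work too, but would cost extra multi-byte instructions and RAM on an 8-bit core. Picking a power-of-two sample count also lets the final divide become a shift.

```c
#include <stdint.h>

/* Hypothetical configuration: 2^3 = 8 samples, so dividing by N is >> 3. */
#define LOG2_N_SAMPLES 3u
#define N_SAMPLES (1u << LOG2_N_SAMPLES)

/* Average N_SAMPLES 8-bit readings.
 * Max sum = 8 * 255 = 2040, which fits in a uint16_t accumulator. */
uint8_t average8(const uint8_t samples[N_SAMPLES])
{
    uint16_t sum = 0;
    for (uint8_t i = 0; i < N_SAMPLES; i++) {  /* 8-bit loop counter */
        sum += samples[i];
    }
    return (uint8_t)(sum >> LOG2_N_SAMPLES);   /* divide by N_SAMPLES */
}
```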
The Atmel also doesn't have an FPU (Floating Point Unit), so floating point operations have to be performed in software, which adds extra complexity over and above the equivalent fixed point operations.
Where possible, avoid floating point maths when no FPU is available and a fixed point operation will provide sufficient accuracy in the calculation.
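As one possible fixed point approach (a sketch, not the only format), here is a gain of 1.75 applied to a 16-bit sample using unsigned Q8.8, i.e. a 16-bit number where the real value is `raw / 256`. The gain constant 448 (= 1.75 × 256) and the function name `apply_gain_q8_8` are hypothetical. The multiply uses a 32-bit intermediate to avoid overflow, adding 128 before `>> 8` rounds to the nearest integer, and the result saturates rather than wrapping.

```c
#include <stdint.h>

/* Hypothetical gain of 1.75 in unsigned Q8.8 (1.75 * 256 = 448). */
#define GAIN_1_75_Q8_8 448u

/* Multiply a 16-bit integer sample by an unsigned Q8.8 gain,
 * rounding to nearest and saturating at UINT16_MAX. */
uint16_t apply_gain_q8_8(uint16_t sample, uint16_t gain_q8_8)
{
    uint32_t product = (uint32_t)sample * gain_q8_8;  /* result * 256 */
    uint32_t result = (product + 128u) >> 8;          /* round, drop 8 fraction bits */
    if (result > UINT16_MAX) {
        result = UINT16_MAX;                          /* saturate instead of wrapping */
    }
    return (uint16_t)result;
}
```

For example, `apply_gain_q8_8(1000, GAIN_1_75_Q8_8)` returns 1750, the same as `1000 * 1.75f`, but using only integer multiply and shift operations. The resolution of the gain is 1/256; if that is not accurate enough, a format with more fractional bits (e.g. Q16.16 in 32 bits) can be used at the cost of wider arithmetic.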