Yeah, I was worried about the value of the operands making a difference... but not worried enough to spend a lot of time on it. I went ahead and got some more accurate results (by looping 65535 times in most cases). I tried to change the operands on each calculation as well as I could without adding time, so perhaps it gives a good "average" result, although + and * seemed quite independent of operand values. Anyway, check out the attached image for the new results. I also subtracted out the time I estimated the looping itself to take (I guessed 250 ns per iteration).
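For anyone wanting to repeat this, here is a rough sketch of the loop-and-subtract approach described above. It is not the code that produced the attached results: `per_op_ns`, `div8_loop`, and the way the operands are varied are all hypothetical, and the 250 ns figure is just the per-iteration guess from the post. The timing itself (a hardware timer, a scope on a pin, whatever the target offers) is left out.

```c
#include <stdint.h>

/* Assumed value: the post's guess for the cost of one empty loop iteration. */
#define LOOP_OVERHEAD_NS 250.0

/* Hypothetical helper: turn the total time measured for a whole loop into a
   per-operation estimate by removing the assumed per-iteration overhead. */
static double per_op_ns(double total_ns, uint32_t iterations, double overhead_ns)
{
    if (iterations == 0) {
        return 0.0;                 /* nothing was measured */
    }
    return total_ns / (double)iterations - overhead_ns;
}

/* Hypothetical int8 division loop. The operands change on every pass so the
   result is closer to an "average" case; b is kept odd so it is never zero.
   'volatile' stops the compiler from optimizing the division away. */
static uint8_t div8_loop(uint16_t iterations)
{
    volatile uint8_t result = 0;
    uint8_t a = 200;
    uint8_t b = 1;
    uint16_t i;

    for (i = 0; i < iterations; i++) {
        result = a / b;
        a += 7;                       /* wraps around at 256 */
        b = (uint8_t)(b + 3) | 1;     /* always odd, so never zero */
    }
    return result;
}
```

You would time a call such as `div8_loop(65535)` and pass the total to `per_op_ns(total, 65535, LOOP_OVERHEAD_NS)`. Note that updating the operands costs time too, so the overhead you subtract should come from timing the same loop with the division removed rather than from an empty loop.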
The surprising result here is that int8 division is almost as slow as int16! I wonder if int8 division is not really implemented, and it just executes as int16? Float division is definitely quite a bit faster than int32. And I'm not sure if the differences in sqrt() are from differences in operand values or from converting to float (because I suspect sqrt() is only actually implemented for float).
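One possible explanation for the int8 result, offered only as a guess: in standard C, 8-bit operands are promoted to `int` before the division is carried out, so on a target where `int` is 16 bits the compiler may well emit a 16-bit divide unless it narrows it back. Whether the compiler used here follows that rule is an assumption I have not checked. The small sketch below (the function name `div8_result_size` is made up) just shows the promotion in standard C.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: report the size of the type that an 8-bit by 8-bit
   division actually produces. sizeof does not evaluate its operand, so no
   division is executed here. */
static size_t div8_result_size(void)
{
    uint8_t a = 200;
    uint8_t b = 7;

    /* Both operands undergo integer promotion, so the expression has type
       int, not uint8_t, on a standard-conforming compiler. */
    return sizeof(a / b);
}
```

If this returns the same value as `sizeof(int)` on your compiler, the int8 division is being done at `int` width at the language level, which would fit the timings. If it returns 1, the compiler is not promoting and the cause lies elsewhere.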
Oh, by the way, these times are in ns, not us (typo on top of image file).
