Does your application need the whole 16-bit or 32-bit range? You could calculate 256 values covering just your application's range and use those as a lookup table (LUT). In your 10 µs control loop, the speed isn't going to change by more than one bracket up or down in the LUT between iterations, so each pass only needs to check the current entry and its two neighbours.
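
Here is a rough sketch of what I mean, in C. Everything in it is an assumption on my part: the range limits (`SPEED_MIN`, `SPEED_MAX`), the names `lut_init` and `lut_lookup`, and especially `expensive_function`, which is just a stand-in for whatever calculation you are trying to avoid in the loop. The idea is to remember the bracket from the previous pass and move at most one step per call.

```c
#include <assert.h>
#include <stdint.h>

#define LUT_SIZE      256u
#define SPEED_MIN     1000u   /* hypothetical lower limit of the range */
#define SPEED_MAX     9192u   /* hypothetical upper limit (exclusive)  */
#define BRACKET_WIDTH ((SPEED_MAX - SPEED_MIN) / LUT_SIZE)

static uint32_t lut_lower[LUT_SIZE];  /* lower bound of each bracket      */
static uint32_t lut_value[LUT_SIZE];  /* precomputed result per bracket   */
static uint32_t lut_index;            /* bracket used on the previous pass */

/* Placeholder for the real, slow calculation (assumption). */
static uint32_t expensive_function(uint32_t speed)
{
    return 1000000u / speed;
}

/* Fill the table once at start-up, outside the control loop. */
void lut_init(uint32_t initial_speed)
{
    assert(initial_speed >= SPEED_MIN && initial_speed < SPEED_MAX);

    for (uint32_t i = 0u; i < LUT_SIZE; i++) {
        lut_lower[i] = SPEED_MIN + i * BRACKET_WIDTH;
        /* Evaluate at the middle of the bracket. */
        lut_value[i] = expensive_function(lut_lower[i] + BRACKET_WIDTH / 2u);
    }
    lut_index = (initial_speed - SPEED_MIN) / BRACKET_WIDTH;
}

/* Called every loop pass: moves at most one bracket up or down. */
uint32_t lut_lookup(uint32_t speed)
{
    if (lut_index < LUT_SIZE - 1u && speed >= lut_lower[lut_index + 1u]) {
        lut_index++;
    } else if (lut_index > 0u && speed < lut_lower[lut_index]) {
        lut_index--;
    }
    return lut_value[lut_index];
}
```

Call `lut_init()` once with the starting speed, then `lut_lookup(speed)` on each pass. The per-pass cost is two comparisons and one array read, with no division. The brackets are evenly spaced here only to keep the sketch short; since the boundaries live in `lut_lower`, they could just as well be uneven. If the speed ever does jump by more than one bracket, the index catches up one step per call rather than immediately.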