Do some faster integer square root function exist?

I should like to calculate the square root of a 32 bit integer (long int) in a faster way if possible. I have tried to do it via the implemented floating point sqrt(), and it require about 45 us calculation time with my Arduino Nano V3.

Second question. When you have a statement with a multiplication of two integers, you can get overflow. But can you in some way tell the compiler to change the result into a long integer without making the multiplication on two long integers? I tried to make this test code:

/*
This code is intended to test calculation speed of some aritmetric operations
and perhaps more. It may be done by using a testpin to be measured by oscilloscope

*/
#include "avdweb_AnalogReadFast.h"

const uint8_t TestOutPin = 13;         // Test pin for signal to oscilloscope
const uint8_t TestOutPin2 = 5;         // Test pin for to test speed
const uint8_t AnalogPotPin = A0;        // Analog input pin that the potentiometer is attached to

const unsigned long int PrintPeriodMs = 500;
const int Max10BitP1 = 1023 + 1;
const int Max9Bit = 511;

unsigned long int printTrigMillis = 0;
int i1 = 160;
int i2 = 170;
long int li1 = 180;
long int li2 = 190;
long int li3 = 200;
long int li4 = 210;
float r1=0;
float r2=0;

void setTestPin() {       //Uses testpin 13
  PORTB |= 0B00100000;
  };

void clearTestPin() {     //Uses testpin 13
  PORTB &= 0B11011111;
  };

void setup() {
  Serial.begin(9600);
  pinMode(TestOutPin, OUTPUT);
  pinMode(TestOutPin2, OUTPUT);
  digitalWrite(TestOutPin,LOW);
  pinMode(AnalogPotPin, INPUT);
  printTrigMillis = millis() + PrintPeriodMs;
}

void loop() {
  // put your main code here, to run repeatedly:
  i1 = random(Max10BitP1) - Max9Bit;
  i2 = random(Max10BitP1) - Max9Bit;
  setTestPin();
  //li1= long(i1); li2= long(i2); li3 = li1*li1 + li2*li2 - li1*li2;  // Do work with about 10 us calculation time
  
  li3 = (long) i1*i1 + (long) i2*i2 - (long) i1*i2; // Do work with about 10 us calculation time

  //li3 = (long) i1*i1 + i2*i2 - i1*i2; // Cause error. The (long) directive seems only to work on first calculation
  //r1 = i1; r2 = i2; li3 = r1*r1 + r2*r2 - r1*r2;  // Do work but calculation time is about 55 us
  //li3 = i1*i1 + i2*i2 - i1*i2;  // cause overflow error
  //li3 = long(i1*i1) + long(i2*i2) - long(i1*i2);  // cause overflow error

  //li4 = sqrt(li3);  //This floating point calculation require about 45 us calculation time

  clearTestPin();
  
  delay(1);
  if (millis() > printTrigMillis) {
    printTrigMillis += PrintPeriodMs;
    // print the results to the Serial Monitor:
    Serial.print("i1 = "); Serial.print(i1);
    Serial.print("\t i2 = "); Serial.print(i2);
    Serial.print("\t li3 = "); Serial.print(li3);
    Serial.print("\t li4 = "); Serial.print(li4);
    Serial.println();
    };
}

So this seems to be a too long calculation-time.
What is the longest calculationtime in microseconds that is acceptable for you?
5 µs?
10µS?
0,5 µS?

what is the final purpose of this calculation?
It might be possible to not calculate the square-root but use the radicand itself
depending on the final purpose

have you considered using a faster microcontroller like
a ESP8266 32 bit 160 MHz
a ESP32 32 bit 240 MHz
or
a seeeduino XIAO SAMD21 32bit 48 MHz
?

Thanks for your reply.

I think 5 or 10 us would be sufficient.

This is part of a small feasibility study on using a micro brushless motor as an encoder to obtain a rotational speed signal in an alternative way. With a digital encoder, you cannot obtain a speed signal in a fast way at very low RPM values.

https://forum.arduino.cc/t/alternative-to-digital-rotary-encoder/1225470/10

I am still new to using microcontrollers, so I have not considered much yet. But I shall take a look at your proposals. Yes, it might be right to look for a faster microcontroller with a build in floating point processor. For a couple of month ago, I got some electronics from China, and in there I got a small cheap PCB with a STM32G030 controller. I think it is faster, but I have not tried it yet. I have noticed good reviews regarding the ADC and I have often seen this kind of processer in electrical motor applications.

If a price of $40 is acceptable a teensy 4.0 with 32bit 600 MHz

Specifications

Feature Teensy 4.1 Teensy 4.0
Ethernet 10 / 100 Mbit
(6 pins) -none-
USB Host 5 Pins with
power management 2 SMT Pads
SDIO (4 bit data) Micro SD Socket 8 SMT Pads
PWM Pins 35 31
Analog Inputs 18 14
Serial Ports 8 7
Flash Memory 8 Mbyte 2 Mbyte
QSPI Memory 2 chips +
Program Memory Program memory
Breadboard I/O 42 24
Bottom SMT Pads 7 16
SD Card Signals 6 0
Total I/O Pins 55 40
Differences between Teensy 4.1 & Teensy 4.0
  • ARM Cortex-M7 at 600 MHz
  • Float point math unit , 64 & 32 bits
  • Programmable FlexIO
  • Peripheral cross triggering

regarding calculating speed at low rpm
optical encoders are available with - you are reading right - 100.000 pulses per rotation
another option would be a magnetical encoder like this
https://ams.com/angle-position-on-axis

https://de.aliexpress.com/item/1005001621701959.html

1 Like

how low?

can you use different approaches for low and higher speeds? do you have more time for the calculation at "low" speed?

Try this to see how long it takes. This is a modified version of the code I found on this webpage https://www.geeksforgeeks.org/square-root-of-an-integer/

long floorSqrt(long x)
{
  // Base cases
  if (x == 0 || x == 1)
    return x;
  //check for largest integer square that will fit in a signed long
  if (x >= (46340L * 46340L))
    return 46340L;

  // Do Binary Search for floor(sqrt(x))
  long start = 1L;
  long end = 46340L;
  long ans;
  while (start <= end) {
    long mid = (start + end) / 2;

    // If x is a perfect square
    long sqr = mid * mid;
    if (sqr == x)
      return mid;

    // Since we need floor, we update answer when
    // mid*mid is smaller than x, and move closer to
    // sqrt(x)

    if (sqr <= x) {
      start = mid + 1;
      ans = mid;
    }
    else // If mid*mid is greater than x
      end = mid - 1;
  }
  return ans;
}


(I count 3 phases, 9 stator poles, and 12 magnets on an image from the motor from that post. (bldc anatomy) Is that 9*2=18 pulses/rev?)

Then (sq(v1) + sq(v2) -v1*v2) will also be close to a constant.

What RPM and accuracy do you need? Maybe an approximation would be good enough:

No. Low cost is important for the application. So I like to look for cheaper possibilities.

Interesting. But this many pulses would be expensive - or? I just noticed, that encoders with about 1000 pulses per rotation was rather expensive, but perhaps I looked the wrong places.

I look for solutions for a wide speed range for 30 to 20.000 rpm. But it might be less wide dependent of what is possible in a reasonable way. With feed back from a brushed DC motor this is actual possible, but the brushes wear and you get other problems. With 20.000 rpm many lines on an encoder do also start to cause frequency problems.

Yes, this is an interesting option. I did read a bit about it yesterday, and noticed some users reported that the signal got some significant noise. But I'am not sure about this yet. If you want to sample these two axis hall element sensor at about 2 kHz, then the angle shift for low rpm will be low, and then maybe noise will disturb a velocity signal. But, it would be nice to find out more about this way to do rotary encoding. Have you seen similar sensors, that output two analog signals instead of this digital serial IIC?

I should like to go down to 30 rpm, but 120 rpm may be sufficient. The upper speed is 20.000 rpm. No, I think that the low speed actually needs the fast response time. At higher speeds you got the moment of inertia that will smooth relative speed variations.

Hi @backflip ,

Have you tried doing the casting in the static C++ way instead of the C-style way? It makes less fuzz about size checking and maybe it'll give you the results you expect.
If you want to try it just replace the
(long)variable_name
with
static_cast<long>(variable_name)

Good Luck
Gaby.//

Thanks. I will try this code out and get back.

30 rpm means 0.5 rotations per second.

20.000 rpm means 333.33 rotations per second.

That is a very foggy specification a foggication so to say :wink:

How often per second do you want to sample the rpm?

And you still haven't told anything about the complete application.

With knowing the complete application it will be much easier to make suggestions.

a eight spoke wheel and a fork light-barrier would deliver 8 pulses per rotation
at 30 rpm this is 4 pulses per second = 1 pulse each 0.25 seconds

For low speeds you measure the time between two pulses and then can calculate rpm

On high rpms you measure pulses per second or pulses per 0,1 seconds

Anything in the range of 1kHz to 5 kHz should be easy to measure with an arduino nano

No need to try, my testing indicates it is much, much slower than using the sqrt() function with floats.

I need to use the speed signal for a feed back loop. Therefore, I need to get the square root. I do not know what might be possible yet, but I guess the wanted/needed accuracy would be within +/- 3 %. With a brushed DC motor used to measure speed, I get about +/- 6 %, so I hope for a better result than that.

Thanks for the link. Yes, perhaps some of these ways may work.

and how many encoder tics/rev and how often does your controller perform an update? is this the 2 kHz you mentioned?

Thanks. I got some difficulty in getting the syntax right. This is the statement:

li3 = i1*i1 + i2*i2 - i1*i2;

This did not work:

li3 = static_cast<long>(i1*i1) +  static_cast<long>(i2*i2) - static_cast<long>(i1*i2);

What's the frequency and precision of your planned feedback loop and how does it compare to the frequency of the motor-phasing or the time constant of the motor-inertia system? I can't imagine a 10us impulse having much of an effect on an acceleration of a motor.

void setup() {
  Serial.begin(115200);
  //Serial.println((julery_isqrt(1000000000)));
}

void loop() {
  uint32_t x = random();
  uint32_t rx = julery_isqrt(x);
  int32_t err = rx * rx - x;
  Serial.print(x);
  Serial.print(" -> ");
  Serial.print(rx);
  Serial.print(" ^2 = ");
  Serial.print(rx * rx);
  Serial.print(" error=");
  Serial.print(err);
  Serial.print(" pct:");
  Serial.print((err * 100.0) / x, 4);
  Serial.println("%");
  delay(1000);
}

/* by Jim Ulery per */
static unsigned julery_isqrt(unsigned long val) {
  unsigned long temp, g = 0, b = 0x8000, bshft = 15;
  do {
    if (val >= (temp = (((g << 1) + b) << bshft--))) {
      g += b;
      val -= temp;
    }
  } while (b >>= 1);
  return g;
}

gives:

16807 -> 129 ^2 = 16641 error=-166 pct:-0.9877%
282475249 -> 16807 ^2 = 282475249 error=0 pct:0.0000%
1622650073 -> 40282 ^2 = 1622639524 error=-10549 pct:-0.0007%
984943658 -> 31383 ^2 = 984892689 error=-50969 pct:-0.0052%
1144108930 -> 33824 ^2 = 1144062976 error=-45954 pct:-0.0040%
470211272 -> 21684 ^2 = 470195856 error=-15416 pct:-0.0033%
101027544 -> 10051 ^2 = 101022601 error=-4943 pct:-0.0049%
1457850878 -> 38181 ^2 = 1457788761 error=-62117 pct:-0.0043%
1458777923 -> 38193 ^2 = 1458705249 error=-72674 pct:-0.0050%
2007237709 -> 44802 ^2 = 2007219204 error=-18505 pct:-0.0009%
823564440 -> 28697 ^2 = 823517809 error=-46631 pct:-0.0057%
1115438165 -> 33398 ^2 = 1115426404 error=-11761 pct:-0.0011%
1784484492 -> 42243 ^2 = 1784471049 error=-13443 pct:-0.0008%
74243042 -> 8616 ^2 = 74235456 error=-7586 pct:-0.0102%
114807987 -> 10714 ^2 = 114789796 error=-18191 pct:-0.0158%
1137522503 -> 33727 ^2 = 1137510529 error=-11974 pct:-0.0011%
1441282327 -> 37964 ^2 = 1441265296 error=-17031 pct:-0.0012%
16531729 -> 4065 ^2 = 16524225 error=-7504 pct:-0.0454%
823378840 -> 28694 ^2 = 823345636 error=-33204 pct:-0.0040%
143542612 -> 11980 ^2 = 143520400 error=-22212 pct:-0.0155%
896544303 -> 29942 ^2 = 896523364 error=-20939 pct:-0.0023%
1474833169 -> 38403 ^2 = 1474790409 error=-42760 pct:-0.0029%
1264817709 -> 35564 ^2 = 1264798096 error=-19613 pct:-0.0016%
1998097157 -> 44700 ^2 = 1998090000 error=-7157 pct:-0.0004%
1817129560 -> 42627 ^2 = 1817061129 error=-68431 pct:-0.0038%

Looks like it is value dependent and far less than 1% for most of the uint32_t range. (and integer square root accuracy is limited by the integerness of the values...is sqrt(72)=8 ok at 12% ?)

Thanks for your remarks. I like to get back about this topic a bit later in another thread, that is more about the encoders and application. The aim of this thread was more about the mathematics and finding out about this way may be possible and the restrictions.

Thanks. I just tried this code in my set-up. I measured a calculation time of about 85 us, so it actually is slower than using the float calculation.

No. Low cost is important for the application. So I like to look for cheaper possibilities.

I would try it with a ESP8266 32 bit or ESP32 32 bit, even if they are too costly for the final goal. It would be better to first prove your concept then worry about using a cheaper processor board.

1 Like