I would like to calculate the square root of a 32-bit integer (long int) faster, if possible. I have tried the built-in floating-point sqrt(), and it requires about 45 µs of calculation time on my Arduino Nano V3.
Second question: when a statement multiplies two integers, the result can overflow. Can you somehow tell the compiler to produce a long integer result without performing the multiplication on two long integers? I wrote this test code:
/*
  This code is intended to test the calculation speed of some arithmetic operations
  and perhaps more. The timing is measured with an oscilloscope on a test pin.
*/
#include "avdweb_AnalogReadFast.h"

const uint8_t TestOutPin = 13;    // Test pin for signal to oscilloscope
const uint8_t TestOutPin2 = 5;    // Second test pin for speed tests
const uint8_t AnalogPotPin = A0;  // Analog input pin that the potentiometer is attached to
const unsigned long int PrintPeriodMs = 500;
const int Max10BitP1 = 1023 + 1;
const int Max9Bit = 511;
unsigned long int printTrigMillis = 0;
int i1 = 160;
int i2 = 170;
long int li1 = 180;
long int li2 = 190;
long int li3 = 200;
long int li4 = 210;
float r1 = 0;
float r2 = 0;

void setTestPin() {  // Uses test pin 13
  PORTB |= 0B00100000;
}

void clearTestPin() {  // Uses test pin 13
  PORTB &= 0B11011111;
}

void setup() {
  Serial.begin(9600);
  pinMode(TestOutPin, OUTPUT);
  pinMode(TestOutPin2, OUTPUT);
  digitalWrite(TestOutPin, LOW);
  pinMode(AnalogPotPin, INPUT);
  printTrigMillis = millis() + PrintPeriodMs;
}

void loop() {
  // put your main code here, to run repeatedly:
  i1 = random(Max10BitP1) - Max9Bit;
  i2 = random(Max10BitP1) - Max9Bit;
  setTestPin();
  //li1 = long(i1); li2 = long(i2); li3 = li1*li1 + li2*li2 - li1*li2;  // Works; about 10 us calculation time
  li3 = (long) i1*i1 + (long) i2*i2 - (long) i1*i2;  // Works; about 10 us calculation time
  //li3 = (long) i1*i1 + i2*i2 - i1*i2;  // Wrong result: the (long) cast only applies to the first product, the other two are still int multiplications
  //r1 = i1; r2 = i2; li3 = r1*r1 + r2*r2 - r1*r2;  // Works, but calculation time is about 55 us
  //li3 = i1*i1 + i2*i2 - i1*i2;  // Causes overflow
  //li3 = long(i1*i1) + long(i2*i2) - long(i1*i2);  // Causes overflow: each product is computed as int before the conversion
  //li4 = sqrt(li3);  // This floating-point calculation requires about 45 us
  clearTestPin();
  delay(1);
  if (millis() > printTrigMillis) {
    printTrigMillis += PrintPeriodMs;
    // print the results to the Serial Monitor:
    Serial.print("i1 = "); Serial.print(i1);
    Serial.print("\t i2 = "); Serial.print(i2);
    Serial.print("\t li3 = "); Serial.print(li3);
    Serial.print("\t li4 = "); Serial.print(li4);
    Serial.println();
  }
}
So this seems to be too long a calculation time.
What is the longest calculation time in microseconds that is acceptable for you?
5 µs?
10 µs?
0.5 µs?
What is the final purpose of this calculation?
Depending on the final purpose, it might be possible to skip the square root
and use the radicand itself.
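One way to read that suggestion: if the result is only compared against a limit, the limit can be squared instead and the comparison done on the radicand, so no square root is needed at all. A minimal sketch, assuming the same expression as in the test code; the names `radicand` and `exceedsLimit` are made up for illustration, and the inputs are assumed to stay within about ±1023 so that nothing overflows. Whether this fits depends on what the feedback loop actually needs.

```cpp
#include <stdint.h>

// Hypothetical helper: the quantity under the square root from the test sketch,
// computed with 16x16 -> 32 bit products.
// Assumption: |a| and |b| stay within about 1023, so the sum fits easily in 32 bits.
int32_t radicand(int16_t a, int16_t b) {
  return (int32_t)a * a + (int32_t)b * b - (int32_t)a * b;
}

// Hypothetical helper: true if sqrt(radicand(a, b)) > limit, without calling sqrt().
// This works because squaring preserves the order of non-negative values, and
// a*a + b*b - a*b is never negative.
bool exceedsLimit(int16_t a, int16_t b, uint16_t limit) {
  return (uint32_t)radicand(a, b) > (uint32_t)limit * limit;
}
```

For example, `exceedsLimit(3, 4, 3)` compares 13 against 9, and `exceedsLimit(3, 4, 4)` compares 13 against 16.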
Have you considered using a faster microcontroller, like
an ESP8266 (32-bit, 160 MHz),
an ESP32 (32-bit, 240 MHz),
or
a Seeeduino XIAO SAMD21 (32-bit, 48 MHz)?
This is part of a small feasibility study on using a micro brushless motor as an encoder to obtain a rotational speed signal in an alternative way. With a digital encoder, you cannot obtain a speed signal quickly at very low RPM values.
I am still new to using microcontrollers, so I have not considered much yet, but I will take a look at your proposals. Yes, it might be right to look for a faster microcontroller with a built-in floating-point unit. A couple of months ago I got some electronics from China, including a small, cheap PCB with an STM32G030 controller. I think it is faster, but I have not tried it yet. I have noticed good reviews regarding the ADC, and I have often seen this kind of processor in electric motor applications.
If a price of $40 is acceptable, there is the Teensy 4.0 (32-bit, 600 MHz).
Specifications

| Feature | Teensy 4.1 | Teensy 4.0 |
|---|---|---|
| Ethernet | 10 / 100 Mbit (6 pins) | -none- |
| USB Host | 5 Pins with power management | 2 SMT Pads |
| SDIO (4 bit data) | Micro SD Socket | 8 SMT Pads |
| PWM Pins | 35 | 31 |
| Analog Inputs | 18 | 14 |
| Serial Ports | 8 | 7 |
| Flash Memory | 8 Mbyte | 2 Mbyte |
| QSPI Memory | 2 chips + Program Memory | Program memory |
| Breadboard I/O | 42 | 24 |
| Bottom SMT Pads | 7 | 16 |
| SD Card Signals | 6 | 0 |
| Total I/O Pins | 55 | 40 |

Differences between Teensy 4.1 & Teensy 4.0

ARM Cortex-M7 at 600 MHz
Floating-point math unit, 64 & 32 bits
Programmable FlexIO
Peripheral cross triggering
Regarding calculating speed at low rpm:
optical encoders are available with (you are reading that right) 100,000 pulses per rotation.
Another option would be a magnetic encoder like this: https://ams.com/angle-position-on-axis
long floorSqrt(long x)
{
  // Base cases
  if (x == 0 || x == 1)
    return x;
  // Check for the largest integer square that will fit in a signed long
  if (x >= (46340L * 46340L))
    return 46340L;
  // Do binary search for floor(sqrt(x))
  long start = 1L;
  long end = 46340L;
  long ans = 0L;  // Initialized so that a negative x returns 0 instead of an undefined value
  while (start <= end) {
    long mid = (start + end) / 2;
    long sqr = mid * mid;
    // If x is a perfect square
    if (sqr == x)
      return mid;
    // Since we need the floor, we update the answer when
    // mid*mid is smaller than x, and move closer to
    // sqrt(x)
    if (sqr <= x) {
      start = mid + 1;
      ans = mid;
    }
    else  // If mid*mid is greater than x
      end = mid - 1;
  }
  return ans;
}
No. Low cost is important for the application, so I would like to look for cheaper possibilities.
Interesting. But wouldn't that many pulses be expensive? I noticed that encoders with about 1000 pulses per rotation were rather expensive, but perhaps I looked in the wrong places.
I am looking for solutions for a wide speed range, from 30 to 20,000 rpm, but the range might be narrower depending on what is possible in a reasonable way. With feedback from a brushed DC motor this is actually possible, but the brushes wear and you get other problems. At 20,000 rpm, many lines on an encoder also start to cause frequency problems.
Yes, this is an interesting option. I read a bit about it yesterday and noticed that some users reported significant noise on the signal, but I'm not sure about this yet. If you sample this two-axis Hall-element sensor at about 2 kHz, the angle change per sample at low rpm will be small, and then noise may disturb the velocity signal. Still, it would be nice to find out more about this way of doing rotary encoding. Have you seen similar sensors that output two analog signals instead of this digital serial I2C?
I would like to go down to 30 rpm, but 120 rpm may be sufficient. The upper speed is 20,000 rpm. No, I think that the low speed is actually where the fast response time is needed. At higher speeds the moment of inertia will smooth out relative speed variations.
Have you tried doing the cast the static C++ way instead of the C-style way? It makes less fuss about size checking, and maybe it will give you the results you expect.
If you want to try it, just replace (long)variable_name
with static_cast<long>(variable_name)
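As far as I know, both cast styles convert the value in the same way here; what matters is where the cast sits. A cast binds only to the operand directly after it, and the multiplication is then carried out in the wider of the two operand types. A minimal sketch of that idea, assuming fixed-width types so it behaves the same on an AVR (16-bit int) and on a PC; the helper names `mulWide` and `sumOfProducts` are made up for illustration.

```cpp
#include <stdint.h>

// Hypothetical helper: widen ONE operand before multiplying. The other operand is
// then converted to the same type, so the product is computed in 32 bits.
int32_t mulWide(int16_t a, int16_t b) {
  return static_cast<int32_t>(a) * b;
}

// Every product needs its own cast; a cast on the first product does not
// carry over to the second or third one.
int32_t sumOfProducts(int16_t a, int16_t b) {
  return mulWide(a, a) + mulWide(b, b) - mulWide(a, b);
}
```

With this shape the compiler is free to use a 16x16 -> 32 bit multiply instead of a full 32x32 one. I believe avr-gcc does so, but I have not measured it for this sketch.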
I need to use the speed signal for a feedback loop. Therefore, I need to get the square root. I do not know what might be possible yet, but I guess the required accuracy would be within +/- 3 %. With a brushed DC motor used to measure speed, I get about +/- 6 %, so I hope for a better result than that.
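On that accuracy target: an integer square root truncates to a whole number, which hurts most for small inputs. One common workaround is to scale the radicand by 2^(2k) before taking the root, so that the result carries k fractional bits. A minimal sketch, assuming k = 4; `isqrt32` is a hypothetical stand-in (a plain bit-by-bit integer square root, not code posted in this thread) and `sqrtQ4` is a made-up name.

```cpp
#include <stdint.h>

// Hypothetical stand-in: bit-by-bit floor(sqrt(x)) for a 32-bit unsigned value.
uint16_t isqrt32(uint32_t x) {
  uint32_t root = 0;
  uint32_t bit = (uint32_t)1 << 30;  // highest power of four that fits in 32 bits
  while (bit > x) bit >>= 2;         // skip powers of four larger than x
  while (bit != 0) {
    if (x >= root + bit) {
      x -= root + bit;
      root = (root >> 1) + bit;
    } else {
      root >>= 1;
    }
    bit >>= 2;
  }
  return (uint16_t)root;
}

// Square root with 4 fractional bits: returns floor(sqrt(x) * 16).
// Assumption: x < 2^24, so that x << 8 still fits in 32 bits.
uint32_t sqrtQ4(uint32_t x) {
  return isqrt32(x << 8);
}
```

For x = 72 the plain root is 8, while `sqrtQ4(72)` gives 135, i.e. 135/16 = 8.4375 against a true value of about 8.485. The price is the reduced input range noted in the comment.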
Thanks for the link. Yes, perhaps some of these ways may work.
What's the frequency and precision of your planned feedback loop and how does it compare to the frequency of the motor-phasing or the time constant of the motor-inertia system? I can't imagine a 10us impulse having much of an effect on an acceleration of a motor.
void setup() {
  Serial.begin(115200);
  //Serial.println((julery_isqrt(1000000000)));
}

void loop() {
  uint32_t x = random();
  uint32_t rx = julery_isqrt(x);
  int32_t err = rx * rx - x;
  Serial.print(x);
  Serial.print(" -> ");
  Serial.print(rx);
  Serial.print(" ^2 = ");
  Serial.print(rx * rx);
  Serial.print(" error=");
  Serial.print(err);
  Serial.print(" pct:");
  Serial.print((err * 100.0) / x, 4);
  Serial.println("%");
  delay(1000);
}

/* by Jim Ulery per */
static unsigned julery_isqrt(unsigned long val) {
  unsigned long temp, g = 0, b = 0x8000, bshft = 15;
  do {
    if (val >= (temp = (((g << 1) + b) << bshft--))) {
      g += b;
      val -= temp;
    }
  } while (b >>= 1);
  return g;
}
Looks like the error is value dependent and far less than 1% for most of the uint32_t range. (Integer square root accuracy is limited by the integer nature of the values: is sqrt(72) = 8 OK at 12%?)
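A note on reading those percentages, as a rough derivation rather than a measurement: the sketch prints the error of the squared result relative to x, which is about twice the relative error of the root itself. The symbols e (printed error) and ε (relative error of the root) are introduced here for illustration only.

```latex
% r = \lfloor \sqrt{x} \rfloor, printed error e, relative error of the root \varepsilon
e = \frac{r^2 - x}{x}, \qquad r = \sqrt{x}\,(1 - \varepsilon)
\;\Rightarrow\; e = (1 - \varepsilon)^2 - 1 = -2\varepsilon + \varepsilon^2 \approx -2\varepsilon

% Example x = 72, r = 8:
e = \frac{64 - 72}{72} \approx -11.1\,\%, \qquad
\varepsilon = 1 - \frac{8}{\sqrt{72}} \approx 5.7\,\%

% Truncation loses less than 1, so
\varepsilon = \frac{\sqrt{x} - r}{\sqrt{x}} < \frac{1}{\sqrt{x}}
```

By the last bound, the root is within 3 % of the true value for any x of 1112 or more; only small radicands are affected by the truncation.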
Thanks for your remarks. I would like to come back to this topic a bit later in another thread that is more about the encoders and the application. The aim of this thread was more about the mathematics: finding out whether this approach is possible and what its restrictions are.
No. Low cost is important for the application, so I would like to look for cheaper possibilities.
I would try it with an ESP8266 or ESP32 (both 32-bit), even if they are too costly for the final goal. It would be better to first prove your concept and then worry about using a cheaper processor board.