Computation takes a long time

Here is the code that I have. It takes around 10 seconds to execute. Is there any way I can reduce the execution time? If anyone is wondering what I’m doing, I’m using 2 digital potentiometers to set the output voltage for an LM317 voltage regulator.

int resistorStep[2];
float distance = 100;
float number = 3.12;
unsigned long time;
float numTimes = 0;

void setup()
{
Serial.begin(9600);
Serial.println("Calculating...");
time = millis();
for (int r1 = 0; r1 <= 255; r1++)
{
for (int r2 = 0; r2 <= 255; r2++)
{
float r1v = round((10000.0)*((float)r1)/256.0+52.0);
float r2v = round((10000.0)*((float)r2)/256.0+52.0);
float voltage = 1.25*(1.0+r2v/r1v)+.00005*r2v;
float voltageDifference = voltage - number;
float absVoltageDifference = voltageDifference;
if (absVoltageDifference < 0) absVoltageDifference = absVoltageDifference*(-1);
float absDistance = distance;
if (absDistance < 0) absDistance = absDistance*(-1);
if (absDistance - absVoltageDifference > 0)
{
distance = voltageDifference;
resistorStep[0] = r1;
resistorStep[1] = r2;
}
}
}
Serial.print("Computation time: ");
Serial.print(millis() - time);
Serial.println(" ms");
Serial.print("Distance is ");
Serial.println(distance);
Serial.print("When \n r1 = ");
Serial.print(resistorStep[0]);
Serial.print(" and\n r2 = ");
Serial.println(resistorStep[1]);
}

void loop()
{
}

Your inner block that is doing the floating point math calculations is executing 256*256 = 65536 times.

If it takes ten seconds, that corresponds to about 153usec to do the calculation once.

At 16MHz, that corresponds to about 2448 clock cycles per calculation.
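
For anyone who wants to check the arithmetic above, here is a small sketch in plain C++ (nothing Arduino-specific). The constant and function names are made up for illustration. Using the unrounded per-iteration time gives roughly 2441 cycles; the 2448 figure comes from rounding to 153 usec first.

```cpp
// Back-of-envelope timing for the nested loops (names are illustrative).
constexpr double kTotalSeconds = 10.0;      // measured run time from the first post
constexpr long kIterations = 256L * 256L;   // inner block runs 65536 times
constexpr double kClockHz = 16000000.0;     // 16 MHz clock, as assumed above

// Time spent on one pass through the inner block, in microseconds.
constexpr double microsPerIteration() {
  return kTotalSeconds / kIterations * 1e6;
}

// Clock cycles spent on one pass through the inner block.
constexpr double cyclesPerIteration() {
  return kTotalSeconds / kIterations * kClockHz;
}
```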

If you convert all your calcs to integer math, I would think you will see a dramatic improvement over this, guessing somewhere between a 10-100x speedup. Floating point math is relatively expensive, as it is done in software libs. Integer math is mostly done in hardware, at 1 or 2 clock cycles per operation.
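
As a rough idea of what an all-integer version could look like, here is a sketch that keeps the same formula but works in units of 10 microvolts with 32-bit integers. The unit choice and the names (stepToOhms, voltage10uV, findBest) are my own assumptions, not something from the original sketch, and it has not been timed on real hardware. The integer division truncates, so each voltage can be off by up to one unit (10 uV). Note that the remaining division is still done in software on an AVR, as far as I know, so the actual speedup would need to be measured.

```cpp
#include <cassert>
#include <cstdint>

// Result of the search: both pot steps and the signed error.
struct Best {
  int r1;
  int r2;
  int32_t error;  // computed voltage minus target, in units of 10 microvolts
};

// Resistance in ohms for a pot step 0..255: round(10000 * step / 256) + 52.
int32_t stepToOhms(int step) {
  assert(step >= 0 && step <= 255);
  return (10000L * step + 128) / 256 + 52;
}

// 1.25*(1 + r2v/r1v) + 0.00005*r2v volts, expressed in units of 10 uV.
int32_t voltage10uV(int32_t r1v, int32_t r2v) {
  // 125000 * r2v stays below 2^31 because r2v is at most 10013 ohms.
  return 125000L + (125000L * r2v) / r1v + 5L * r2v;
}

// Exhaustive search over all step pairs using only integer math.
Best findBest(int32_t target10uV) {
  Best best = {0, 0, INT32_MAX};
  int32_t bestAbs = INT32_MAX;
  for (int r1 = 0; r1 <= 255; r1++) {
    const int32_t r1v = stepToOhms(r1);  // depends only on r1
    for (int r2 = 0; r2 <= 255; r2++) {
      const int32_t err = voltage10uV(r1v, stepToOhms(r2)) - target10uV;
      const int32_t absErr = err < 0 ? -err : err;
      if (absErr < bestAbs) {
        bestAbs = absErr;
        best = {r1, r2, err};
      }
    }
  }
  return best;
}
```

A target of 3.12 V would be passed as findBest(312000).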

Declare all your variables, such as r1v and r2v, outside the for loops. Every time you enter the inner for loop the code has to create space for those variables on the stack and then pop them off at the end of the loop, then immediately push space back on again the next time through. That should speed it up a bit.

Pete

Is long math the same as integer math?

While the AVR does have an option to fold all of the inner variables into the stack frame (-maccumulate-args) which avoids pushing and popping local variables declared in blocks, I suspect that in the grand scheme of things this is in the noise level.

As pico says, the inner loop is all floating point, including several divisions, and given the AVR does not have floating point instructions, it has to emulate these in software.

Long math (32-bits) takes more time than 16-bit int math, however, it should take way less than floating point math.

I’ve completely converted the code (and even lost accuracy) and it still has a noticeable delay. Around 5-7 seconds.

It takes a long time, indeed. You’re asking a lot of an 8-bit microcontroller - about a dozen floating point operations per iteration, and about a quarter of those are divisions. And you ask it more than 65,000 times in a row. Jeepers.

You can make it run faster by switching to integer math wherever it works. Just changing this:

      float r1v = round((10000.0)*((float)r1)/256.0+52.0);
      float r2v = round((10000.0)*((float)r2)/256.0+52.0);
      float voltage = 1.25*(1.0+r2v/r1v)+.00005*r2v;

to this:

      long r1v = 10000L*r1+(256*52);
      float r2v = 10000L*r2+(256*52);
      float voltage = 1.25*(1.0+r2v/r1v)+(0.00005/256.0)*r2v;

gives an improvement from 10.3 seconds to 6.07, as reported by millis() - 40% less time. Several floating point operations are replaced by fewer long integer operations. I believe that the compiler recognizes the expressions "(256*52)" and "(0.00005/256.0)" as constants, and replaces them with their calculated values, rather than recalculating them every time through the loop. I don’t see that accuracy is compromised at all.

To calculate the output voltage for an LM317, though, you have to divide something by something - there’s just not another way to get the ratio between two resistors. The offset contributed by the wiper resistance keeps you from setting the inverse of one step as a constant, adding it up, and replacing a divide with a multiply. Division is expensive, computationally, and you’re paying for it with several seconds of waiting. It takes LibreOffice Calc about one second to solve this by brute force, with an i5 at 2.67GHz. It takes so long because it’s a lot to do.

If I’m not wrong, you’re calculating an output voltage for each pair of possible resistances in two 10k digital pots, each with a wiper resistance of 52 ohms. You’re using values for R1 that are well above the example values, for this regulator, of 120-240 ohms. If you do use a resistance that won’t draw about 5 mA, you’ll need to make sure that you keep the load current above the stated minimum in the datasheet; otherwise, you won’t get good regulation. I can’t find any application notes that show a much higher resistor as R1. The datasheet does say that I_R1 can be different from 5 mA, but it doesn’t say how different. I’m interested to know if it works.

It’s worth a try. The worst that could happen is me losing the $.30 LM317, which isn’t a big deal. Instead of making the processor find the correct values, I am just going to write a PHP script to do it, and then input the values as variables. Maybe I’ll even store them on an SD card and use that. What do modern day workbench DC power supplies use?

For integer math that divides by 2, 4, 8, 16, 32, ... 256, etc., you might try replacing the division with bit shifts.
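
A minimal sketch of the shift idea, assuming unsigned values (the function names are only for illustration). For unsigned integers, shifting right by 8 gives the same result as dividing by 256. For negative signed values the two are not equivalent, so this is only safe when the operand cannot be negative. Compilers will usually make this substitution themselves when the divisor is a constant power of two, so the gain may be small.

```cpp
#include <cstdint>

// Divide an unsigned value by 256 with a shift instead of a division.
uint32_t divideBy256(uint32_t x) {
  return x >> 8;  // same result as x / 256 for unsigned values
}

// Pot resistance term 10000 * step / 256, truncated, for step 0..255.
uint32_t scaledOhms(uint8_t step) {
  return divideBy256(10000UL * step);
}
```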

Also, I'm not exactly sure what you are trying to do here, but is it required to do the whole calculation so many times? It seems like a lot of overhead for a search routine that is not optimized (like checking whether every number is a prime). If it is a search, you might also break out of the loops early once a good enough match is found.

float r1v = round((10000.0)*((float)r1)/256.0+52.0);

doesn't need to be calculated for every cycle of r2 (place it before the second for loop)

if (absVoltageDifference < 0) {
           absVoltageDifference = -absVoltageDifference;
          }

is easier to read than

if (absVoltageDifference < 0) absVoltageDifference = absVoltageDifference*(-1);

IMO, and may be quicker.

“abs()” is even easier to read.

A very simple optimisation, which also gives you some idea of the difference between an fp addition and an fp multiplication:

  float r01 = 0.0;
  for (int r1 = 0; r1 <= 255; r1++, r01 += 10000.0)
  {
    float r1v = round(r01/256.0+52.0);
    float r02 = 0.0;
    for (int r2 = 0; r2 <= 255; r2++, r02 += 10000.0)
    {
      float r2v = round(r02/256.0+52.0);

There’s also a small but significant saving to be made calculating the absolute value of “distance” only when it changes, and not 65536 times.
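
Putting those two ideas together, a complete version might look roughly like this. It is written as plain C++ so it can be tried on a PC (std::round and std::fabs stand in for the Arduino round() and abs()), and the struct and function names are my own. The absolute value of the best distance is kept in bestAbs and only updated when a better pair is found.

```cpp
#include <cmath>

struct Match {
  int r1;
  int r2;
  float distance;  // signed: computed voltage minus target, in volts
};

Match findClosest(float target) {
  Match best = {0, 0, 100.0f};
  float bestAbs = 100.0f;  // |best.distance|, refreshed only when best changes
  float r01 = 0.0f;        // running value of 10000 * r1, built by addition
  for (int r1 = 0; r1 <= 255; r1++, r01 += 10000.0f) {
    const float r1v = std::round(r01 / 256.0f + 52.0f);  // once per r1
    float r02 = 0.0f;      // running value of 10000 * r2
    for (int r2 = 0; r2 <= 255; r2++, r02 += 10000.0f) {
      const float r2v = std::round(r02 / 256.0f + 52.0f);
      const float diff = 1.25f * (1.0f + r2v / r1v) + 0.00005f * r2v - target;
      const float absDiff = std::fabs(diff);
      if (absDiff < bestAbs) {
        bestAbs = absDiff;
        best = {r1, r2, diff};
      }
    }
  }
  return best;
}
```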

Exactly what are you trying to compute? There may be a much better way of doing it than exhaustive search.
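
One possibility, offered as a sketch rather than a tested solution: for a fixed r1 the formula is linear in the r2 resistance, so it can be solved for the ideal r2 directly and only the few steps around that value need checking. That is at most 4 evaluations per r1 (about 1000 in total) instead of 65536. The code below is plain C++ using doubles, the names are hypothetical, and bruteForce is included only as a reference to compare against.

```cpp
#include <cmath>

struct Result {
  int r1;
  int r2;
  double absError;  // |computed voltage - target| in volts
};

// Resistance in ohms for a pot step 0..255, as in the original sketch.
double stepToOhms(int step) {
  return std::round(10000.0 * step / 256.0) + 52.0;
}

double outputVolts(double r1v, double r2v) {
  return 1.25 * (1.0 + r2v / r1v) + 0.00005 * r2v;
}

// Reference: exhaustive search over all 65536 pairs.
Result bruteForce(double target) {
  Result best = {0, 0, 1e30};
  for (int r1 = 0; r1 <= 255; r1++) {
    for (int r2 = 0; r2 <= 255; r2++) {
      const double e = std::fabs(outputVolts(stepToOhms(r1), stepToOhms(r2)) - target);
      if (e < best.absError) best = {r1, r2, e};
    }
  }
  return best;
}

// For each r1, solve for the ideal r2 resistance and test only nearby steps.
Result solvePerR1(double target) {
  Result best = {0, 0, 1e30};
  for (int r1 = 0; r1 <= 255; r1++) {
    const double r1v = stepToOhms(r1);
    // target = 1.25 + r2v * (1.25 / r1v + 0.00005), solved for r2v
    const double idealOhms = (target - 1.25) / (1.25 / r1v + 0.00005);
    double idealStep = std::floor((idealOhms - 52.0) * 256.0 / 10000.0);
    // Limit the range before converting to int; out-of-range steps are skipped below.
    if (idealStep < -2.0) idealStep = -2.0;
    if (idealStep > 256.0) idealStep = 256.0;
    const int k = static_cast<int>(idealStep);
    for (int r2 = k - 1; r2 <= k + 2; r2++) {
      if (r2 < 0 || r2 > 255) continue;
      const double e = std::fabs(outputVolts(r1v, stepToOhms(r2)) - target);
      if (e < best.absError) best = {r1, r2, e};
    }
  }
  return best;
}
```

Calling solvePerR1(3.12) should give the same smallest error as bruteForce(3.12). On an 8-bit AVR the doubles would be 32-bit floats in practice, so the result would need to be rechecked there.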