Make code more efficient

Hi,

I am using the following Auto-correlation code which takes about 100mS to run. I would like it to be more efficient. Perhaps a shorter LENGTH would be accurate enough but when I reduce it to 128, it stops auto-correlating after a few seconds. The code still runs but the freq stops changing. Anyway, whether its a solution make a short sample length work, or some efficiency in the central for loop, if I could get some direction, that would be great. thanks.

#define LENGTH 256
#define NO 0
#define YES 1

byte rawData[LENGTH];
byte prev;
byte max;
int count;

// Sample Frequency in kHz
const float sample_freq = 8919;

int len = sizeof(rawData);
int i,k;
long sum, sum_old;
int thresh = 0;
float freq_per = 0;
byte pd_state = 0;

void setup(){
 
  analogReference(DEFAULT);   // Connect to 3.3V
  analogRead(A0);
  Serial.begin(115200);
  count = 0;
 
}


void loop(){

      if (count < LENGTH) {
         count++;
        rawData[count] = analogRead(A0)>>2; 
      }
    
      else 
      {
          sum = 0;
          pd_state = 0;
          int period = 0;
          for(i=0; i < len; i++)
          {
              // Autocorrelation
              sum_old = sum;
              sum = 0;
              for(k=0; k < len-i; k++) sum += (rawData[k]-128)*(rawData[k+i]-128)/256;
                               
              // Peak Detect State Machine
              if (pd_state == 2 && (sum-sum_old) <=0) 
              {
                period = i;
                pd_state = 3;
              }
              if (pd_state == 1 && (sum > thresh) && (sum-sum_old) > 0) pd_state = 2;
              if (!i) {
                thresh = sum * 0.5;
                pd_state = 1;
              }
          }
        
          if (thresh >100 && period > 0) {
            freq_per = sample_freq/period;
          
            Serial.println(freq_per);
            analogWrite(9, freq_per); 
        
          count = 0;
          }
      }
}

when I reduce it to 128, it stops auto-correlating after a few seconds

  1. Post the code that you think does not behave correctly, not the code that works (but see below).

  2. Describe what should happen, what actually happens and why you think this is a problem.

"The freq stops changing" is a meaningless statement without more information, like what input is present, and what you think the correct output should be.

Note that the following violates the upper array bound of rawData[], and rawData[0] is not used.

      if (count < LENGTH) {
         count++;
        rawData[count] = analogRead(A0)>>2;
}

Increment count AFTER storing the data.

What board / processor are you using? A quick and dirty approach might be to throw more horsepower at it.

jremington:

  1. Post the code that you think does not behave correctly, not the code that works (but see below).

  2. Describe what should happen, what actually happens and why you think this is a problem.

"The freq stops changing" is a meaningless statement without more information, like what input is present, and what you think the correct output should be.

Note that the following violates the upper array bound of rawData[], and rawData[0] is not used.

      if (count < LENGTH) {

count++;
        rawData[count] = analogRead(A0)>>2;
}



Increment count AFTER storing the data.

The code is the same except LENGTH is 128 instead of 256. 512 works ok, but very slow obviously. 256 works but a little slow. I feel 128 might be ok, but its like perhaps the A/D stops working. As if things are running to quickly somewhere.

Have you fixed that egregious array bound error yet?

Fixed the boundary issue thanks. Doing some debugging, and the problem is that with a LENGTH of 128 the period sometimes falls to zero, a condition becomes false and so 'count', never gets reset and so the A/D stops reading.

Moved 'count' outside of the condition loop, and a LENGTH of 128 works, though not below about 150Hz, which might be normal operation?

Maybe replacing

        thresh = sum * 0.5;

by

        thresh = sum / 2;

speeds it up a bit (or using bitshift right); floating point calculations are slow and and you assign the result back to an int so it's a waste.

Further freq_per can probably be a byte as analogWrite does not take a float as its second argument. In which case sample_freq can be an int instead of a float.

The user here - Reliable Frequency Measurment Using Autocorrelation - Science and Measurement - Arduino Forum

noted some significant changes, but I guess I don't understand it, as when I make the changes the output becomes erratic. They suggested among other things

"made datatype signed char, to remove the -128 from inner loop formula
(preprocess the data array once)
removed the division by 256 from the inner loop"

There certainly was a large speed increase as expected, but the output is not at all accurate and jumps around +-500Hz

What I did was change

byte rawData[LENGTH];

to

unsigned char rawData[LENGTH];

and

for(k=0; k < len-i; k++) sum += (rawData[k]-128)*(rawData[k+i]-128)/256;

to

for(k=0; k < len-i; k++) sum += (rawData[k])*(rawData[k+i]);
sum /=256;

Please show new code completely.

P.S. Can't follow the link now.

The new code is not so straight forward as they have used a data file rather than an A/D input and changed the format dramatically, but the link is true, cut and paste it into the address bar.

https://forum.arduino.cc/index.php?topic=195085.0

I did not say that the link is not working; I said that I could not follow it now (because from a cell phone I have to first select and next paste).

If you would create a clickable link, life would be a lot easier
[url=https://forum.arduino.cc/index.php?topic=195085.0]https://forum.arduino.cc/index.php?topic=195085.0[/url] gives Reliable Frequency Measurment Using Autocorrelation - Science and Measurement - Arduino Forum

You said that you made some changes, so we need to see your code, not something else. And which post in that thread are you referring to?

What I did was change

Code: [Select]

byte rawData[LENGTH];

to
Code: [Select]

unsigned char rawData[LENGTH];

What? :o

Add this to the top of the file:

#pragma GCC optimize("-ffast-math")
#pragma GCC optimize("-O3")

The Arduino optimizes default for minimal code size, but that is so boring. Optimizing for speed is more fun :stuck_out_tongue:
When using "fast-math", the floating point is no longer according tot the IEEE754 standard. But I don't care :wink:

Multiplying (integer and float) is not so bad, but a division is very slow (integer and float) on AVR microcontrollers. That is often 10 times slower than multiplying. Try to avoid divisions.

When an algoritme is used, a better calculation method is almost in all situation a huge leap forward. When I make a guess, I think that goes for 95% of the algoritmes.
The original mathematical formulas that you can find in books and online, are almost always very slow with a Arduino.

@tim77777, @gfvalvo asked what board / processor you use, and @sterretje would like to see the complete new code.

thanks. the speed optimisation took about 10mS off it.

The board is a pro-mini. 5v 16MHz.

I am not using the new code, because I cannot get it to work. Although it is pretty much the same content, it is quite different in the way it is set out and because it includes a .h file with a simulated data sample, the LENGTH part is not included as such.

Here it is.

//    FILE: frequencyDSP.ino
//  AUTHOR: rob tillaart (org version akellyirl)
// VERSION: 0.1.03
// PURPOSE: frequency analysis using DSP
//    DATE: 2013-10-27
//     URL: http://www.instructables.com/id/Reliable-Frequency-Detection-Using-DSP-Techniques

#include "sample.h"

void setup()
{
  Serial.begin(115200);
  Serial.println("FrequencyDSP");
  Serial.println("------------");

  uint32_t start = millis();
  float fr = autoCorrelateFrequency(rawData, sizeof(rawData), 22050);
  uint32_t stop = millis();

  Serial.print("freq:\t");
  Serial.println(fr);
  Serial.print("time:\t");
  Serial.println(stop - start);
}

void loop()
{
}

float autoCorrelateFrequency(char * sample, int len, float sampleFreq)
{
  long sum = 0;
  long sum_old = 0;
  int thresh = 0;
  byte pd_state = 0;
  int period = 0;  // 0 results in inf

  // Autocorrelation
  for (int i=0; i < len && (pd_state != 3); i++)
  {
    sum_old = sum;
    sum = 0;

    for (int k=0; k <len-i; k++)
    {
      sum += (rawData[k]) * (rawData[k+i]);
    }
    sum /= 256;

    // Peak Detect State Machine
    // 0: initial, set threshold
    // 1: detect positive slope
    // 2: detect negative slope
    // 3: peak found
    if (pd_state == 0)
    {
      thresh = sum / 2;
      pd_state = 1;
    }
    else if (pd_state == 1 && (sum > thresh) && (sum - sum_old) > 0)
    {
      pd_state = 2;
    }
    else if (pd_state == 2 && (sum - sum_old) <= 0)
    {
      period = i;
      pd_state = 3;
    }
  }

  return sampleFreq/period;
}

I cannot see how to attach a file here. Here is the sample.h file

char rawData[970] = {
 0x73, 0x7E, 0x3E, 0x39, 0xB0, 0x4D, 0x16, 0x48, 0x7C, 0xAB, 0x95, 0x9F,
 0xA6, 0x47, 0x5E, 0x52, 0x28, 0x92, 0x9C, 0x9A, 0xDC, 0x90, 0x12, 0x7F,
 0xE5, 0x94, 0x81, 0x79, 0x90, 0x5D, 0x48, 0x9B, 0xA9, 0xB5, 0x68, 0x56,
 0x82, 0xA7, 0x5F, 0x74, 0xEB, 0x86, 0x4D, 0xB4, 0xAA, 0xAF, 0x74, 0x46,
 0x99, 0xAD, 0x7D, 0x32, 0xCA, 0x8B, 0x63, 0xBF, 0x4D, 0x76, 0x78, 0x9A,
 0xC9, 0x4D, 0xBB, 0xD9, 0x2E, 0x6E, 0x7B, 0x9B, 0xC4, 0x57, 0x79, 0x7F,
 0x0D, 0x8B, 0xC8, 0x4F, 0xB2, 0x3F, 0x6F, 0xE1, 0x2A, 0x7E, 0xA6, 0x9E,
 0x78, 0x6B, 0x64, 0x24, 0x99, 0x74, 0x1C, 0x3B, 0x6C, 0xA1, 0x9E, 0x96,
 0xAB, 0x63, 0x4F, 0x62, 0x2C, 0x72, 0xA3, 0x92, 0xCA, 0xB2, 0x2E, 0x51,
 0xD5, 0xAD, 0x81, 0x7C, 0x87, 0x75, 0x43, 0x86, 0xA5, 0xB4, 0x82, 0x54,
 0x74, 0xA0, 0x7A, 0x5E, 0xCF, 0xAF, 0x4D, 0x96, 0xB3, 0xA8, 0x8F, 0x47,
 0x82, 0xA9, 0x93, 0x3D, 0x97, 0xB4, 0x57, 0xB1, 0x70, 0x60, 0x82, 0x83,
 0xC9, 0x6F, 0x88, 0xE9, 0x59, 0x51, 0x82, 0x86, 0xC3, 0x77, 0x63, 0x8C,
 0x2B, 0x55, 0xCE, 0x68, 0x91, 0x72, 0x42, 0xDA, 0x5C, 0x56, 0xA7, 0x9C,
 0x8A, 0x66, 0x71, 0x2F, 0x73, 0x90, 0x2E, 0x31, 0x5E, 0x93, 0xA2, 0x94,
 0xA7, 0x7D, 0x4C, 0x64, 0x3B, 0x56, 0x9E, 0x94, 0xB5, 0xC3, 0x54, 0x37,
 0xB5, 0xC1, 0x87, 0x7E, 0x81, 0x82, 0x4C, 0x6F, 0xA0, 0xAF, 0x96, 0x5C,
 0x69, 0x94, 0x8C, 0x5C, 0xAC, 0xC7, 0x61, 0x7A, 0xB3, 0xA6, 0x9D, 0x57,
 0x6B, 0xA2, 0x9E, 0x55, 0x6E, 0xBF, 0x67, 0x95, 0x8F, 0x57, 0x80, 0x7C,
 0xB8, 0x92, 0x6D, 0xD9, 0x89, 0x43, 0x7D, 0x7F, 0xB4, 0x95, 0x5D, 0x88,
 0x4D, 0x37, 0xB8, 0x8B, 0x77, 0x8F, 0x3C, 0xB4, 0x8F, 0x43, 0x99, 0x9D,
 0x93, 0x6B, 0x72, 0x44, 0x55, 0x97, 0x49, 0x2B, 0x52, 0x84, 0xA1, 0x96,
 0xA0, 0x90, 0x54, 0x5F, 0x4C, 0x45, 0x8E, 0x98, 0xA5, 0xC6, 0x79, 0x32,
 0x8E, 0xC9, 0x94, 0x80, 0x7E, 0x85, 0x5C, 0x5E, 0x96, 0xA9, 0xA2, 0x6A,
 0x61, 0x88, 0x95, 0x66, 0x8C, 0xCB, 0x7E, 0x67, 0xA8, 0xA8, 0xA2, 0x6C,
 0x5D, 0x96, 0xA3, 0x6E, 0x59, 0xB3, 0x82, 0x7C, 0x9E, 0x5F, 0x75, 0x7C,
 0xA2, 0xA8, 0x6A, 0xBB, 0xAF, 0x49, 0x6E, 0x7F, 0xA2, 0xA7, 0x65, 0x7B,
 0x68, 0x31, 0x93, 0xA5, 0x6E, 0x94, 0x4E, 0x89, 0xAE, 0x4A, 0x81, 0xA0,
 0x96, 0x76, 0x6F, 0x57, 0x45, 0x8C, 0x66, 0x2E, 0x47, 0x75, 0x9C, 0x99,
 0x9A, 0x9A, 0x64, 0x59, 0x58, 0x41, 0x79, 0x9A, 0x9C, 0xBF, 0x97, 0x3F,
 0x6C, 0xC1, 0xA3, 0x82, 0x7E, 0x84, 0x6C, 0x56, 0x87, 0xA4, 0xA7, 0x7B,
 0x5F, 0x7C, 0x96, 0x74, 0x77, 0xC0, 0x99, 0x63, 0x96, 0xAB, 0xA3, 0x80,
 0x5A, 0x86, 0xA2, 0x82, 0x55, 0x99, 0x9A, 0x72, 0x9C, 0x70, 0x6A, 0x7E,
 0x92, 0xAF, 0x77, 0x9B, 0xC1, 0x60, 0x5F, 0x7E, 0x93, 0xAD, 0x76, 0x70,
 0x77, 0x3B, 0x70, 0xAE, 0x76, 0x8B, 0x67, 0x69, 0xB3, 0x63, 0x6A, 0x9D,
 0x98, 0x81, 0x6E, 0x64, 0x44, 0x78, 0x7A, 0x39, 0x3E, 0x68, 0x92, 0x9C,
 0x97, 0x9C, 0x75, 0x58, 0x5D, 0x46, 0x66, 0x95, 0x98, 0xB4, 0xAA, 0x55,
 0x55, 0xAE, 0xB0, 0x88, 0x7E, 0x82, 0x77, 0x58, 0x77, 0x9E, 0xA7, 0x8A,
 0x63, 0x72, 0x91, 0x81, 0x6F, 0xAB, 0xAD, 0x6C, 0x84, 0xA9, 0xA3, 0x8D,
 0x60, 0x76, 0x9D, 0x8F, 0x5E, 0x80, 0xA4, 0x76, 0x91, 0x82, 0x66, 0x7B,
 0x88, 0xAB, 0x88, 0x86, 0xC0, 0x7E, 0x57, 0x79, 0x89, 0xA9, 0x89, 0x6B,
 0x7B, 0x4D, 0x57, 0xA5, 0x87, 0x80, 0x7A, 0x5C, 0xA5, 0x7F, 0x5D, 0x93,
 0x9A, 0x8A, 0x71, 0x6A, 0x4C, 0x65, 0x83, 0x4C, 0x3A, 0x5C, 0x87, 0x9B,
 0x96, 0x9B, 0x84, 0x5D, 0x5E, 0x4E, 0x58, 0x8B, 0x97, 0xA9, 0xB2, 0x6F,
 0x4C, 0x95, 0xB6, 0x91, 0x80, 0x81, 0x7D, 0x60, 0x6B, 0x94, 0xA5, 0x96,
 0x6C, 0x6B, 0x89, 0x8A, 0x70, 0x96, 0xB4, 0x7C, 0x76, 0xA1, 0xA5, 0x96,
 0x6C, 0x6B, 0x94, 0x97, 0x6C, 0x6F, 0xA2, 0x82, 0x85, 0x8D, 0x69, 0x76,
 0x83, 0xA1, 0x97, 0x7D, 0xB2, 0x98, 0x5A, 0x70, 0x84, 0xA0, 0x97, 0x6F,
 0x78, 0x5F, 0x4B, 0x92, 0x95, 0x7B, 0x82, 0x5F, 0x8F, 0x94, 0x5E, 0x83,
 0x9A, 0x8F, 0x77, 0x6D, 0x56, 0x59, 0x81, 0x5F, 0x3B, 0x52, 0x7B, 0x97,
 0x97, 0x99, 0x8E, 0x66, 0x5D, 0x56, 0x51, 0x7D, 0x96, 0xA0, 0xB2, 0x86,
 0x4F, 0x7C, 0xB3, 0x9C, 0x83, 0x80, 0x80, 0x69, 0x64, 0x89, 0xA1, 0x9C,
 0x78, 0x68, 0x80, 0x8D, 0x76, 0x86, 0xB1, 0x8E, 0x71, 0x96, 0xA5, 0x9B,
 0x79, 0x66, 0x88, 0x99, 0x7A, 0x67, 0x95, 0x8F, 0x7E, 0x90, 0x72, 0x70,
 0x7F, 0x96, 0x9F, 0x80, 0xA0, 0xA7, 0x67, 0x67, 0x7F, 0x96, 0x9E, 0x78,
 0x74, 0x6C, 0x4C, 0x7C, 0x9C, 0x7E, 0x83, 0x68, 0x7B, 0x9C, 0x6A, 0x75,
 0x97, 0x93, 0x7E, 0x6F, 0x60, 0x54, 0x78, 0x6F, 0x43, 0x4A, 0x70, 0x91,
 0x98, 0x97, 0x93, 0x72, 0x5E, 0x5B, 0x51, 0x6F, 0x91, 0x9B, 0xAE, 0x97,
 0x5B, 0x69, 0xA7, 0xA5, 0x88, 0x7F, 0x81, 0x72, 0x62, 0x7D, 0x9B, 0x9F,
 0x83, 0x6A, 0x78, 0x8C, 0x7D, 0x7C, 0xA6, 0x9D, 0x74, 0x89, 0xA3, 0x9E,
 0x85, 0x68, 0x7D, 0x97, 0x85, 0x68, 0x87, 0x96, 0x7E, 0x8C, 0x7C, 0x6E,
 0x7C, 0x8E, 0xA0, 0x87, 0x91, 0xAC, 0x79, 0x62, 0x7A, 0x8E, 0x9F, 0x84,
 0x72, 0x72, 0x53, 0x69, 0x99, 0x86, 0x81, 0x73, 0x6F, 0x98, 0x7A, 0x6B,
 0x90, 0x96, 0x85, 0x73, 0x66, 0x55, 0x6D, 0x78, 0x4F, 0x46, 0x65, 0x88,
 0x97, 0x96, 0x95, 0x7D, 0x62, 0x5E, 0x54, 0x64, 0x8A, 0x98, 0xA7, 0xA2,
 0x6C, 0x5E, 0x96, 0xAA, 0x8F, 0x80, 0x80, 0x78, 0x65, 0x73, 0x94, 0x9F,
 0x8D, 0x6F, 0x72, 0x87, 0x84, 0x78, 0x99, 0xA4, 0x7D, 0x7F, 0x9D, 0xA0,
 0x8D, 0x6F, 0x74, 0x91, 0x8D, 0x6E, 0x7A, 0x96, 0x84, 0x87, 0x84, 0x6F,
 0x78, 0x87, 0x9C, 0x90, 0x89, 0xA8, 0x8B, 0x64, 0x73, 0x88, 0x9B, 0x8E,
 0x74, 0x74, 0x5E, 0x5D, 0x8E, 0x8E, 0x7F, 0x7A, 0x6C, 0x8E, 0x87, 0x6A,
 0x85, 0x96, 0x8A, 0x77, 0x6B, 0x5A, 0x64, 0x79, 0x5C, 0x46, 0x5C, 0x7F,
 0x94, 0x95, 0x95, 0x86, 0x69, 0x5F, 0x58, 0x5D, 0x80, 0x94, 0xA1, 0xA6,
 0x7D, 0x5D, 0x84, 0xA8, 0x96, 0x83, 0x80, 0x7C, 0x6A, 0x6D, 0x8B, 0x9D,
 0x94, 0x77, 0x6F, 0x81, 0x87, 0x7A, 0x8D, 0xA5, 0x89, 0x7A, 0x95, 0xA0,
 0x94, 0x77, 0x6F, 0x89, 0x91, 0x77, 0x73, 0x91, 0x8A, 0x83, 0x88, 0x74,
 0x74, 0x82, 0x96, 0x96, 0x87, 0x9F, 0x98, 0x6C, 0x6D, 0x82, 0x96, 0x94,
 0x7A, 0x74, 0x67, 0x59, 0x80, 0x92, 0x81, 0x7D, 0x6E, 0x82, 0x8F, 0x6F,
 0x7B, 0x93, 0x8E, 0x7D, 0x6F, 0x60, 0x5F, 0x76, 0x68, 0x4B, 0x54, 0x75,
 0x8F, 0x95, 0x95, 0x8C, 0x71, 0x62, 0x5C, 0x5A, 0x76, 0x90, 0x9C, 0xA6,
 0x8B, 0x62, 0x75, 0xA1, 0x9D, 0x87, 0x80, 0x7E, 0x70, 0x6A, 0x82, 0x99,
 0x98, 0x7F, 0x6F, 0x7B, 0x87, 0x7D, 0x84, 0xA1, 0x92, 0x7C
};

10 ms ?

With Arduino Uno:
Rob Tillaart code: 370 µs.
With #pragma GCC optimize("-O3"): 140 µs.

And changing some 'int' to 'unsigned int': 110 µs.

I'm not sure how to select the best type for the integers. The 'thresh' is only an 'int', should it be a 'long' ?

I tried to be smarter than the compiler and did the multiply with: sum += *p1++ * *p2++ ;
That did not help. The compiler already optimizes the array with the indexes.

I think the compiler already knows that multiplying two signed 8-bit variables should be done with an integer multiply.
Because it stays at 110 µs with: sum += long( int( rawData[k]) * int( rawData[k+i])) ;

With Arduino M0+:
At first I could not get a Arduino M0+ to calculate the frequency. It returned 5512.50. Changing the 'char' to 'int8_t' makes it work. I don't know how that is possible.
Rob Tillaart code: 29 µs
With -O3: 21 µs
And optimizing towards 32-bits variables: 20 µs
From 370 µs to 20 µs, that is nice :wink:

P.S.: I think I picked the wrong sketch ? I just took the latest sketch without reading the explanation very well.

Koepel:
With Arduino M0+:
At first I could not get a Arduino M0+ to calculate the frequency. It returned 5512.50. Changing the 'char' to 'int8_t' makes it work. I don't know how that is possible.

char is unsigned on ARM.

oqibidipo:
char is unsigned on ARM.

Thanks ! I didn't know that.

thanks for all that. I need the delay to go unnoticed, so I'll have to try a faster processor. Though the code from Rob Tillaart runs twice as quick. If I could get that working, and run the LENGTH at 128 that might do.

I finally got the code to run but its got problems when compared with akellyirl's original code.

  1. Although the changes work OK on the supplied data file, when applied to a variable input such as from an A/D , often a frequency exactly half of the actual, is being outputted.

  2. Changing the raw data type from 'byte' to 'signed char', increases the overall processing time for each outputted frequency from 100mS to about 160mS.