Arduino pitch detection in realtime (using autocorrelation and peak detection)

I don't think I'm printing as I'm sampling, since the array is filled with data first. What are your main reasons for having so little hope for the project? The main problems are the harmonics that are larger than the fundamental and the fact that the results are off by a constant error. With my limited knowledge I figured that checking multiple peaks can fix the first problem, and that a correction formula will take care of the error.

What are your main reasons for having so little hope for the project?

First off, you have not told us what you want to do with this frequency-measuring thing, so it might be an X-Y problem.

Then, I have seen many people try this sort of thing in the past, and it either does not work or they redefine what "work" means so that it does.

As you have seen, it is not as simple as you first thought, and the complexity increases as you get further into it. When you finally get it to "work" you will inevitably be faced with false negative and false positive results. So how critical is this, and what error rate can you tolerate?

As you are a student, the main point of this exercise is to find out what is possible and what is not, so don't let me stop you, but there is always a gap between theory and practice.

This is because the theory is often idealised and simplified, omitting important things like noise, volume level, measurement resolution and inconsistency of input signals.

With my limited knowledge I figured that by checking multiple peaks the first can be fixed, and that a correction formula will take care of the error.

OK, go ahead and try this. As I don't know your criteria for "success", I can't definitively say what would change the situation sufficiently. But the fact that when you started you didn't know about DC offsets on the input signal doesn't fill me with confidence.

Hey but I am just a grumpy old fart, go and prove me wrong.

barthulsen:
Hi,

I think the solution is to check multiple peaks in the autocorrelation function, and pick out the one with the lowest frequency as the fundamental.

Autocorrelation doesn't produce "frequency", it is a measure of how much the signal looks like a time shifted version of itself. The presumption behind this algorithm is that the signal is periodic, that is the waveform more or less repeats itself at regular intervals, and by detecting the period at which it repeats, one can compute the fundamental frequency.
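To make that concrete, here is a minimal plain C++ sketch (not the thread's exact state machine; the helper name estimatePeriod and the lag bounds are illustrative) that finds the period as the lag with the largest unbiased autocorrelation:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Estimate the period (in samples) of a roughly periodic signal by locating
// the lag with the largest unbiased autocorrelation. minLag/maxLag bound the
// range of plausible periods so lag 0 (always the maximum) is excluded.
int estimatePeriod(const std::vector<float>& x, int minLag, int maxLag) {
    int n = (int)x.size();
    int bestLag = minLag;
    float bestR = -1e30f;
    for (int lag = minLag; lag <= maxLag && lag < n; lag++) {
        float sum = 0.0f;
        for (int k = 0; k + lag < n; k++) sum += x[k] * x[k + lag];
        float r = sum / (n - lag);  // unbiased normalization
        if (r > bestR) { bestR = r; bestLag = lag; }
    }
    return bestLag;
}
```

With a pure sine of period 40 samples, the largest correlation in the search range falls at lag 40, and the fundamental frequency then follows as the sample rate divided by that lag.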

The figure below shows the unbiased autocorrelation (see post #4 in this thread for the unbiased calculation). The current peak detection algorithm finds the "1st correlation peak" and computes the frequency as the sample frequency divided by the lag at that peak. The frequency resolution due to discrete lag values goes as approximately f²/fs, where "f" is the frequency being measured and "fs" is the sample rate. The data pictured is from the signal.h file, so nominally f = 260 Hz, fs = 8919 Hz, thus about 8 Hz resolution.

By going to the second correlation peak rather than the first, this error component is reduced, generally going as approximately f²/(fs·n), where "n" is the nth correlation peak. To implement this, one might use the current peak finding algorithm to find the first peak and then, starting from that lag point, repeat the algorithm to find the second peak, and so forth. It would be best to quit this iterative process before getting too close to the end of the autocorrelation sequence, because it gets noisy there due to the limited amount of data overlap.
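As a sketch of that error bound (illustrative helper name, not code from the thread):

```cpp
#include <cassert>

// At the nth correlation peak the lag is about n*fs/f samples, so a
// one-sample lag quantization error moves the frequency estimate by
// roughly f*f/(fs*n) Hz: later peaks give proportionally finer resolution.
float lagResolution(float f, float fs, int n) {
    return f * f / (fs * n);
}
```

Plugging in the numbers above, lagResolution(260, 8919, 1) is about 7.6 Hz, dropping to about 3.8 Hz at the second peak.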


Edit to add: Note that the signal in this case is apparently middle C from an electronic synthesizer. I expect correlation products from non-electronic instruments may not be so nicely defined.

barthulsen:
What do you think about my idea to use a frequency sweep to determine a correction formula? I'm noticing that the code is printing very constant outputs, but they're off with an error that increases with the frequency.

I'm not entirely clear what you mean by frequency sweep in this context. I think the best way to approach getting a consistent and known sample rate is to configure the A/D converter for continuous sampling so that it has a deterministic sample rate derived from the oscillator. Examples of doing this can be found at Example - Open Music Labs Wiki (using polling) and http://www.instructables.com/id/Arduino-Audio-Input/ (using interrupts, see "step 7").

Grumpy_Mike, thank you for the explanation. I see now that I have not been clear with you, and I appreciate, given this fact, that you are willing to give me advice. As for the goal of the project, I'm trying to build a guitar tuner. I've indeed found a number of examples (which often do not work) on the internet. What I want to make clear is that this is not a school project; I'm not trying to get you to do my 'homework'. I started the project because I like coding. I'm not very good at it, that is true, but I am motivated. Back to the topic: I've found this project that implements interpolation, as mentioned before by MrMark:

MrMark, what I mean by a frequency sweep is to put a lot of different, known frequencies on the input and determine a formula that could be used as a correction. However, you made clear to me that the resolution becomes a problem. Is that why I'm getting very constant results? I'm going to take a look at continuous sampling, try to convert the peak state machine, and see how interpolation is implemented in the link above. (There is a link to GitHub on the website for the code of the project.)

As to the project succeeding or not, my goal is to learn more about this topic and have some fun while I'm at it.

barthulsen:
MrMark, what I mean by a frequency sweep is to put a lot of different, known frequencies on the input and determine a formula that could be used as a correction. However, you made clear to me that the resolution becomes a problem. Is that why I'm getting very constant results?

The algorithm without interpolation will only return certain discrete values. The possible values are the sample rate divided by an integer value because the lag value must be integer.
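For instance, with the numbers from earlier in the thread (fs ≈ 8919 Hz), the reportable values nearest 260 Hz are 8919/34 ≈ 262.3 Hz and 8919/35 ≈ 254.8 Hz, with nothing in between. A small sketch (hypothetical helper name):

```cpp
#include <cassert>
#include <cmath>

// The non-interpolated algorithm can only report fs divided by an integer
// lag; this returns the reportable value closest to a true frequency f.
float nearestReportable(float fs, float f) {
    long lag = lroundf(fs / f);  // nearest integer lag
    return fs / lag;
}
```

This is why a slowly varying input still produces "very constant" outputs: every input in a band around 260 Hz snaps to the same discrete value.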

It's reassuring that the derivation I did for quadratic interpolation matches the code at your link. :slight_smile:

If I understand correctly, that code takes the measured frequency and maps it to specific notes which seems appropriate for music transcription, but wouldn't make sense for a tuning aid. It appears to be a nicely done project.

I'm going to take a look at continuous sampling, try to convert the peak state machine and see how interpolation is implemented in the link above. (There is a link to GitHub on the website for the code of the project).

It turns out to be pretty simple. The formula is 0.5 * (right - left) / (2 * mid - left - right), where "mid" is the correlation amplitude at the detected peak, and "right" and "left" are the correlation amplitudes at the lags immediately following and preceding the detected peak, respectively. This gives a fractional offset that is added to the lag at the detected peak.
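A sketch of that formula in plain C++ (the helper name peakOffset is mine, not from the linked code):

```cpp
#include <cassert>
#include <cmath>

// Quadratic (parabolic) interpolation around a correlation peak: fit a
// parabola through three adjacent samples (left, mid, right) and return
// the fractional lag offset of its vertex, in the range (-0.5, 0.5).
float peakOffset(float left, float mid, float right) {
    return 0.5f * (right - left) / (2.0f * mid - left - right);
}
```

For three samples taken from a parabola whose vertex sits 0.3 lags to the right of the middle sample, the formula recovers exactly 0.3.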

So with what I’ve learned from your help, I’ve written down some things that I can try.

Use interrupts to get a higher sample rate and therefore a better resolution. I need to keep in mind that the sample window needs to cover at least twice the period of the lowest frequency. Besides that, taking a lot of samples is heavy on memory, so I need to find the best combination of sample rate and number of samples on this front.

Convert the peak detection state machine to check multiple peaks, because (1) the first peak might correspond to a harmonic and not the fundamental, and (2) as I’ve learned from MrMark, correlation peaks beyond the first one produce less frequency error. Knowing this, maybe it’s not a bad idea to work with these harmonics: if the string being measured is known, the harmonics can be translated back to its fundamental, which will then have better accuracy, because the harmonics are usually beyond the first correlation peak. A disadvantage of this idea is that the frequency range has to be larger.

Lastly, use quadratic interpolation.

I think I’ve already learned quite a bit! I hope you can follow my ideas, and I appreciate your comments on them. As I’m Dutch, the English language can sometimes be a bit of a struggle.

As for the goal of the project, I'm trying build a guitar tuner.

Most commercial tuners don't actually bother to output the octave, just the note name. The tuner I have for my harp does this.
Not that I play it; it was just one of my projects that played one. It is on the front cover of my book Arduino Audio Projects; read a bit more about it here: Harp player video

This is a lot easier to do than producing the exact frequency, and for tuning it doesn't matter which harmonic you pick up on, as it's just the note name you want, plus an indication of whether it is sharp or flat.

In the link below the algorithm is used as well. He did not improve the resolution, so I doubt that it works well for the higher frequencies... weird.

Grumpy_Mike, I'm impressed by your projects! To check if the tuning is right, I think I do need the frequency to be as accurate as possible.

To check if the tuning is right, I think I do need the frequency to be as accurate as possible.

And you can get that without needing to indicate the octave.

Alright thanks, now I understand. I'm also taking a look at how a circular buffer works, like Grumpy_Mike posted.

Basically it is an array, with an input pointer and an output pointer held in variables. There are two ways you can arrange this, but there is not much to choose between them. Have the input pointer point to the next free buffer space and the output pointer to the next value to output. Whenever you put something in or take something out, you increment the corresponding pointer and make it wrap round, that is, start again at zero when it reaches the size of the array.

If on incrementing the input pointer it catches up with the output pointer, then your buffer has overflowed.
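A minimal sketch of such a buffer in plain C++ (illustrative; this variant keeps an explicit fill count rather than comparing the two pointers, which also sidesteps the full-versus-empty ambiguity):

```cpp
#include <cassert>

const int BUF_SIZE = 8;

// Circular (ring) buffer: "in" points at the next free slot, "out" at the
// next value to read, and both wrap back to zero at the end of the array.
struct RingBuffer {
    int data[BUF_SIZE];
    int in = 0;     // next free slot
    int out = 0;    // next value to take out
    int count = 0;  // current fill level, used for overflow/underflow checks

    bool put(int v) {
        if (count == BUF_SIZE) return false;  // buffer has overflowed
        data[in] = v;
        in = (in + 1) % BUF_SIZE;             // wrap round
        count++;
        return true;
    }
    bool get(int &v) {
        if (count == 0) return false;         // buffer is empty
        v = data[out];
        out = (out + 1) % BUF_SIZE;           // wrap round
        count--;
        return true;
    }
};
```

On an Arduino, if put and get are called from an ISR and the main loop, the shared count would additionally need to be volatile and updated with interrupts briefly disabled.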

Thanks! I was thinking about the following. The sample rate determines the resolution, but it is limited by the number of samples I can take: I don't want to use a lot of memory, and I need a window of at least two times the largest period for the autocorrelation to work. What if I use interrupts to get the maximum sample rate for the highest frequency, and bring it down for the other strings/frequencies by ignoring some of the samples? If I'm not mistaken, by doing this I can get a fairly good and, above all, constant resolution. Combined with interpolation it might be accurate enough for the project to work. What do you think about this idea?

Well, that is known as sub-sampling, and you are best doing this by altering the timing of the interrupt. However, the problem is that you must not fall below the Nyquist rate, otherwise aliasing will give you false results. You get this problem whether you sample more slowly or simply ignore samples.
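The aliasing warning can be illustrated with a small sketch (hypothetical helper): a tone above half the sample rate folds down to |f − k·fs| for the nearest integer k.

```cpp
#include <cassert>
#include <cmath>

// Apparent frequency of a tone f when sampled at rate fs: anything above
// fs/2 "folds" back into the 0..fs/2 band, which is the false result
// aliasing produces.
float aliasedFrequency(float f, float fs) {
    float k = roundf(f / fs);  // nearest multiple of the sample rate
    return fabsf(f - k * fs);
}
```

A 900 Hz tone sampled at 1000 Hz shows up at 100 Hz, while a 300 Hz tone (below fs/2) is unaffected.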

Hi,

I’ve been using interrupts to get the maximum sample frequency for every string. I’ve combined it with continuous sampling on A0. The OCR value is calculated with the following formula:

OCR1A = (time between interrupts / 0.0000000625) – 1

The time between the interrupts is calculated from the maximum period, which I expect to be at a frequency that is about 10 Hz (this is the correction) lower than the fundamental. This period is multiplied by 2, to get the desired window for the autocorrelation, and then divided by the number of samples. See below:

OCR value = 2 * (1 / (string – correction)) / arraySize / 0.0000000625 – 1
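The calculation above might look as follows in plain C++ (a sketch with illustrative names; 0.0000000625 s is written out as one tick of the 16 MHz clock):

```cpp
#include <cassert>

// OCR value for the formula above: a window of two periods of the lowest
// expected frequency (string minus correction), spread over arraySize
// samples, converted into 16 MHz timer ticks per sample.
long ocrValue(float stringFreq, float correction, int arraySize) {
    const float clockHz = 16000000.0f;               // 1 tick = 0.0000000625 s
    float window = 2.0f / (stringFreq - correction); // two periods, in seconds
    return (long)(window / arraySize * clockHz) - 1; // ticks between interrupts
}
```

For the D string (146.83 Hz) with a 10 Hz correction and 500 samples, this gives an OCR value of 466, i.e. a sample rate of roughly 34.3 kHz.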

The sample frequency is calculated with this OCR value, to eliminate rounding errors.
While testing I noticed that the resolution is better when I change the 2 to 3 or 4. So instead of using 2 periods, I’m using 3 or 4, thereby lowering the sample frequency. The results are actually also quite unstable when using the formula above, while the results with the increased number of periods are reasonable. However, I’d like to know what is wrong with my calculations and how I can improve them. I’ve attached the code.

Thanks in advance,
Bart

autocorrelation_onA0.ino (2.26 KB)

If I understand correctly, you're getting better results with a sample rate such that you get 3 or 4 full cycles of the fundamental frequency in your 500 sample buffer rather than just 2 full cycles.

One theory for why this might happen is that your audio source contains some non-harmonic content which is better rejected by the longer correlation interval; that is, the more cycles used, the more the non-harmonic signal content decorrelates at the lag where the harmonic content has its peak correlation, and thus the less it distorts the correlation peak. What are you using for audio in your testing?

Looking at your attached code, it looks like the sample rate chosen for B2 is around 60 kHz which seems unnecessarily high in any case.

Hi, you're right about the increased number of cycles and its consequence. I'm using a MyDAQ for the audio during tests. As for the high sample rate: the B2 frequency is 246.94 Hz. The formula below yields a resolution of 1 Hz (when using 60 kHz as the sample frequency).

Resolution = f^2 / fs

Why is a sample frequency of 60 kHz unnecessarily high? With my knowledge I figured that this is the way to get results that are accurate enough for tuning. The weird thing is, however, that I found two projects that use this technique but do not seem to improve the resolution (which would be terrible for the high strings). I've attached the links below. As for my code, do you have any tips on the overall coding style and structure?
Thanks

http://www.ericthuang.com/actuating-guitar-tuner.html
https://www.google.nl/amp/www.instructables.com/id/Guitar-Tuner-With-an-Arduino/%3Famp_page=true

f^2 / fs describes the error range due to the discrete lags in the correlation. It isn't the complete error model, as there will be contributions from the signal-to-noise ratio (and from interference such as the non-harmonic content theorized as the issue above). I think a better approach to mitigating the discrete lag issue is interpolation, as discussed earlier in the thread.

Aside from the discrete lag issue, the sample frequency needs only to be twice the highest frequency in the signal being digitized which can be controlled by a low pass filter prior to the ADC. Even high fidelity audio is typically only about 40 kHz sample rate. Autocorrelation works essentially by measuring the parts of the signal that match up cycle to cycle while rejecting parts of the signal that don't (random noise and non-harmonic cyclic content). The amount of rejection of the latter is proportional to the integration period so for this you want a longer sample. Given the limited memory for signal capture, this means there are conflicting requirements in that discrete lag error wants high fs and noise rejection wants a longer signal capture time. To optimize performance the best compromise between these needs to be determined.
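The conflict can be made concrete with a sketch (illustrative names; assumes the fixed 500-sample buffer discussed in this thread):

```cpp
#include <cassert>

// With a fixed buffer of bufferSize samples, raising fs shrinks the
// discrete-lag error (f*f/fs) but also shrinks the capture time
// (bufferSize/fs) that provides the noise rejection.
struct Tradeoff {
    float lagErrorHz;     // discrete-lag error bound
    float captureSeconds; // integration time available for noise rejection
};

Tradeoff evaluate(float f, float fs, int bufferSize) {
    Tradeoff t;
    t.lagErrorHz = f * f / fs;
    t.captureSeconds = bufferSize / fs;
    return t;
}
```

For B2 (246.94 Hz) and 500 samples, fs = 60 kHz gives about 1 Hz of lag error but only 8.3 ms of signal, while fs = 10 kHz gives about 6 Hz of lag error but 50 ms of signal; interpolation then relaxes the pressure on the lag-error side of the compromise.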

With regard to your coding style, I like that you clearly broke the data capture and frequency determination into their own functions. This makes "loop" pretty obvious. In the readData function, using "string" as a variable name is not descriptive of what it means, and the very similar "String" is the name of a core class; both of these are bad practice. In a couple of places you use "0.0000000625", which is the reciprocal of 16e6, the MCU master clock frequency; writing it in terms of the latter would be clearer.

MrMark, thank you for the very clear explanation and the feedback on my coding. My next step will be to implement interpolation and use lower sample frequencies. Is there a structural way to determine the best relation between the sample frequency and signal capture time, besides trial and error?

So I've implemented interpolation and it works well! For now, I'm keeping the OCR calculation. My main reason for this is that I can easily play around with the signal capture time and try to find the best relation with the sample frequency. The results are quite good, with resolutions that go below 1 Hz. However, there are some weird errors. When testing with the MyDAQ at a constant frequency, the Serial Monitor would print, for example, 10 good results that are exactly the same, and then 3 or 4 results (very different from each other) with a large error. This goes on at a constant rate. What could be the problem? To get around this issue I might run the calculation a few times and choose the frequency that is calculated multiple times as the result, or use the average. See the interpolation code below:

    else if (pdState == 2 && (currentSum - previousSum) <= 0) {
      // quadratic interpolation
      nextSum = 0;  // reset the accumulator before summing the next lag
      // note: the upper bound is arraySize - i - 1 so rawData[k + i + 1] stays in bounds
      for (int k = 0; k < arraySize - i - 1; k++) nextSum += (rawData[k] - mean) * (rawData[k + i + 1] - mean);
      float interpolationValue = 0.5 * (nextSum - previousSum) / (2 * currentSum - previousSum - nextSum);
      period = i + interpolationValue;
      pdState = 3;
    }

Bart

I think your code now looks something like the following. I made the buffer 1000 samples which seems to help reduce the noise (as it should), but slows down the autocorrelation rate (also expected). Using a function generator on my cell phone and going between 320 Hz and 350 Hz sine wave, I got the plot below. The microphone had to be very close to the phone's speaker. There's some noise in the measurement and a few outliers, but generally it seems to do pretty well.

// Arduino Pitch Detection on A0 with autocorrelation and peak detection
// Original author(s): akellyirl, revised by robtillaart and MrMark

#define E1 329.63
#define B2 246.94
#define G3 196.00
#define D4 146.83
#define A5 110.00
#define E6 82.41

float sampleFrequency;
float frequencyRange = 10;
volatile int interruptCount = 0;
const int arraySize = 1000;
volatile byte rawData[arraySize];
long currentSum, previousSum, nextSum ;
int threshold = 0;
float frequency = 0;
byte pdState = 0;

void setup() {
  noInterrupts();
  TCCR1A = 0;
  TCCR1B = 0;
  TIMSK1 = 0;
  TCCR1B |= (1 << WGM12);
  TCCR1B |= (1 << CS10);
  TIMSK1 |= (1 << OCIE1A);

  ADCSRA = 0;
  ADCSRB = 0;
  ADMUX |= (1 << REFS0);
  ADMUX |= (1 << ADLAR);
  ADCSRA |= (1 << ADPS1);
  ADCSRA |= (1 << ADATE);
  ADCSRA |= (1 << ADEN);
  ADCSRA |= (1 << ADSC);

  Serial.begin(115200);
  pinMode(LED_BUILTIN, OUTPUT);
}

ISR(TIMER1_COMPA_vect) {
  // Collect analog signal; the guard prevents writing past the end of the
  // buffer if one more interrupt fires before the main loop stops the timer
  if (interruptCount < arraySize) {
    rawData[interruptCount] = ADCH;
    interruptCount++;
  }
}

void readData(float baseFreq) {
  int OCRvalue = 2 * (16000000 / (baseFreq - frequencyRange)) / arraySize - 1;
  OCR1A = OCRvalue;
  sampleFrequency = 16000000.0 / (OCRvalue + 1); // 16000000.0 avoids integer truncation
  interrupts();
  while (interruptCount != arraySize); // wait for data
  noInterrupts();
  interruptCount = 0;
}

void findFrequency() {
  // Calculate mean to remove DC offset
  long meanSum = 0;
  for (int j = 0; j < arraySize; j++) {
    meanSum += rawData[j];
  }
  int mean = meanSum / arraySize; // 'int', not 'char': a mid-scale DC offset (~128) would overflow a signed char
  // Autocorrelation
  currentSum = 0;
  pdState = 0;
  float period = 0; // float, so the interpolated fraction of a lag is preserved
  for (int i = 0; i < arraySize && (pdState != 3); i++) {
    // Autocorrelation
    previousSum = currentSum;
    currentSum = 0;
    for (int k = 0; k < arraySize - i; k++) currentSum += (rawData[k] - mean) * (rawData[k + i] - mean);
    // Peak detection
    if (pdState == 0) {
      threshold = currentSum / 2;
      pdState = 1;
    }
    else if (pdState == 1 && (currentSum > threshold) && (currentSum - previousSum) > 0) {
      pdState = 2;
    }
    else if (pdState == 2 && (currentSum - previousSum) <= 0) {
      // quadratic interpolation
      nextSum = 0 ;
      for (int k = 0; k < arraySize - i - 1; k++) nextSum += (rawData[k] - mean) * (rawData[k + i + 1] - mean); // bound keeps k + i + 1 inside the array
      float interpolationValue = 0.5 * (nextSum - previousSum) / (2 * currentSum - previousSum - nextSum);
      period = i + interpolationValue;
      pdState = 3;
    }
  }

  // Frequency identified in Hz
  if (threshold > 100) {
    frequency = sampleFrequency / period;
    if (frequency < 400) {
      Serial.println(frequency);
    }
  }
}

void loop() {
  digitalWrite(LED_BUILTIN, HIGH);
  readData(D4);
  digitalWrite(LED_BUILTIN, LOW);
  findFrequency();
}