How to determine fundamental frequency of cry and speech signals.

I am working on a project that uses an Arduino UNO to detect and recognize a baby's cry. I have already built a sensitive microphone circuit to pick up the cry. The main purpose of the project is to differentiate a baby's cry from human speech based on the fundamental frequency. According to the literature I have reviewed, the fundamental frequency of a baby's cry ranges from 300 Hz to 600 Hz.

I searched the Internet and found a website that uses the FHT (fast Hartley transform) method to determine the fundamental frequency. It works perfectly with a single-tone signal (e.g. 500 Hz) from a function generator. But during trials, when several baby cry samples were played to the microphone one at a time, the results varied from 400 Hz to 1000 Hz (sometimes up to 1500 Hz, which is too high).

For your information, the FHT size is 256 and the sampling rate is 8000 Hz. Hence, the frequency resolution is 8000/256 = 31.25 Hz per bin.

Is there any other way to determine the fundamental frequency of speech signals with an acceptable tolerance? If an FFT is suggested, where can I get a library and a coding example? Has anyone tried human speech signal processing on an Arduino before? Your help is very much appreciated.

The Open Music Labs FFT works fine.

Aliasing is a serious problem that most novices fail to (or don't want to) understand and deal with properly. Your microphone and audio amplifier must be of good quality and you must have an effective low pass filter that prevents frequencies higher than (sample frequency)/2 from getting to the ADC. Otherwise, you are just wasting your time. If you don't know about aliasing, one place to start reading is here:

You might find that autocorrelation works better for this application. You might be even happier with its cousin, the YIN algorithm. Both techniques estimate the period of the signal directly, and the frequency can be calculated from that value. The YIN algorithm is described in scholarly fashion here -, and used in a project to identify the frequency of a guitar string here - You can find a discussion of autocorrelation here -, along with some demonstration code, and some comments on the code.

The discrete Fourier transform (DFT) does a good job of computing spectral peaks, but it's less able to reliably identify the fundamental frequency of a signal. For a complex signal, it's not unusual for the peak that corresponds to the fundamental to be buried in spectral leakage. Autocorrelation and its derivatives might be a better choice for this application.
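To make the autocorrelation idea concrete, here is a minimal sketch, untested on real cry recordings. It removes the DC offset, computes a normalized autocorrelation for each candidate lag, and reports the first lag that is a local maximum above a threshold. The names estimatePitchHz and normCorr, the int16_t sample type, and the 0.8 default threshold are all assumptions made for illustration, not taken from any library. It is written as plain C++ so it can be tried on recorded samples on a PC before being ported.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Normalized autocorrelation of x at one lag, in the range -1..1.
// Returns 0 when the overlapping segments carry no energy.
static float normCorr(const int16_t *x, size_t n, float mean, size_t lag)
{
    float num = 0.0f, e0 = 0.0f, e1 = 0.0f;
    for (size_t i = 0; i + lag < n; ++i) {
        float a = x[i] - mean;
        float b = x[i + lag] - mean;
        num += a * b;
        e0 += a * a;
        e1 += b * b;
    }
    float den = std::sqrt(e0 * e1);
    return den > 0.0f ? num / den : 0.0f;
}

// Hypothetical helper: estimate the pitch of x (n samples taken at fs Hz),
// searching only between fMin and fMax. Returns 0 if no lag in that range
// is a local maximum with correlation of at least minCorr.
float estimatePitchHz(const int16_t *x, size_t n, float fs,
                      float fMin, float fMax, float minCorr = 0.8f)
{
    if (n < 3 || fMin <= 0.0f || fMax <= fMin) return 0.0f;

    size_t lagMin = (size_t)(fs / fMax);   // shortest period considered
    size_t lagMax = (size_t)(fs / fMin);   // longest period considered
    if (lagMin < 1) lagMin = 1;
    if (lagMax > n - 2) lagMax = n - 2;    // lag + 1 must stay inside the buffer
    if (lagMin > lagMax) return 0.0f;

    // Remove the DC offset (an ADC fed from a biased microphone has one).
    long sum = 0;
    for (size_t i = 0; i < n; ++i) sum += x[i];
    float mean = (float)sum / (float)n;

    // Scan from short lags to long ones and stop at the first strong local
    // peak, so that a multiple of the true period is not picked by mistake.
    float cPrev = normCorr(x, n, mean, lagMin - 1);
    float cCur = normCorr(x, n, mean, lagMin);
    for (size_t lag = lagMin; lag <= lagMax; ++lag) {
        float cNext = normCorr(x, n, mean, lag + 1);
        if (cCur >= minCorr && cCur > cPrev && cCur >= cNext)
            return fs / (float)lag;
        cPrev = cCur;
        cCur = cNext;
    }
    return 0.0f;
}
```

With 256 samples at 8000 Hz and search limits of 200 Hz to 1000 Hz, the lags run from 8 to 40 samples. The threshold needs tuning: set too low, a strong second harmonic can produce a peak at half the true period; set too high, noisy signals return nothing. The resolution is also limited to whole-sample lags unless you interpolate around the peak.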

This again. Is it an assignment? It is about the third time in as many years this has come up.

Basically a baby cry does not have a fundamental frequency. It is a mix of evolving harmonics that has to be tracked in time as well as frequency. The Arduino Uno is distinctly lacking in the memory and processing power needed to do this.

Grumpy_Mike: Basically a baby cry does not have a fundamental frequency. It is a mix of evolving harmonics that has to be tracked in time as well as frequency.

I'll go along with that. Intuitively, I think that reliably identifying a baby's cry, and distinguishing it from, say, a baby chuckling or cooing, would take more than simply estimating the signal frequency at a moment in time. It might be possible to process signal magnitude and make a reasonable guess, but it seems that would require calibration on a per-baby, per-placement basis.

Nevertheless, I'd like to know more about how the original poster searched for the fundamental frequency among the bins of the FFT. We consistently hear that an implementation of the FFT misidentifies the fundamental frequency, but I've never seen code, or even a description of the technique, to accompany those complaints.

So, asking the original poster: After you calculated the FFT, how did you decide which bin contained the fundamental?

tmd3: After you calculated the FFT, how did you decide which bin contained the fundamental?

Never mind. I took a look at the OP's referenced code. Here's how it does it:

  binNum = findMax(fht_lin_out, FHT_N/2);
  freq = binNum * (fs / FHT_N);
  return freq;

where findMax() searches the bins for the largest magnitude and returns the bin number, and fs is measured by calling micros() before and after data acquisition - an odd way to do it, since counting ADC clock cycles would certainly be more accurate and more consistent. Obviously, this method won't yield the fundamental frequency. Instead, it will give the frequency with the calculated maximum magnitude, which will be the fundamental only for fairly simple and well-behaved signals.

It's common for complex signals to have harmonics with magnitudes far higher than the fundamental's magnitude. Picking the bin with the largest value isn't the same as finding the fundamental.

A way to do it might be to look for the lowest bin with significant signal content. "Significant" will probably depend on the RMS value of the signal, and a halfway-reliable implementation will probably require a lot of trial and error to identify what relative level is significant.
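As a rough, untried sketch of the lowest-significant-bin idea: the function below skips the DC bin, finds the largest remaining magnitude, and returns the lowest bin that is a local peak and reaches a chosen fraction of that maximum. Note the substitution: the threshold here is relative to the largest bin rather than to the RMS value of the signal, purely to keep the sketch short. The name lowestSignificantBin, the uint16_t magnitude type, and the fraction parameter are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: index of the lowest local-peak bin whose magnitude is
// at least `fraction` of the largest non-DC bin. Returns 0 if there is none.
size_t lowestSignificantBin(const uint16_t *mag, size_t nBins, float fraction)
{
    if (nBins < 3) return 0;

    // Largest magnitude, ignoring bin 0 (DC offset from the ADC bias).
    uint16_t peak = 0;
    for (size_t k = 1; k < nBins; ++k)
        if (mag[k] > peak) peak = mag[k];
    if (peak == 0) return 0;

    float threshold = fraction * (float)peak;

    // Walk upward from the lowest bin; requiring a local peak keeps leakage
    // on the skirts of a neighbouring bin from being reported.
    for (size_t k = 1; k + 1 < nBins; ++k)
        if (mag[k] >= threshold && mag[k] >= mag[k - 1] && mag[k] >= mag[k + 1])
            return k;
    return 0;
}
```

The frequency is then the bin index times the bin width (fs divided by the transform size, computed in floating point). Finding a value for fraction that works across real signals is the trial-and-error part.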

A more complicated way might be to find the peak value, presume that its frequency is an integral multiple of the fundamental, and look for the fundamental only at frequencies that meet that criterion.
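A sketch of that second idea, again untried on real data: find the peak bin, then test the bins at peak/d for d running from a maximum divisor down to 1, so the lowest candidate frequency is examined first. A candidate is accepted if a bin within one position of it reaches a chosen fraction of the peak magnitude. The function name fundamentalFromPeak, the parameters minBin, maxDivisor and fraction, the uint16_t magnitude type, and the one-bin tolerance are all assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: assume the strongest bin is harmonic number d of the
// fundamental for some d in 1..maxDivisor, and return the lowest candidate
// bin (at or above minBin) that holds significant energy.
size_t fundamentalFromPeak(const uint16_t *mag, size_t nBins, size_t minBin,
                           unsigned maxDivisor, float fraction)
{
    if (nBins < 2 || maxDivisor == 0) return 0;

    // Strongest bin, ignoring bin 0 (DC).
    size_t peakBin = 1;
    for (size_t k = 2; k < nBins; ++k)
        if (mag[k] > mag[peakBin]) peakBin = k;
    if (mag[peakBin] == 0) return 0;

    float threshold = fraction * (float)mag[peakBin];

    // Largest divisor first, i.e. lowest candidate frequency first.
    for (unsigned d = maxDivisor; d >= 1; --d) {
        size_t c = (peakBin + d / 2) / d;          // peakBin / d, rounded
        if (c < 1 || c < minBin) continue;

        // peakBin / d rarely lands exactly on a bin, so look one bin each way.
        size_t lo = (c > 1) ? c - 1 : 1;
        size_t hi = (c + 1 < nBins) ? c + 1 : nBins - 1;
        size_t best = lo;
        for (size_t j = lo; j <= hi; ++j)
            if (mag[j] > mag[best]) best = j;
        if (mag[best] >= threshold) return best;
    }
    return peakBin;  // nothing lower qualified; fall back to the peak itself
}
```

With 31.25 Hz bins, a one-bin tolerance is a large relative error at low frequencies, so a longer transform or interpolation between bins would probably be needed before the result could be trusted.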

Note that I've never tried either of these techniques - this is pure conjecture. But, locating the peak in the DFT and reporting the result as the fundamental simply won't yield accurate answers for any but the simplest signals.