Identifying a human voice

Hi guys! :)

One part of our project needs to produce an activation signal only when our microphone detects a human voice (any speech, like "Run", "Open" or "Banana!") and a person is actually standing within a certain range.

I wonder if this is feasible with an Arduino? If so, what would the approach be? My initial idea is to use an ultrasonic sensor to detect whether something is in range, then use the Arduino to measure the sound's frequency and loudness and try to distinguish a human voice from all other noises using that data.

But now I am wondering whether that would be accurate, since many other sounds fall within the human voice frequency range. I am completely new to Arduino, so please help :o

Moderator added returns for readability....

It might be possible but almost certainly not with an Arduino.
Try a Raspberry Pi or a similarly powerful platform.

Arduino voice recognition modules

jremington:
Arduino voice recognition modules

But that is not doing it with an Arduino, it is doing it on some other board attached to an Arduino. The Arduino has little to offer the task except looking at the output of the computer that does it.

Also it doesn't fulfil the OP's request, which is "recognise that there's a human voice" rather than "listen for specific commands". The second is probably much easier (maybe better to say less hard).

wvmarle:
Also it doesn't fulfil the OP's request, which is "recognise that there's a human voice" rather than "listen for specific commands". The second is probably much easier (maybe better to say less hard).

It is interesting for me to know that a "specific command" is actually much easier. Could you tell me more about how to do it with an Arduino? Can it be solved using an FFT?

It is interesting for me to know that a "specific command" is actually much easier.

This is because it uses template matching. That is, it takes a recording of the word you want to trigger on and compares it with what you get. Then you get a score for how close the two are, and you decide that the word was heard when an arbitrarily set threshold has been exceeded. It is not very good at avoiding false positives.

The normal method is correlation of the time-domain recording with the template, but you can use the FFT if you like.
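
To make the idea concrete, here is a minimal sketch of time-domain template matching in plain Python, meant to be tried on a laptop rather than on an Arduino. It is only an illustration under assumptions: the function names and the 0.8 threshold are made up for this example, and real modules certainly do more (framing, noise handling, time warping).

```python
import math

def normalized_correlation(template, window):
    """Correlation of two equal-length sample lists, in the range [-1, 1]."""
    n = len(template)
    mean_t = sum(template) / n
    mean_w = sum(window) / n
    num = sum((t - mean_t) * (w - mean_w) for t, w in zip(template, window))
    den_t = math.sqrt(sum((t - mean_t) ** 2 for t in template))
    den_w = math.sqrt(sum((w - mean_w) ** 2 for w in window))
    if den_t == 0 or den_w == 0:
        return 0.0  # a flat signal has no shape to match against
    return num / (den_t * den_w)

def best_match(template, recording):
    """Slide the template over the recording; return (best score, offset)."""
    n = len(template)
    best_score, best_offset = -1.0, 0
    for offset in range(len(recording) - n + 1):
        score = normalized_correlation(template, recording[offset:offset + n])
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset

# Hypothetical threshold: arbitrary, has to be tuned by experiment.
THRESHOLD = 0.8

def is_triggered(template, recording, threshold=THRESHOLD):
    """True when some part of the recording resembles the template closely enough."""
    score, _ = best_match(template, recording)
    return score >= threshold
```

Raising the threshold cuts false triggers but makes it miss the word more often when it is spoken slightly differently, which is the weakness mentioned above.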

Can it be solved using FFT?

Not on an Arduino.

It is really hard to distinguish "human voice" from "something else" when listening to sound.

Do think about it: what makes "human voice" sound like "human voice"? Is it the language part you recognise (English or whatever you happen to speak)? But what if they speak Swahili, or Hindi, or Chinese? As a human you can probably recognise "that's a human voice" but what is it based on, really? Frequencies? Patterns? If so - which?

Voice commands are far more defined, making it easier for a computer to recognise. Do mind, the computer just hears a command. So it can hear "lights on" or "ljósin á" (the same in Icelandic) and if programmed to recognise those patterns knows what to do.

If you want to see what information you can get from an FFT, download the application "Audacity" for your laptop. It is free and lets you record and replay sounds. It also has an FFT, so it is simple to see what an FFT will give you without all the complexity of trying to use an Arduino.

If it turns out that it does the job, then you can worry about getting an Arduino to do the FFT and display the results. You would probably need an ARM-based Arduino.
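
If you would rather poke at the numbers yourself on the laptop, a rough sketch like the one below shows the kind of figure a spectrum gives you: the fraction of signal energy that falls inside a "voice band". The assumptions are mine, not from this thread: the band is the conventional 300 to 3400 Hz telephone band, the function names are made up, and the transform is a slow naive DFT chosen for clarity, not a real FFT.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive O(N^2) DFT; returns magnitudes for bins 0 .. N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

def band_energy_fraction(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Fraction of non-DC spectral energy between low_hz and high_hz."""
    n = len(samples)
    total = 0.0
    in_band = 0.0
    for k, mag in enumerate(dft_magnitudes(samples)):
        if k == 0:
            continue  # ignore the DC offset
        freq = k * sample_rate / n  # centre frequency of bin k
        energy = mag * mag
        total += energy
        if low_hz <= freq <= high_hz:
            in_band += energy
    return in_band / total if total > 0 else 0.0

if __name__ == "__main__":
    rate, n = 8000, 256
    # A pure 1 kHz test tone: clearly not a voice, yet it sits in the band.
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]
    print(band_energy_fraction(tone, rate))
```

A plain 1 kHz beep scores close to 1.0 here, which is exactly the problem raised in the first post: plenty of non-voice sounds live in the same frequency range, so band energy alone cannot tell you that a human is speaking.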

A highly qualified (Cambridge maths PhD) friend of mine spent about 6 months trying to do speech recognition using a high-end PC (very fast processor, effectively unlimited memory) as the engine.

He achieved only partial success, and concluded it was difficult.

I worked for a company in the USA trying to do the same (not my part of the project) and they came to the same conclusion.

Even with ARM performance it wouldn't be a doddle.

It is probably only reliable with high-quality audio and a limited dictionary of words to recognize.
High background noise makes things much harder.

As do strong regional accents.

Allan