Identifying a human voice

DefNotTaiga · March 17, 2018, 11:46pm

Hi guys!

So one part of our project involves sending an activation signal only when our microphone detects a human voice (it could be any speech, like "Run", "Open", "Banana!") and a human is actually standing within a certain range.

I wonder if this is feasible with Arduino? If so, what is the approach? My initial idea is to use an ultrasonic sensor to detect whether there is something in range, and then use the Arduino to measure the sound's frequency and loudness and try to distinguish a human voice from all other noises using that data.

But I am now wondering whether that is accurate, since there could be many other sounds within the human voice frequency range. I am completely new to Arduino, so please help :o
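
For illustration only, here is a rough sketch of the two measurements mentioned above, written in Python for a PC rather than for an Arduino. The function names, the 8000 Hz sample rate and the synthetic test tone are all assumptions made for this example. It also hints at the concern raised: a plain tone in the voice range yields a perfectly plausible "voice-like" frequency and loudness.

```python
import math

def rms(samples):
    """Loudness estimate: root-mean-square amplitude of the window."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def zero_crossing_frequency(samples, sample_rate):
    """Crude frequency estimate: count sign changes, two per cycle."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate  # seconds covered by the window
    return crossings / (2 * duration)

# Assumed test input: one second of a 200 Hz tone, amplitude 0.5.
# The 0.5 rad phase offset just keeps samples away from exact zeros.
SAMPLE_RATE = 8000  # Hz, hypothetical
tone = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE + 0.5)
        for t in range(SAMPLE_RATE)]

print("loudness (RMS):", rms(tone))
print("frequency estimate (Hz):", zero_crossing_frequency(tone, SAMPLE_RATE))
```

Both numbers only describe the window as a whole, so on their own they cannot tell speech apart from any other sound with similar pitch and level.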

Moderator added returns for readability....

wvmarle · March 18, 2018, 11:00am

It might be possible but almost certainly not with an Arduino.
Try Raspberry Pi or similarly powerful platform.

jremington · March 18, 2018, 3:44pm

Arduino voice recognition modules

Grumpy_Mike · March 18, 2018, 3:56pm

jremington:
Arduino voice recognition modules

But that is not doing it with an Arduino; it is doing it on some other board attached to an Arduino. The Arduino has little to offer the task except looking at the output of the computer that does it.

wvmarle · March 18, 2018, 4:19pm

Also it doesn't fulfil the OP's request, which is "recognise that there's a human voice" rather than "listen for specific commands". The second is probably much easier (maybe better to say less hard).

DefNotTaiga · March 18, 2018, 8:16pm

wvmarle:
Also it doesn't fulfil the OP's request, which is "recognise that there's a human voice" rather than "listen for specific commands". The second is probably much easier (maybe better to say less hard).

It is interesting for me to know that a "specific command" is actually much easier. Could you tell me more about how to do it with Arduino? Can it be solved using FFT?

Grumpy_Mike · March 18, 2018, 10:07pm

It is interesting for me to know that a "specific command" is actually much easier.

This is because it uses template matching. That is, it takes a recording of the word you want to trigger on and compares it with what you get. You then get a score for how close the match is, and you decide when an arbitrarily set threshold has been exceeded. It is not very good at rejecting false positives.

The normal method is correlation of the time-domain recording with the template, but you can use the FFT if you like.
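
A minimal sketch of that idea, assuming plain Python lists of samples on a PC. The function names, the toy template and the 0.8 threshold are made up for illustration and are not taken from any real voice recognition module. It slides the template along the recording, scores each position with a normalised correlation, and triggers when the best score passes the threshold.

```python
import math

def normalized_correlation(a, b):
    """Correlation of two equal-length windows, in the range [-1, 1]."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den else 0.0  # a flat window scores 0

def best_match(signal, template):
    """Slide the template over the signal; return (best score, offset)."""
    m = len(template)
    best_score, best_offset = -1.0, 0
    for offset in range(len(signal) - m + 1):
        score = normalized_correlation(signal[offset:offset + m], template)
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset

THRESHOLD = 0.8  # assumption: arbitrary, would need tuning on real recordings

def is_triggered(signal, template, threshold=THRESHOLD):
    """True when the best correlation score reaches the threshold."""
    score, _ = best_match(signal, template)
    return score >= threshold

# Toy data: the "recording" contains a quieter copy of the template.
template = [0, 1, 0, -1, 0, 2, 0, -2]
recording = [0.0] * 5 + [0.5 * x for x in template] + [0.0] * 5
print(best_match(recording, template), is_triggered(recording, template))
```

Because the score is normalised, a quieter copy of the word still matches, but anything with a similar shape will also score highly, which is where the false positives come from.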

Can it be solved using FFT?

Not on an Arduino.

wvmarle · March 19, 2018, 3:26am

It is really hard to distinguish "human voice" from "something else" when listening to sound.

Do think about it: what makes "human voice" sound like "human voice"? Is it the language part you recognise (English or whatever you happen to speak)? But what if they speak Swahili, or Hindi, or Chinese? As a human you can probably recognise "that's a human voice" but what is it based on, really? Frequencies? Patterns? If so - which?

Voice commands are far more defined, making it easier for a computer to recognise. Do mind, the computer just hears a command. So it can hear "lights on" or "ljósin á" (the same in Icelandic) and if programmed to recognise those patterns knows what to do.

Grumpy_Mike · March 19, 2018, 5:16am

If you want to see what information you can get from an FFT, download the application “Audacity” for your laptop. It is free and lets you record and replay sounds. It also has an FFT, so it is simple to see what an FFT will give you without all the complexity of trying to use an Arduino.

If it turns out it does the job, then you can worry about getting an Arduino to do the FFT and display the results. You would probably need an ARM-based Arduino.
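
As a rough idea of the kind of information a spectrum gives you, here is a small Python sketch for a PC. It uses a slow, naive DFT instead of a real FFT routine to stay dependency-free, and the function names, sample rate and synthetic 250 Hz tone are assumptions for this example. All it reports is how much energy sits at each frequency, which is the same thing a spectrum plot shows.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each frequency bin from 0 up to (not including) Nyquist."""
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        mags.append(abs(acc) / n)
    return mags

def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest bin, ignoring the DC bin."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * sample_rate / len(samples)

# Assumed test input: 256 samples of a 250 Hz tone at 8000 Hz,
# chosen so the tone falls exactly on a bin (8000 / 256 = 31.25 Hz per bin).
SAMPLE_RATE = 8000
N = 256
tone = [math.sin(2 * math.pi * 250 * t / SAMPLE_RATE) for t in range(N)]

print("dominant frequency (Hz):", dominant_frequency(tone, SAMPLE_RATE))
```

Deciding whether a given spread of energy is a voice is a separate problem that the spectrum itself does not solve.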

allanhurst · March 19, 2018, 5:45am

A highly qualified (Cambridge maths PhD) friend of mine spent about 6 months trying to do speech recognition using a high-end PC (very fast processor, effectively unlimited memory) as the engine.

He achieved only partial success, and concluded it was difficult.

I worked for a company in the USA trying to do the same (not my part of the project) and they came to the same conclusion.

Even with ARM performance it wouldn't be a doddle.

Probably only reliable with high quality audio and a limited dictionary of words to recognize.
High background noise makes things much harder.

As do strong regional accents.

Allan
