
Topic: Speech/Voice Recognition

andrecr03

Hello everyone!

My name is André, and I need some help setting up an Arduino project to recognize sounds, voices and, more specifically, speech. I need to do this without any shields or servers! Just a frequency analysis, using an FFT for example.
For my class project, I have to build something like this and compare its efficiency with that of a system using a shield, like the EasyVR 3.0.
I've been searching around and some members have already posted something like that; however, the linked files are no longer available.
I have to finish this by the end of the semester, so I'm in a hurry! If any of you could help me or send me examples, algorithms to study, etc., I would be very grateful. Working sketches would help a lot too, since I could study the code line by line.

Thank you very much in advance! Please help me :(

André

andrecr03

One more thing!

Without the shield or the server, it can be simple digital signal processing, just enough to light up an LED. For example, I say "green" and a green LED lights up, and that would be enough for a stand-alone Arduino. Obviously, it would be better if a sketch could do something more, but this is just to make the job easier for you guys...

andrecr03

Oh, sorry, I have to do this without servers or shields, only with the Arduino. I would like to do some FFT and spectrum analysis.
Do you have any example code with an FFT or something like that?

jremington

The FFT won't help you with voice recognition, but if you want to learn something about it, check out http://wiki.openmusiclabs.com/wiki/ArduinoFFT.

Arduino is totally unsuited for voice recognition.

andrecr03

Apr 05, 2017, 07:16 pm
I have already seen something like that around here, but, as I said, the files were no longer available on the links... I just wanted to do a simple recognition system, to perform an analysis on frequency peaks of a voice signal.

andrecr03

More specifically on this link: http://forum.arduino.cc/index.php?topic=352777.0

Coding Badly


@andrecr03, do not cross-post.  Threads merged.


pjrc

You haven't even said *which* Arduino you're using.  There are many different Arduino boards, and many more that are not officially Arduino but are Arduino compatible.

This is important, because the many boards vary greatly in capability.  Arduino Uno can just barely manage even a small FFT, and probably can't do any significant data processing without considerable blind (or "deaf") time between each set of data collected.  Boards like Arduino Due are much more powerful, but the software can still get tricky if you want to analyze one block while collecting the next incoming data, so that there are no gaps between FFTs.
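
One common way to avoid those gaps is a ping-pong (double) buffer: the sampling code fills one block while the analysis works on the other. The following is only a minimal sketch in plain C++ so it can be tried on a PC first. The names `PingPongBuffer`, `push`, `takeReadyBlock` and the block size of 256 are all made up for illustration, and on a real board `push` would be called from a timer or ADC interrupt, which needs extra care that is not shown here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical block size; in practice it would match the FFT length.
constexpr std::size_t kBlockSize = 256;
using Block = std::array<int16_t, kBlockSize>;

// Two blocks: one is being filled while the other can be analyzed.
// Assumption: on real hardware push() runs in an interrupt, so the shared
// state would need volatile/atomic access or briefly disabled interrupts.
class PingPongBuffer {
public:
    // Store one sample; when a block fills up, mark it ready and switch blocks.
    void push(int16_t sample) {
        blocks_[writeIndex_][fill_++] = sample;
        if (fill_ < kBlockSize) return;
        if (ready_) ++overruns_;   // previous block was never picked up
        readyIndex_ = writeIndex_;
        ready_ = true;
        writeIndex_ ^= 1u;         // start filling the other block
        fill_ = 0;
    }

    // Returns the most recently completed block, or nullptr if none is waiting.
    // The caller must finish with it before the other block fills up.
    const Block* takeReadyBlock() {
        if (!ready_) return nullptr;
        ready_ = false;
        return &blocks_[readyIndex_];
    }

    // Number of completed blocks that were lost because analysis was too slow.
    unsigned overruns() const { return overruns_; }

private:
    Block blocks_[2] = {};
    std::size_t writeIndex_ = 0;
    std::size_t readyIndex_ = 0;
    std::size_t fill_ = 0;
    bool ready_ = false;
    unsigned overruns_ = 0;
};
```

The main loop would poll `takeReadyBlock()` and run the FFT on whatever it returns. If `overruns()` keeps growing, the analysis is slower than the sampling and blocks are being dropped, which is exactly the "deaf" time described above.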

Grumpy_Mike

Quote
I just wanted to do a simple recognition system,
Sorry there is no such thing. They are all complex.

Quote
I say "green" and a green LED lights up,
and what if you say "great", "greedy", "margarine" and the green light comes on?
Basically, an FFT is only the very first step; you need to take lots of FFTs over the duration of the sound. Then you have to search all the template sounds you have stored and see which one matches most closely. Then you have to look at the probability that the close match you have is actually a word you want.
The whole thing works on probabilities. There is no such thing as a perfect speech recognition system.
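
To make those steps concrete, here is a minimal sketch of the matching stage in plain C++. It assumes each utterance has already been turned into a fixed number of reduced spectra (frames), compares it against stored templates by mean squared difference, and rejects the result if even the best match is too far away. `WordTemplate`, `utteranceDistance`, `classify` and the `maxDistance` threshold are hypothetical names chosen for this illustration, and a plain distance threshold is only a crude stand-in for a real probability estimate.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

using Frame = std::vector<float>;      // one reduced spectrum (one FFT's worth)
using Utterance = std::vector<Frame>;  // frames covering the whole sound

// A stored example of a word (hypothetical structure for this sketch).
struct WordTemplate {
    std::string word;
    Utterance frames;
};

// Mean squared difference between two utterances of identical shape.
// Returns infinity if the shapes differ, so mismatched data never "wins".
float utteranceDistance(const Utterance& a, const Utterance& b) {
    const float inf = std::numeric_limits<float>::infinity();
    if (a.empty() || a.size() != b.size()) return inf;
    float sum = 0.0f;
    std::size_t count = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i].size() != b[i].size()) return inf;
        for (std::size_t k = 0; k < a[i].size(); ++k) {
            const float d = a[i][k] - b[i][k];
            sum += d * d;
            ++count;
        }
    }
    return count > 0 ? sum / static_cast<float>(count) : inf;
}

// Index of the closest template, or -1 if nothing is within maxDistance.
// The rejection case is what keeps "great" from lighting the green LED,
// provided the threshold is tuned on real recordings.
int classify(const Utterance& input,
             const std::vector<WordTemplate>& templates,
             float maxDistance) {
    int best = -1;
    float bestDistance = maxDistance;
    for (std::size_t i = 0; i < templates.size(); ++i) {
        const float d = utteranceDistance(input, templates[i].frames);
        if (d <= bestDistance) {
            bestDistance = d;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

This sketch requires the input and the templates to have the same number of frames. Real speech varies in speed, so practical isolated-word recognizers usually add some form of time alignment (dynamic time warping is the classic choice) before comparing.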

andrecr03

Sorry, my mistake. I'm using an Arduino Due.
I know there is no such thing as simple speech recognition, but what I mean is that I want to make a system that can analyze spectra and choose actions.

pjrc

Of course it won't be perfect Mike.  But Apple, Amazon and others have figured out how to do this pretty well, at least for English language with USA dialect.  Admittedly they are using huge server farms, so perhaps the methods are impractical for microcontrollers?  Or maybe not?

I'm particularly interested in this for my Teensy Audio Library, since we already have continuous, 50% overlapped, windowed 1024-point FFTs running on Teensy with quite a lot of the CPU time still available to actually do that pattern matching (especially on the newer, faster Teensy 3.6).  In the coming years, we're going to get more and more powerful chips, since today's fastest microcontrollers are still mostly only 90 nm silicon.
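
For anyone unfamiliar with the term, "50% overlapped windowed" framing means each FFT input block starts half a block after the previous one and is multiplied by a window function before the transform. Here is a minimal sketch of just that framing step in plain C++. It is not the Teensy Audio Library's code; the Hann window and the names `hannWindow` and `overlappedFrames` are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Symmetric Hann window of length n (n must be at least 2).
std::vector<float> hannWindow(std::size_t n) {
    const double pi = 3.14159265358979323846;
    std::vector<float> w(n);
    for (std::size_t i = 0; i < n; ++i) {
        w[i] = static_cast<float>(
            0.5 * (1.0 - std::cos(2.0 * pi * static_cast<double>(i) /
                                  static_cast<double>(n - 1))));
    }
    return w;
}

// Split a signal into windowed frames that overlap by half a frame.
// Each returned frame would then be fed to an FFT of size frameSize.
std::vector<std::vector<float>> overlappedFrames(
        const std::vector<float>& signal, std::size_t frameSize) {
    std::vector<std::vector<float>> frames;
    if (frameSize < 2 || signal.size() < frameSize) return frames;
    const std::vector<float> window = hannWindow(frameSize);
    const std::size_t hop = frameSize / 2;  // 50% overlap
    for (std::size_t start = 0; start + frameSize <= signal.size();
         start += hop) {
        std::vector<float> frame(frameSize);
        for (std::size_t i = 0; i < frameSize; ++i) {
            frame[i] = signal[start + i] * window[i];
        }
        frames.push_back(frame);
    }
    return frames;
}
```

With a frame size of 1024 the hop is 512 samples, so every sample (apart from the very first and last half-frames) lands in two consecutive frames. That is why the window's tapered edges do not throw information away.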

Are there good public references for how the FFTs are distilled to smaller data sets, and how those are then matched to patterns?  Or does that sort of knowledge exist only as the "secret sauce" at Apple, Google & Amazon?

Grumpy_Mike

There are lots of template matching algorithms in the public domain; they are mainly based on correlation. Look at the gesture controlled stuff for a simple example.
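
As a rough illustration of correlation-based matching, the sketch below computes a normalized (Pearson) correlation between an input feature vector and a template of the same length. A result near 1 means the shapes match regardless of overall loudness. The function name `normalizedCorrelation` is made up for this example, and it is plain C++ rather than code from any particular gesture or speech library.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation of two equal-length vectors, in the range [-1, 1].
// Returns 0 if the sizes differ, the input is empty, or either is constant.
float normalizedCorrelation(const std::vector<float>& a,
                            const std::vector<float>& b) {
    if (a.empty() || a.size() != b.size()) return 0.0f;
    const double n = static_cast<double>(a.size());

    // Remove the mean so a constant offset does not affect the score.
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= n;
    meanB /= n;

    // Dividing by both energies makes the score independent of scale.
    double num = 0.0, energyA = 0.0, energyB = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - meanA;
        const double db = b[i] - meanB;
        num += da * db;
        energyA += da * da;
        energyB += db * db;
    }
    const double denom = std::sqrt(energyA * energyB);
    return denom > 0.0 ? static_cast<float>(num / denom) : 0.0f;
}
```

The stored template with the highest score is the best candidate, and a minimum score (which would have to be tuned by experiment) decides whether to accept it at all.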

Quote
Apple, Amazon and others have figured out how to do this pretty well, at least for English language with USA dialect.
That is the problem. I am actually English with a north Manchester accent; it is not at all strong, but voice recognition is annoyingly imprecise. Anyone with a strong accent doesn't stand a chance. We don't all sound like Dick Van Dyke in the movies; in fact, none of us do.

andrecr03

Please, I just have to do a word recognition algorithm. It doesn't have to be speaker independent. It can be only for my voice!
I just want to know how I can take a voice signal from a microphone connected to the board (a KY-037 microphone) and do some signal processing on it.
To sum up, are there any example algorithms whose code I could modify and use?
I would like to learn about libraries and functions for things like computing an FFT, showing a spectrum, starting to listen to the microphone, etc.
I don't know any of that.
