
Topic: Speech/Voice Recognition (Read 8308 times)


Can you explain what is wrong with the link in reply #3?


See link in reply #5 for FFT code.


Can you explain what is wrong with the link in reply #3?
Please correct me if I'm wrong. He suggests that I use a server (BitVoicer), but I thought that when you use a server you don't really develop the voice recognition system/algorithm yourself. You just train words, and the programmer only decides how to use each recorded command, as with the EasyVR.
For this project to be graded, I have to implement a voice recognition algorithm on the Arduino. It can be simple, but I really have to do the windowing, the FFT, etc., and provide the block diagram along with the code that performs the recognition.
Again, please correct me if needed. As far as I know, the server only provides processing power, not the DSP techniques and libraries I asked about.

Thanks for all the help until now.


All simple voice recognition systems require training. The complex ones do as well; it is just that the training is different.
I think you have an impossible project. There are no easy answers, no magic algorithm.

Basically you have to input a sample of sound. Top and tail it, that is, remove the start and end bits, leaving just the word. Then extract parameters from the sound. I don't think that an FFT is a suitable parameter, but it depends on what sort of system you want; a system that is not specific to one voice would not depend in any way on the frequency profile.
Finally you need to compare what you have with a previously prepared template to see what probability you have for a match.
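
As a rough illustration of the "top and tail" step described above, here is a minimal sketch in plain C++. It assumes signed 16-bit samples centred on zero and a fixed amplitude threshold; the function name `topAndTail` and the threshold value are made up for this example, and a real system would likely need something smarter than a single threshold (for instance short-term energy).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical helper: finds the part of a recording that is louder than
// `threshold`. Returns {begin, end} where `end` is one past the last loud
// sample, or {0, 0} if nothing exceeds the threshold.
std::pair<std::size_t, std::size_t> topAndTail(const std::vector<int16_t>& samples,
                                               int threshold) {
    std::size_t begin = 0;
    std::size_t end = samples.size();

    // Skip quiet samples at the start.
    while (begin < end && std::abs(samples[begin]) <= threshold) ++begin;

    // Skip quiet samples at the end.
    while (end > begin && std::abs(samples[end - 1]) <= threshold) --end;

    if (begin == end) return {0, 0};  // the whole recording was quiet
    return {begin, end};
}
```

The samples between `begin` and `end` are what would then go on to parameter extraction and template comparison.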

The parameter extraction part defines what sort of system you have. As I understand it, the more complex systems attempt to extract fricative sections from the speech, and then look up the combinations of these in a dictionary.


I would like to learn about and use libraries and functions to do things like an FFT, show a spectrum, start listening to the microphone, etc.
About a year ago I wrote a very detailed tutorial and taught a couple hands-on workshops which included this exact subject.  But I have some good news and some bad news for you.

First, the bad news: this tutorial and the library code only work on Teensy 3.x.  They will not work on Arduino Due.  The code makes heavy use of DMA and special peripherals not present on Due.  It also uses the Cortex-M4 DSP extensions, which are not present in the Cortex-M3 processor on Arduino Due.  Maybe this material can offer some very minor help even if you don't have compatible hardware, but to actually use it, you will need to get a Teensy board.  (Full disclosure: my company makes these boards... so my opinion is biased, but I really do wish to help you if I can.)

I also do not know how to use the FFT output for speech recognition.  I'm writing this message for you in hopes you might share whatever you learn.  I can only help with the part you asked above... how to get the audio data and perform FFTs, and do so in a way that other code can easily work with the data while more audio is captured and analyzed by the FFT without gaps.

The good news: I wrote a very detailed 31 page tutorial manual.  You can get it here:


Alysia and I also went to the trouble to shoot & edit a complete 48 minute walkthrough video.  It shows all the tutorial steps.  Scroll down at that page for the video.  Of course, actually doing the tutorial yourself with the real hardware is essential, but if you get stuck and the words & pictures of the 31 page manual aren't clear, hopefully the video helps.

Again, this code only works on Teensy 3.2, Teensy 3.5 and Teensy 3.6.  The tutorials use this audio shield.  There are other ways to get signals in and out, and after you've read or watched the tutorial and if you look around the options offered in the design tool (part 2 of the tutorial) you'll see other ways to input & output signals.  But as a practical matter, for a microphone that shield is probably the best way to get started.  Notice the part of the tutorial where the mic gain is software adjustable....

If you read or watch part 3-2 of the tutorial, hopefully you can see how this works for FFT data.  The audio library takes care of acquiring the signal and computing the FFT, which you can then use in your program.

Again, how you'll use the FFT data to recognize words is beyond my knowledge.  I am curious.  If this info helps you get started, and if you learn anything useful that could help me or others, I sincerely hope you'll share.  Maybe you'll even participate here on the forum and occasionally help them, as we're trying to help you?


Thank you very much, Paul and Mike!
Actually, I have already written an algorithm like that in MATLAB, but I cannot use it for this project, not even together with the Arduino. It was based on the FFT: the user says something, and after some FFTs were computed on the signal, the frequency peaks were isolated and then compared with a pattern that MATLAB already knew.
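
The MATLAB code itself is not shown in this thread, so the following is only a guess at the general idea (isolate the strongest peaks of a magnitude spectrum, then compare their positions with a stored pattern), written in plain C++. The names `strongestPeaks` and `matchesTemplate`, and the idea of matching within a tolerance of a few bins, are assumptions made for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the bin indices of the `count` strongest local maxima in a
// magnitude spectrum, sorted by ascending bin index.
std::vector<std::size_t> strongestPeaks(const std::vector<double>& magnitude,
                                        std::size_t count) {
    std::vector<std::size_t> peaks;
    for (std::size_t i = 1; i + 1 < magnitude.size(); ++i) {
        if (magnitude[i] > magnitude[i - 1] && magnitude[i] >= magnitude[i + 1]) {
            peaks.push_back(i);
        }
    }
    // Order by strength, keep the strongest `count`, then restore bin order.
    std::sort(peaks.begin(), peaks.end(),
              [&](std::size_t a, std::size_t b) { return magnitude[a] > magnitude[b]; });
    if (peaks.size() > count) peaks.resize(count);
    std::sort(peaks.begin(), peaks.end());
    return peaks;
}

// True if both lists hold the same (non-zero) number of peaks and each peak
// lies within `tolerance` bins of the corresponding pattern peak.
bool matchesTemplate(const std::vector<std::size_t>& peaks,
                     const std::vector<std::size_t>& pattern,
                     std::size_t tolerance) {
    if (peaks.empty() || peaks.size() != pattern.size()) return false;
    for (std::size_t i = 0; i < peaks.size(); ++i) {
        const std::size_t diff =
            peaks[i] > pattern[i] ? peaks[i] - pattern[i] : pattern[i] - peaks[i];
        if (diff > tolerance) return false;
    }
    return true;
}
```

Whether peak positions alone are enough to tell words apart is exactly the open question in this thread; the sketch only shows the bookkeeping.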

However, since I'm not able to do it using MATLAB, I'm looking for tutorials, examples, etc., that could guide me through this project. How can I tell the board to start listening to the microphone? Is there a function to do that? How can I store the sound data to be compared with something later? How can I do an FFT? Again, what is the correct function to do that? Is there a library I have to install to do what I'm planning to do?
Did you get what I mean? A tutorial for things like that.

I really need help with those topics. In fact, I should have asked those questions in the first place... :/


Reply #5 had a link to the FFT library.
It has examples of how to use it.

To start listening you start recording the samples from the analogue input.
You record for the number of samples you want to pass to the FFT.
Then you pass the data to the FFT function and run it. The result is that your sample data buffer now contains the FFT of your samples. A typical size is 1024 samples (FFT routines generally want a power of two), so it is not very long.
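
To make the record-then-transform steps above concrete, here is a self-contained sketch in standard C++. It is not the API of the library linked in reply #5; it uses a textbook in-place radix-2 FFT, and `readSample` is a hypothetical stand-in for whatever reads one value from the analogue input. On real hardware the samples would also have to be taken at a fixed, known rate, which this sketch does not deal with.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// In-place iterative radix-2 FFT. The buffer length must be a power of two.
void fftInPlace(std::vector<Complex>& data) {
    const std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0);

    // Reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(data[i], data[j]);
    }

    // Combine transforms of length 2, 4, 8, ... up to n.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double angle = -2.0 * pi / static_cast<double>(len);
        const Complex wlen(std::cos(angle), std::sin(angle));
        for (std::size_t start = 0; start < n; start += len) {
            Complex w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const Complex even = data[start + k];
                const Complex odd = data[start + k + len / 2] * w;
                data[start + k] = even + odd;
                data[start + k + len / 2] = even - odd;
                w *= wlen;
            }
        }
    }
}

// Records n samples via readSample (hypothetical), runs the FFT on the same
// buffer, and returns the magnitudes of bins 0 .. n/2 - 1.
template <typename ReadSample>
std::vector<double> captureSpectrum(std::size_t n, ReadSample readSample) {
    std::vector<Complex> buffer(n);
    for (std::size_t i = 0; i < n; ++i) buffer[i] = Complex(readSample(), 0.0);
    fftInPlace(buffer);
    std::vector<double> magnitude(n / 2);
    for (std::size_t i = 0; i < n / 2; ++i) magnitude[i] = std::abs(buffer[i]);
    return magnitude;
}
```

Bin `k` of the result corresponds to a frequency of `k * sampleRate / n`, so the sample rate has to be known before the spectrum means anything.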


To start listening you start recording the samples from the analogue input.
You record for the number of samples you want to pass to the FFT.
How can I do that?
Is there a function to record data in the library from reply #5?


Apr 08, 2017, 05:53 am Last Edit: Apr 08, 2017, 05:54 am by jremington
Is there a function to record data in the library from reply #5?
Is it too much trouble to look at that link by yourself?

If so, let us know and forum members will be glad to do so, and summarize the findings for you.


Apr 08, 2017, 06:41 am Last Edit: Apr 08, 2017, 06:42 am by Grumpy_Mike
Is there a function to record data in the library from reply #5?
Yes. It is in the example code.


How can I tell the board to start listening to the microphone? Is there a function to do that?
Seems you did not read the tutorial PDF I wrote?  This is explained in "Part 2-4: Using the Microphone" on page 16.

Ok, maybe reading page 16 is too much work?  It's also in the video, starting at 15 minutes and 54 seconds.  Here is a direct link that is supposed to start playing the video at 15:54.


Can you at least just click this link and watch the video for a couple minutes?  Is that too hard?

How can I do FFT?
Again, also in the tutorial, part 3.2.

Look, if you weren't even able to read page 16 or watch that part of the video, how do you hope to read pages 24 to 29?  That's six times as much as reading only page 16.

Seriously, there comes a time when everyone needs to stand on their own 2 feet.  This is your moment.  Rise up to the challenge.  Actually read the answers we're trying to give you.  It's perfectly fine to ask more questions, but at the very least, demonstrate in your next question that you at least made even a small effort to read this info.


Just download the latest version of the Arduino FHT library from the website below.


Also, there's something called FHT Data Visualizer in Processing or Pure Data. Install Processing and run it. I don't know whether it works, because I'm also trying to analyze a frequency spectrum but Processing gives me a bunch of errors. I don't know how to run it properly.


I'm also curious about this MATLAB code.  Did you personally write it?  Can it be shared?


but Processing gives me a bunch of errors.
Crystal ball warning.
Anyone got one? Mine is broken, and this guy is asking us to look at error messages that are only on his computer.


Mar 20, 2018, 04:55 pm Last Edit: Mar 20, 2018, 04:58 pm by jlsilicon
Question here,

Earlier, I saw a reference to the ArduinoFFT library.

*** Why can't this be used for speech recognition?

- I would like to move this to the Arduino Due to improve the resolution/accuracy.

I have tried the uSpeech code for the Arduino, which seems somewhat accurate (maybe 65%-80%).
But looking over the code, it only uses the sums of the differences between samples to calculate vowels...
- pretty accurate for that type of algorithm!
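
To show what a "sum of sample differences" feature can look like, here is a small sketch in plain C++. It is not copied from uSpeech and may differ from what that library actually computes; the function name `differenceRatio` and the scaling to a percentage are assumptions for this example. The idea is that a rapidly changing (hissy) sound gives a much larger value than a smooth (voiced) one.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical feature: summed absolute sample-to-sample differences as a
// percentage of the summed absolute amplitude. Returns 0 for pure silence.
long long differenceRatio(const std::vector<int16_t>& samples) {
    long long diffSum = 0;
    long long ampSum = 0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const long long current = samples[i];
        ampSum += std::abs(current);
        if (i > 0) {
            const long long previous = samples[i - 1];
            diffSum += std::abs(current - previous);
        }
    }
    if (ampSum == 0) return 0;  // avoid dividing by zero on silence
    return (diffSum * 100) / ampSum;
}
```

A feature like this needs no FFT at all, which is presumably why it fits on a small board, but it also carries no direct information about which frequencies are present.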

But I want something better, say, reading mixed frequencies (a vowel is normally a mix of 3 frequencies).
