Arduino Forum

Using Arduino => Audio => Topic started by: andrecr03 on Apr 05, 2017, 05:54 am

Title: Speech/Voice Recognition
Post by: andrecr03 on Apr 05, 2017, 05:54 am
Hello everyone!

My name is André, and I need some help to set up a project, using Arduino, to recognize sounds, voices and, more specifically, speech. And I need to do this without any shields, or servers! Just a frequency analysis, using FFT for example.
For my class project, I have to do something like this and compare the efficiency with a system using a shield, like EasyVR 3.0.
I've been searching around and some members have already posted something like that, however the files on the links were no longer available.
I have to finish this by the end of the semester, so I'm in a hurry! If any of you could help me, or send me examples or algorithms to study, I would be very grateful. Working sketches would help a lot too, since I could study the code line by line.

Thank you very much in advance! Please help me :(

André
Title: Re: Speech/Voice Recognition
Post by: andrecr03 on Apr 05, 2017, 06:12 am
One more thing!

Without the shield or the server, it can be simple digital signal processing, just enough to make an LED light up. For example, I say "green" and a green LED lights up; that would suffice for a stand-alone Arduino. Obviously, if a sketch could do something more it would be better, but this is just to make the job easier for you guys...
Title: Speech/Voice Recognition
Post by: andrecr03 on Apr 05, 2017, 06:22 pm
Hello everyone!

My name is André, and I need some help to set up a project, using Arduino, to recognize sounds, voices and, more specifically, speech. And I need to do this without any shields, or servers! Just a frequency analysis, using FFT for example.
For my class project, I have to do something like this and compare the efficiency with a system using a shield, like EasyVR 3.0.
I've been searching around and some members have already posted something like that, however the files on the links were no longer available.
I have to finish this by the end of the semester, so I'm in a hurry! If any of you could help me, or send me examples or algorithms to study, I would be very grateful. Working sketches would help a lot too, since I could study the code line by line.

One more thing!

Without the shield or the server, it can be simple digital signal processing, just enough to make an LED light up. For example, I say "green" and a green LED lights up; that would suffice for a stand-alone Arduino. Obviously, if a sketch could do something more it would be better, but this is just to make the job easier for you guys...

Thank you very much in advance! Please help me :(

André

Title: Re: Speech/Voice Recognition
Post by: ard_newbie on Apr 05, 2017, 06:29 pm

You can see this tutorial for speech/voice recognition :

https://create.arduino.cc/projecthub/msb4180/speech-recognition-and-synthesis-with-arduino-2f0363
Title: Re: Speech/Voice Recognition
Post by: andrecr03 on Apr 05, 2017, 06:38 pm
Oh, Sorry, I have to do something without the use of servers and shields. Only with the Arduino. I would like to do some FFT and spectrum analysis.
Do you have some example codes with FFT or something like that?
Title: Re: Speech/Voice Recognition
Post by: jremington on Apr 05, 2017, 07:10 pm
The FFT won't help you with voice recognition, but if you want to learn something about it, check out http://wiki.openmusiclabs.com/wiki/ArduinoFFT

Arduino is totally unsuited for voice recognition.
Title: Re: Speech/Voice Recognition
Post by: andrecr03 on Apr 05, 2017, 07:16 pm
I have already seen something like that around here, but, as I said, the files were no longer available on the links... I just wanted to do a simple recognition system, to perform an analysis on frequency peaks of a voice signal.
Title: Re: Speech/Voice Recognition
Post by: andrecr03 on Apr 05, 2017, 07:34 pm
More specifically on this link: http://forum.arduino.cc/index.php?topic=352777.0
Title: Re: Speech/Voice Recognition
Post by: Coding Badly on Apr 05, 2017, 08:35 pm

@andrecr03, do not cross-post.  Threads merged.

Title: Re: Speech/Voice Recognition
Post by: pjrc on Apr 05, 2017, 08:53 pm
You haven't even said *which* Arduino you're using.  There are many different Arduino boards, and many more that are not officially Arduino but are Arduino compatible.

This is important, because the many boards vary greatly in capability.  Arduino Uno can just barely manage even a small FFT, and probably can't do any significant data processing without considerable blind (or "deaf") time between each set of data collected.  Boards like Arduino Due are much more powerful, but still there can be tricky matters of software if you wish to do analysis while still collecting the next incoming data so you don't have gaps between each FFT.
Title: Re: Speech/Voice Recognition
Post by: Grumpy_Mike on Apr 05, 2017, 10:32 pm
Quote
I just wanted to do a simple recognition system,
Sorry there is no such thing. They are all complex.

Quote
I say "green" and a green LED lights up,
and what if you say "great", "greedy", "margarine" and the green light comes on?
Basically an FFT is only the very first step; you need to take lots of FFTs over the duration of the sound. Then you have to search all the template sounds you have stored and see which one matches most closely. Then you have to look at the probability that the close match you have is actually a word you want.
The whole thing works on probabilities. There is no such thing as a perfect speech recognition system.
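
The search-and-reject step described above can be sketched in plain C++. This is only a minimal illustration under stated assumptions: the `WordTemplate` struct, the function names, and the mean-squared-distance measure are all hypothetical choices, not taken from any library mentioned in this thread. The features are assumed to be a flattened sequence of FFT magnitudes of equal length for the input and every template.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stored template: a word id plus a flattened sequence of
// spectra (frames * bins magnitudes).
struct WordTemplate {
    int label;                    // e.g. 0 = "green" (assumed numbering)
    std::vector<float> features;
};

// Mean squared difference between two equal-length feature vectors.
float meanSquaredDistance(const std::vector<float>& a,
                          const std::vector<float>& b) {
    assert(a.size() == b.size() && !a.empty());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum / static_cast<float>(a.size());
}

// Returns the label of the closest template, or -1 when even the best
// match is farther away than rejectThreshold.
int classify(const std::vector<float>& input,
             const std::vector<WordTemplate>& templates,
             float rejectThreshold) {
    int bestLabel = -1;
    float bestDistance = std::numeric_limits<float>::max();
    for (const WordTemplate& t : templates) {
        const float d = meanSquaredDistance(input, t.features);
        if (d < bestDistance) {
            bestDistance = d;
            bestLabel = t.label;
        }
    }
    return (bestDistance <= rejectThreshold) ? bestLabel : -1;
}
```

The rejection threshold is what gives "great" or "margarine" a chance of being ignored instead of lighting the green LED; picking its value would need experiments with real recordings.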
Title: Re: Speech/Voice Recognition
Post by: andrecr03 on Apr 06, 2017, 01:13 am
Sorry, my mistake. I'm using an Arduino Due.
I know there isn't a simple speech recognition, but what I mean is that I want to make a system that can analyze spectrums and choose actions..
Title: Re: Speech/Voice Recognition
Post by: pjrc on Apr 06, 2017, 01:22 am
Of course it won't be perfect Mike.  But Apple, Amazon and others have figured out how to do this pretty well, at least for English language with USA dialect.  Admittedly they are using huge server farms, so perhaps the methods are impractical for microcontrollers?  Or maybe not?

I'm particularly interested in this for my Teensy Audio Library.... since we already have continuous 50% overlapped windowed 1024 point FFTs running on Teensy with quite a lot of the CPU time still available to actually do that pattern matching (especially on the newer, faster Teensy 3.6).  In the coming years, we're going to get more and more powerful chips, since today's fastest microcontrollers are still mostly only 90 nm silicon.

Are there good public references for how the FFTs are distilled to smaller data sets, and those then matched to patterns?  Or is that sort of knowledge only existing as the "secret sauce" at Apple, Google & Amazon?
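
For readers unfamiliar with "50% overlapped windowed" FFTs, the framing step can be sketched as follows. This is not the Teensy Audio Library code; it is a plain C++ illustration, and the function names and the choice of a periodic Hann window are assumptions. Each frame would then be handed to an FFT routine.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann window coefficient for sample i of an n-point frame.
float hann(std::size_t i, std::size_t n) {
    const float pi = 3.14159265358979f;
    return 0.5f * (1.0f - std::cos(2.0f * pi * static_cast<float>(i) /
                                   static_cast<float>(n)));
}

// Cuts the signal into windowed frames of frameSize samples, each frame
// starting half a frame after the previous one (50% overlap).
std::vector<std::vector<float>> overlappedFrames(
        const std::vector<float>& signal, std::size_t frameSize) {
    assert(frameSize >= 2 && frameSize % 2 == 0);
    const std::size_t hop = frameSize / 2;
    std::vector<std::vector<float>> frames;
    for (std::size_t start = 0; start + frameSize <= signal.size();
         start += hop) {
        std::vector<float> frame(frameSize);
        for (std::size_t i = 0; i < frameSize; ++i) {
            frame[i] = signal[start + i] * hann(i, frameSize);
        }
        frames.push_back(frame);
    }
    return frames;
}
```

With this window, the coefficients of two frames that overlap by half add up to one at every sample, so no part of the signal is systematically under-weighted.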
Title: Re: Speech/Voice Recognition
Post by: Grumpy_Mike on Apr 06, 2017, 10:19 am
There are lots of template matching algorithms in the public domain, they are mainly based on correlation. Look at the gesture controlled stuff for a simple example.
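
As one possible reading of "based on correlation", here is a minimal sketch of a normalized correlation score between an input feature vector and a stored template. The function name is hypothetical, and real template matchers may differ considerably (for example by aligning the two sequences in time first).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation of two equal-length vectors, in the range [-1, 1].
// Returns 0 when either vector is constant (correlation undefined).
float normalizedCorrelation(const std::vector<float>& a,
                            const std::vector<float>& b) {
    assert(a.size() == b.size() && !a.empty());
    const float n = static_cast<float>(a.size());
    float meanA = 0.0f, meanB = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        meanA += a[i];
        meanB += b[i];
    }
    meanA /= n;
    meanB /= n;

    float num = 0.0f, varA = 0.0f, varB = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float da = a[i] - meanA;
        const float db = b[i] - meanB;
        num += da * db;
        varA += da * da;
        varB += db * db;
    }
    const float denom = std::sqrt(varA * varB);
    return (denom > 0.0f) ? num / denom : 0.0f;
}
```

A score near 1 means the shapes agree regardless of overall loudness, which is one reason correlation is often preferred over a raw difference.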

Quote
Apple, Amazon and others have figured out how to do this pretty well, at least for English language with USA dialect.
That is the problem. I am actually English, with a north Manchester accent; it is not at all strong, but voice recognition is annoyingly imprecise. Anyone with a strong accent doesn't stand a chance. We don't all sound like Dick Van Dyke in the movies; in fact, none of us do.
Title: Re: Speech/Voice Recognition
Post by: andrecr03 on Apr 06, 2017, 06:02 pm
Please, I just have to do a word recognition algorithm. It doesn't have to be speaker independent. It can be only for my voice!
I just want to know how I can take a voice signal from a microphone connected to the board (a KY-037 microphone) and do some signal processing on it.
To sum up, are there some example algorithms I can use by modifying the code?
I would like to know about and use libraries and functions to do an FFT, show a spectrum, start listening to the microphone, etc.
I don't know any of that.
Title: Re: Speech/Voice Recognition
Post by: Grumpy_Mike on Apr 06, 2017, 09:04 pm
Can you explain what is wrong with the link in reply #3?
Title: Re: Speech/Voice Recognition
Post by: jremington on Apr 06, 2017, 11:41 pm
See link in reply #5 for FFT code.
Title: Re: Speech/Voice Recognition
Post by: andrecr03 on Apr 07, 2017, 12:59 am
Quote
Can you explain what is wrong with the link in reply #3?
Please correct me if I'm wrong. He suggests using a server (BitVoicer), but I thought that when you use a server you don't really develop the voice recognition system or algorithm yourself: you just train words, and the programmer only decides how to use each recorded command, as with the EasyVR.
For this project to be graded, I have to write a voice recognition algorithm on the Arduino. It can be simple, but I really have to do the windowing, the FFT, etc. myself, and produce the block diagram along with the code that performs the recognition.
Again, please correct me if needed. As far as I know, the server only provides processing power, not the DSP techniques and libraries I asked about.

Thanks for all the help until now.
Title: Re: Speech/Voice Recognition
Post by: Grumpy_Mike on Apr 07, 2017, 09:34 am
All simple voice recognition systems require training. The complex ones do as well, it is just that the training is different.
I think you have an impossible project. There are no easy answers, no magic algorithm.

Basically you have to input a sample of sound. Top and tail it, that is, remove the start and end bits, leaving just the word. Then extract parameters from the sound. I don't think that an FFT is a suitable parameter, but it depends on what sort of system you want; a system for an unspecific voice would not depend in any way on the frequency profile.
Finally you need to compare what you have with a previously prepared template to see what probability you have for a match.

The parameter extraction part defines what sort of system you have. As I understand it, the more complex systems attempt to extract fricative sections from the speech, and then look up the combinations of these in a dictionary.
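
The "top and tail" step can be sketched with a simple energy threshold. This is only one way to do it, and everything here (the function name, the block-wise mean energy, the threshold parameter) is an assumption made for illustration; a noisy room would need something more robust.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns the [begin, end) sample range spanning the first to the last
// block whose mean energy exceeds energyThreshold, or {0, 0} if no block
// does. Samples are assumed to be centred on zero.
std::pair<std::size_t, std::size_t> topAndTail(
        const std::vector<float>& samples, std::size_t blockSize,
        float energyThreshold) {
    assert(blockSize > 0);
    std::size_t begin = 0, end = 0;
    bool found = false;
    for (std::size_t start = 0; start + blockSize <= samples.size();
         start += blockSize) {
        float energy = 0.0f;
        for (std::size_t i = 0; i < blockSize; ++i) {
            energy += samples[start + i] * samples[start + i];
        }
        energy /= static_cast<float>(blockSize);
        if (energy > energyThreshold) {
            if (!found) {
                begin = start;
                found = true;
            }
            end = start + blockSize;
        }
    }
    return {begin, end};
}
```

Only the samples inside the returned range would then go on to parameter extraction and template comparison.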
Title: Re: Speech/Voice Recognition
Post by: pjrc on Apr 07, 2017, 11:58 am
I would like to know and use libraries and functions, like to do a FFT, show a spectrum, start listening the microphone, etc.
About a year ago I wrote a very detailed tutorial and taught a couple hands-on workshops which included this exact subject.  But I have some good news and some bad news for you.

First, the bad news: this tutorial and the library code only works on Teensy 3.x.  It will not work on Arduino Due.  The code makes heavy use of DMA and special peripherals not present on Due.  It also uses the Cortex-M4 DSP extensions, which are not present in the Cortex-M3 processor on Arduino Due.  Maybe this material can offer some very minor help even if you don't have compatible hardware, but to actually use it, you will need to get a Teensy board.  (full disclosure: my company makes these boards... so my opinion is biased, but I really do wish to help you if I can)

I also do not know how to use the FFT output for speech recognition.  I'm writing this message for you in hopes you might share whatever you learn.  I can only help with the part you asked above... how to get the audio data and perform FFTs, and do so in a way that other code can easily work with the data while more audio is captured and analyzed by the FFT without gaps.

The good news: I wrote a very detailed 31 page tutorial manual.  You can get it here:

https://www.pjrc.com/store/audio_tutorial_kit.html

Alysia and I also went to the trouble to shoot & edit a complete 48 minute walkthrough video.  It shows all the tutorial steps.  Scroll down at that page for the video.  Of course, actually doing the tutorial yourself with the real hardware is essential, but if you get stuck and the words & pictures of the 31 page manual aren't clear, hopefully the video helps.

Again, this code only works on Teensy 3.2, Teensy 3.5 and Teensy 3.6.  The tutorials use this audio shield (https://www.pjrc.com/store/teensy3_audio.html).  There are other ways to get signals in and out, and after you've read or watched the tutorial and if you look around the options offered in the design tool (part 2 of the tutorial) you'll see other ways to input & output signals.  But as a practical matter, for a microphone that shield is probably the best way to get started.  Notice the part of the tutorial where the mic gain is software adjustable....

If you read or watch part 3-2 of the tutorial, hopefully you can see how this works for FFT data.  The audio library takes care of acquiring the signal and computing the FFT, which you can then use in your program.

Again, how you'll use the FFT data to recognize words is beyond my knowledge.  I am curious.  If this info helps you get started, and if you learn anything useful that could help me or others, I sincerely hope you'll share.  Maybe you'll even participate here on the forum and occasionally help them, as we're trying to help you?
Title: Re: Speech/Voice Recognition
Post by: andrecr03 on Apr 08, 2017, 12:09 am
Thank you very much, Paul and Mike!
Actually, I have already written an algorithm like that in MATLAB, but I cannot use it in this project, not even along with Arduino. It was based on the FFT: the user says something, and after some FFTs were computed on the signal, the frequency peaks were isolated and then compared with a pattern that MATLAB already knew.
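
The peak-isolation idea described above might look roughly like this in C++. This is not the MATLAB code from this post (which was not shared); the function names, the local-maximum rule, and the bin tolerance are all assumptions made for the sketch. The input is assumed to be one magnitude spectrum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the bin indices of up to maxPeaks strongest local maxima above
// minMagnitude, in ascending bin order.
std::vector<std::size_t> findPeaks(const std::vector<float>& magnitude,
                                   float minMagnitude,
                                   std::size_t maxPeaks) {
    assert(maxPeaks > 0);
    std::vector<std::size_t> peaks;
    for (std::size_t i = 1; i + 1 < magnitude.size(); ++i) {
        if (magnitude[i] > minMagnitude &&
            magnitude[i] > magnitude[i - 1] &&
            magnitude[i] >= magnitude[i + 1]) {
            peaks.push_back(i);
        }
    }
    // Keep only the strongest peaks, then restore frequency order.
    std::sort(peaks.begin(), peaks.end(),
              [&magnitude](std::size_t a, std::size_t b) {
                  return magnitude[a] > magnitude[b];
              });
    if (peaks.size() > maxPeaks) {
        peaks.resize(maxPeaks);
    }
    std::sort(peaks.begin(), peaks.end());
    return peaks;
}

// True when both lists have the same length and every found peak lies
// within toleranceBins of the corresponding stored peak.
bool peaksMatch(const std::vector<std::size_t>& found,
                const std::vector<std::size_t>& pattern,
                std::size_t toleranceBins) {
    if (found.size() != pattern.size()) {
        return false;
    }
    for (std::size_t i = 0; i < found.size(); ++i) {
        const std::size_t diff = (found[i] > pattern[i])
                                     ? found[i] - pattern[i]
                                     : pattern[i] - found[i];
        if (diff > toleranceBins) {
            return false;
        }
    }
    return true;
}
```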

However, since I'm not able to do it using MATLAB, I'm looking for tutorials, examples, etc, that could guide me through this project. How can I say to the board how to start listening to the microphone? Is there a function to do that? How can I store the sound data to be compared to something later? How can I do FFT? Again, what is the correct function to do that? Is there a library I have to install to do what I'm planning to do?
Did you get what I mean? A tutorial for things like that.

I really need help with those topics. In fact, I should have asked those questions in the first place.. :/
Title: Re: Speech/Voice Recognition
Post by: Grumpy_Mike on Apr 08, 2017, 12:23 am
Reply #5 had a link to the FFT library.
It has examples of how to use it.

To start listening you start recording the samples from the analogue input.
You record for the number of samples you want to pass to the FFT.
Then you point the FFT function at the data and run it. The result is that your sample data buffer now contains the FFT of your samples. A typical size is 1024 samples, so it is not very long.
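
To make the "buffer in, FFT out, same buffer" idea concrete, here is a minimal in-place radix-2 FFT in plain C++. It is a generic textbook sketch, not the code or API of the library linked in this thread, and `fftInPlace` is a hypothetical name. On a board, the buffer would first be filled with samples taken at a fixed rate (for example with repeated `analogRead` calls), with the imaginary parts set to zero.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// In-place iterative radix-2 FFT. The buffer length must be a power of
// two; on return the buffer holds the spectrum instead of the samples.
void fftInPlace(std::vector<std::complex<float>>& data) {
    const std::size_t n = data.size();
    assert(n > 0 && (n & (n - 1)) == 0);

    // Reorder the samples into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) {
            j ^= bit;
        }
        j ^= bit;
        if (i < j) {
            std::swap(data[i], data[j]);
        }
    }

    // Combine progressively longer sub-transforms (butterflies).
    const float pi = 3.14159265358979f;
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const float angle = -2.0f * pi / static_cast<float>(len);
        const std::complex<float> wLen(std::cos(angle), std::sin(angle));
        for (std::size_t i = 0; i < n; i += len) {
            std::complex<float> w(1.0f, 0.0f);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const std::complex<float> u = data[i + k];
                const std::complex<float> v = data[i + k + len / 2] * w;
                data[i + k] = u + v;
                data[i + k + len / 2] = u - v;
                w *= wLen;
            }
        }
    }
}
```

After the call, `std::abs(data[k])` gives the magnitude of bin k; for real input only the first half of the bins carries independent information.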
Title: Re: Speech/Voice Recognition
Post by: andrecr03 on Apr 08, 2017, 12:34 am
Quote
To start listening you start recording the samples from the analogue input.
You record for the number of samples you want to pass to the FFT.
How can I do that?
There is a function to record data in the library on reply #5?
Title: Re: Speech/Voice Recognition
Post by: jremington on Apr 08, 2017, 05:53 am
Quote
There is a function to record data in the library on reply #5?
Is it too much trouble to look at that link by yourself?

If so, let us know and forum members will be glad to do so, and summarize the findings for you.
Title: Re: Speech/Voice Recognition
Post by: Grumpy_Mike on Apr 08, 2017, 06:41 am
Quote
There is a function to record data in the library on reply #5?
Yes. It is in the example code.
Title: Re: Speech/Voice Recognition
Post by: pjrc on Apr 09, 2017, 03:27 pm
Quote
How can I say to the board how to start listening to the microphone? Is there a function to do that?
Seems you did not read the tutorial PDF (https://www.pjrc.com/store/audio_tutorial_kit.html) I wrote?  This is explained in "Part 2-4: Using the Microphone" on page 16.

Ok, maybe reading page 16 is too much work?  It's also in the video, starting at 15 minutes and 54 seconds.  Here is a direct link that is supposed to start playing the video at 15:54.

https://youtu.be/wqt55OAabVs?t=15m54s

Can you at least just click this link and watch the video for a couple minutes?  Is that too hard?


Quote
How can I do FFT?
Again, also in the tutorial, part 3.2.

Look, if you weren't even able to read page 16 or watch that part of the video, how do you hope to read pages 24 to 29?  That's six times as much as reading only page 16.

Seriously, there comes a time when everyone needs to stand on their own 2 feet.  This is your moment.  Rise up to the challenge.  Actually read the answers we're trying to give you.  It's perfectly fine to ask more questions, but at the very least, demonstrate in your next question that you at least made even a small effort to read this info.
Title: Re: Speech/Voice Recognition
Post by: Umesh21 on Apr 09, 2017, 03:42 pm
Just download the latest version of the Arduino FHT library from the website below:

http://wiki.openmusiclabs.com/wiki/ArduinoFHT

There is also something called the FHT Data Visualizer, for Processing or Pure Data. Install Processing and run it. I don't know whether it works, because I'm also trying to analyze a frequency spectrum, but Processing gives me a bunch of errors. I don't know how to run it properly.
Title: Re: Speech/Voice Recognition
Post by: pjrc on Apr 09, 2017, 04:03 pm
I'm also curious about this Matlab code?  Did you personally write it?  Can it be shared?
Title: Re: Speech/Voice Recognition
Post by: Grumpy_Mike on Apr 09, 2017, 04:30 pm
Quote
but processing give me bunch of errors.
Crystal ball warning.
Anyone got one, mine is broken and this guy is asking us to look at error messages that are only on his computer.
Title: Re: Speech/Voice Recognition
Post by: jlsilicon on Mar 20, 2018, 04:55 pm
Question here,

Earlier, I saw a reference to the ArduinoFFT library.

*** Why can't this be used for Speech Recognition ?

- I would like to move this to the Arduino Due, to improve the resolution and accuracy.

I have tried the uSpeech code for the Arduino, which seems somewhat accurate (maybe 65%-80%).
But looking over the code, it only uses the sums of the sample differences to work out vowels ...
- pretty accurate for that type of algorithm!

But, I want better, say reading mixed frequencies (a Vowel normally is a mix of 3 frequencies).
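
As a rough illustration of a "sum of sample differences" feature, here is a sketch in plain C++. It is not taken from the uSpeech source and makes no claim about how that library actually works; the function name and the normalisation by summed amplitude are assumptions. The idea is that a signal that changes quickly from sample to sample (more high-frequency content) gives a larger ratio than a slowly varying one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Ratio of summed absolute sample-to-sample differences to summed
// absolute amplitude. Samples are assumed to be centred on zero.
// Returns 0 for an all-zero (silent) block.
float differenceRatio(const std::vector<int>& samples) {
    assert(samples.size() >= 2);
    long diffSum = 0;
    long ampSum = 0;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        diffSum += std::abs(samples[i] - samples[i - 1]);
        ampSum += std::abs(samples[i]);
    }
    return (ampSum > 0)
               ? static_cast<float>(diffSum) / static_cast<float>(ampSum)
               : 0.0f;
}
```

A single number like this cannot separate the several frequencies mixed in a vowel, which is the limitation raised in this post.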
Title: Re: Speech/Voice Recognition
Post by: ard_newbie on Mar 20, 2018, 05:03 pm

Something like FFT for Arduino DUE ?

https://github.com/dujianyi/ardFFT
Title: Re: Speech/Voice Recognition
Post by: Grumpy_Mike on Mar 20, 2018, 06:46 pm
Quote
*** Why can't this be used for Speech Recognition ?
Because speech recognition is not easy.

Quote
I have tried the uSpeech code for the Arduino, which seems somewhat accurate (maybe 65%-80%).
I doubt it is even that good but even if it is then that is rubbish as regards usability.

Quote
say reading mixed frequencies (a Vowel normally is a mix of 3 frequencies).
Recognising is not a simple matter of recognising frequencies. The mix of frequencies constantly changes over the duration of the word; you have to track these changes and match them to a template. This requires lots of memory for the many FFTs you have to take and then lots of memory for the templates to correlate with the input. Also, in order not to miss anything, you should sample the whole word and then break it up into small chunks to do the FFTs on.
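
The memory point can be put into rough numbers. The helper names and every figure used below (8 kHz sampling, a one-second word, 256-point frames with a 128-sample hop, 2 bytes per stored bin) are assumptions chosen only for the arithmetic.

```cpp
#include <cassert>
#include <cstddef>

// Number of frames obtained when totalSamples are cut into frames of
// frameSize samples, each starting hop samples after the previous one.
std::size_t frameCount(std::size_t totalSamples, std::size_t frameSize,
                       std::size_t hop) {
    assert(frameSize > 0 && hop > 0);
    if (totalSamples < frameSize) {
        return 0;
    }
    return (totalSamples - frameSize) / hop + 1;
}

// Bytes needed to keep one magnitude spectrum (frameSize / 2 bins) for
// every frame.
std::size_t spectrogramBytes(std::size_t frames, std::size_t frameSize,
                             std::size_t bytesPerBin) {
    return frames * (frameSize / 2) * bytesPerBin;
}
```

Under those assumptions, `frameCount(8000, 256, 128)` is 61 frames and `spectrogramBytes(61, 256, 2)` is 15616 bytes, on top of 16000 bytes for the raw 16-bit recording. Each stored template of the same shape costs another 15616 bytes, so a handful of words already takes a large share of the 96 KB of SRAM on an Arduino Due and is far beyond the 2 KB on an Uno.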
Title: Re: Speech/Voice Recognition
Post by: kapser on Oct 11, 2018, 08:05 am
Sorry to kick this up, but I was doing some research in voice recognition on Arduino like platforms.

Since Paul seemed especially interested I thought I would chime in with something new I came across.

There is a keyword spotting library/demo released for ARM boards (like Due and Teensy).


There is an example for the Freedom K64 from NXP, but I think this might run on a Teensy 3.5 as well (since it also seems to be based on the K64) (in the Deployment folder).

I'm not 100% sure if I will take this road (it's merely a concept idea, not a concrete project I'm working on), but I thought I would at least share this in this topic.

https://github.com/ARM-software/ML-KWS-for-MCU

Commands supported by this build:
_silence_, _unknown_, yes, no, up, down, left, right, on, off, stop, go
Title: Re: Speech/Voice Recognition
Post by: Grumpy_Mike on Oct 12, 2018, 05:42 pm
I can't help but notice that those examples are all in Python. The Arduino world uses C/C++.
Title: Re: Speech/Voice Recognition
Post by: kapser on Oct 12, 2018, 07:45 pm
Like I said, the 'K64' code is in the Deployment folder. The Python is to train the model (Python is the most used language in Machine Learning).

Here is the direct link to the C/C++ code example:
https://github.com/ARM-software/ML-KWS-for-MCU/tree/master/Deployment/Examples/simple_test
Title: Re: Speech/Voice Recognition
Post by: martinius96 on Dec 24, 2018, 12:09 pm
I have made a speech recognition service for my projects.
You can try it at: https://arduino.php5.sk/PHP_en/
It works only in the Chrome browser!