Voice Recognition Techniques

Hello,

I would like to start a project to set up an Arduino dedicated to voice recognition. I would like to do this without having to buy any special voice recognition shields or hardware, save for a microphone.

I'm just wondering if this is possible to do just using an Arduino analog input and a microphone?

And if so, can anyone point me to any tutorials or library examples along these lines?

Thank you.

I don't think that's possible. Even with a shield, from what I understand its voice recognition capabilities are very limited.

The amount of processing (and memory) required depends on how much vocabulary you need (it's not too hard if you just need to distinguish between "Yes" or "No", or the numbers one through ten) and it's easier if it doesn't need to be speaker independent.

Dragon Naturally Speaking requires the power of a PC or Mac, and Siri uses powerful servers on the Net.

And there's the issue of accuracy. No speech recognition system is 100% accurate. Even humans don't have 100% accurate speech recognition, and computers are worse.

And on the analog side, you'll need a preamp for the microphone.

Voice recognition :- recognising WHO is speaking.
Speech recognition:- recognising WHAT is being said.
The two terms are not interchangeable.

Henry_Best:
Voice recognition :- recognising WHO is speaking.
Speech recognition:- recognising WHAT is being said.
The two terms are not interchangeable.

In that case, I'm more interested in speech recognition. In fact, it would be great if the program worked for everyone. It's my understanding that often times speech recognition software only understands the person it was trained to listen to.

The main thing I would like to do is learn how to program speech recognition. Actually I guess this doesn't even need to be specific to the Arduino. If I could just get a generic idea of how to write a speech recognition algorithm I could adapt it to the Arduino.

Mainly I was wondering if anyone has already done this with an Arduino?

I saw a few videos that use a $50 "Voice Recognition Shield" for Arduino. (Evidently the commercial industry is calling these things "voice recognition" when they should be calling them "speech recognition".) They are probably fairly sensitive to the voice that trains them, though, so they might be voice specific even if they can't really recognize different people's voices.

In any case, here's the video.

Voice Recognition (VRBot/EasyVR) + Arduino P1

This is pretty interesting, but I'm wondering if this can be done without the $50 shield.

Here's the shield, and they are calling this a "Voice Recognition Shield"

EasyVR Shield 3.0 - Voice Recognition Shield

I would like to do something along these lines, except with software only, just using the Arduino analog input pins, a microphone, and preamp.

I would be happy if I could just learn how to write a program to recognize a single word or phrase. Once I see how that's done I could expand it from there myself.

Mainly I was wondering if anyone has already done this with an Arduino?

I did it, with an UNO and a Leonardo: mic + preamplifier (gain 100 - 250) + Arduino. In the case of the Leonardo, you don't need an amplifier; the internal PGA works beautifully well with a regular condenser mic.
With 2k of memory, the UNO is capable of storing a 1 sec. audio track as compressed FFT output, and then either storing it to EEPROM or running a cross-correlation against an already saved track. That means 1 word only without external SD cards or other storage. This is voice recognition.
Speech is a "deteriorated" version of voice recognition, so more compression could be done to store ~10 words in the Arduino itself.
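To make the cross-correlation step concrete, here is a minimal host-side C++ sketch. It is not the code from the project described here. It assumes each recording has already been reduced to one byte of energy per frequency band per frame, flattened frame after frame; the names `correlate` and `bestMatch`, the band layout, and the lag search are all illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Normalized (Pearson-style) correlation of two byte sequences of length n.
// Returns a value in [-1, 1]; flat or empty input returns 0.
double correlate(const uint8_t* a, const uint8_t* b, std::size_t n) {
    if (n == 0) return 0.0;
    double meanA = 0.0, meanB = 0.0;
    for (std::size_t i = 0; i < n; ++i) { meanA += a[i]; meanB += b[i]; }
    meanA /= static_cast<double>(n);
    meanB /= static_cast<double>(n);

    double num = 0.0, varA = 0.0, varB = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double da = a[i] - meanA;
        const double db = b[i] - meanB;
        num += da * db;
        varA += da * da;
        varB += db * db;
    }
    if (varA == 0.0 || varB == 0.0) return 0.0;  // no shape to compare
    return num / std::sqrt(varA * varB);
}

// Best correlation between a stored template and a new recording, trying
// shifts of 0..maxLagFrames whole frames in both directions.
// `bands` is the (hypothetical) number of feature bytes per frame.
double bestMatch(const std::vector<uint8_t>& tmpl,
                 const std::vector<uint8_t>& heard,
                 std::size_t bands, std::size_t maxLagFrames) {
    assert(bands > 0 && tmpl.size() == heard.size() && tmpl.size() % bands == 0);
    const std::size_t frames = tmpl.size() / bands;
    double best = -1.0;
    for (std::size_t lag = 0; lag <= maxLagFrames && lag < frames; ++lag) {
        const std::size_t n = (frames - lag) * bands;  // overlapping bytes
        // Recording starts `lag` frames later than the template.
        best = std::max(best, correlate(tmpl.data(), heard.data() + lag * bands, n));
        // Recording starts `lag` frames earlier than the template.
        best = std::max(best, correlate(tmpl.data() + lag * bands, heard.data(), n));
    }
    return best;
}
```

A caller would compare the returned score against a threshold chosen by experiment. Identical inputs score 1.0, and allowing a few frames of lag makes the match tolerant of the word starting slightly earlier or later.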
There is a web-archive of the project, original post is lost:

Thanks Magician, this is exactly the type of thing I'm looking for. Do you still have your Arduino sketch? It's no longer available in the archives. Could you post it here?

Certainly.

https://drive.google.com/file/d/0Bw4tXXvyWtFVLUFyMmdLb2RBam8/view?usp=sharing

https://drive.google.com/file/d/0Bw4tXXvyWtFVaktLMExzeUZnU1U/view?usp=sharing

You need only 1 mic; these drawings are from another project, "sound localization".


https://drive.google.com/file/d/0Bw4tXXvyWtFVT2tERVkwUEVBUHc/view?usp=sharing

https://drive.google.com/file/d/0Bw4tXXvyWtFVOUVaRk5NbS1hVzg/view?usp=sharing

AFAIR, the spectrogram is of my laptop saying "test left", the Linux Ubuntu OS speaker test phrase.

https://drive.google.com/file/d/0Bw4tXXvyWtFVWjU5aUlWdXphY2M/view?usp=sharing

In fact, it would be great if the program worked for everyone. It's my understanding that often times speech recognition software only understands the person it was trained to listen to.

And how much vocabulary do you need?

The main thing I would like to do is learn how to program speech recognition. Actually I guess this doesn't even need to be specific to the Arduino. If I could just get a generic idea of how to write a speech recognition algorithm

I believe that's an advanced topic, but you can probably search the Net or get a book. If you took a university class in speech recognition it would probably be a 3rd or 4th year class, if not a postgraduate class. :o

It should be a LOT easier on a computer since you already have a soundcard, and if you're on a laptop you've already got a microphone. Or, you can read from a WAV or MP3 file, etc., and you don't need a microphone (and you won't have to talk to the computer over-and-over during development). And, you'll have lots more memory & processing power, and a video monitor to display the text.

It's a little more effort to install and configure an IDE/compiler on your computer, but that's just "overhead" that you only have to do once. GUI programming adds another level of complexity, but you don't have to do GUI just because you're on a PC or Mac.

P.S.
Although you probably won't be writing your own FFT/DFT library or anything like that, it might be good to have some digital signal processing under your belt. There is a good FREE online DSP book called The Scientist and Engineer's Guide to Digital Signal Processing, by Steven W. Smith, Ph.D.
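For readers who want to see what an FFT/DFT library computes, here is a naive DFT in host-side C++, offered as a rough sketch and not as something to run on an Arduino as written. The function names `dftMagnitude`, `dominantBin`, and `makeTone` are made up for this illustration. It measures how strongly each frequency bin is present in one frame of samples, which is the raw material a recognizer would then compress and compare.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Magnitude of DFT bin k for a real-valued frame x (naive, O(N) per bin).
double dftMagnitude(const std::vector<double>& x, std::size_t k) {
    assert(!x.empty() && k < x.size());
    const double pi = std::acos(-1.0);
    const double n = static_cast<double>(x.size());
    double re = 0.0, im = 0.0;
    for (std::size_t t = 0; t < x.size(); ++t) {
        const double angle =
            2.0 * pi * static_cast<double>(k) * static_cast<double>(t) / n;
        re += x[t] * std::cos(angle);
        im -= x[t] * std::sin(angle);
    }
    return std::sqrt(re * re + im * im);
}

// Index of the strongest bin between 1 and N/2 (bin 0, the DC level, is skipped).
std::size_t dominantBin(const std::vector<double>& x) {
    assert(x.size() >= 2);
    std::size_t best = 1;
    double bestMag = -1.0;
    for (std::size_t k = 1; k <= x.size() / 2; ++k) {
        const double m = dftMagnitude(x, k);
        if (m > bestMag) { bestMag = m; best = k; }
    }
    return best;
}

// Test signal: a sine wave completing `cycles` full periods in n samples.
std::vector<double> makeTone(std::size_t n, std::size_t cycles) {
    const double pi = std::acos(-1.0);
    std::vector<double> x(n);
    for (std::size_t t = 0; t < n; ++t) {
        x[t] = std::sin(2.0 * pi * static_cast<double>(cycles) *
                        static_cast<double>(t) / static_cast<double>(n));
    }
    return x;
}
```

Calling `dominantBin(makeTone(64, 5))` picks out bin 5, because the tone completes five cycles in the frame. A real FFT produces the same numbers far faster, which is why a library is the usual choice.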

@DVDdoug, Thanks for the suggestions. I actually have several IDEs on my computer and I've been searching for examples. Typically what I find are examples that already use other software or libraries that handle the actual voice or speech recognition. For example, "using speech.RecognizationEngine;". And then they just explain how to use that class.

I actually have quite a bit of background in digital and analog signal processing. But I've never applied this knowledge specifically to speech recognition. I just bought a bunch of Arduino boards, and I thought I would dedicate one on my robot specifically for speech recognition. So that's the main reason I posted here on the Arduino site.

@Magician, Thanks for posting your Arduino sketch. That's exactly the type of approach I had in mind. I did get some compile errors when I tried to compile the code though. This might be because I'm using the Arduino 1.6.5 IDE. I might try downloading the 1.0.1 version and see if it will compile with that.

The errors I'm getting on the 1.6.5 version are as follows:

Build options changed, rebuilding all
VOR_remix_3f:40: error: 'prog_int16_t' does not name a type
VOR_remix_3f:51: error: 'prog_int16_t' does not name a type
In file included from VOR_remix_3f.ino:27:0:
VOR_remix_3f.ino: In function 'void fft_radix4_I(int*, int*, int)':
VOR_remix_3f:227: error: 'Sinewave' was not declared in this scope
VOR_remix_3f:228: error: 'Sinewave' was not declared in this scope
VOR_remix_3f:229: error: 'Sinewave' was not declared in this scope
VOR_remix_3f:231: error: 'Sinewave' was not declared in this scope
VOR_remix_3f:232: error: 'Sinewave' was not declared in this scope
VOR_remix_3f:233: error: 'Sinewave' was not declared in this scope
VOR_remix_3f.ino: In function 'void loop()':
VOR_remix_3f:394: error: 'Anatoly' was not declared in this scope
VOR_remix_3f.ino:73:24: note: in definition of macro 'mult_shf_s16x16'
'prog_int16_t' does not name a type
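For anyone hitting the same errors: the `prog_int16_t` family of typedefs was deprecated in avr-libc and is not provided by the toolchain in newer Arduino IDEs. The 'Sinewave' errors presumably follow from that failed declaration. The usual replacement is a `const` array of a plain fixed-width type with the `PROGMEM` attribute, read back through `pgm_read_word`. The sketch below is only an illustration of that pattern: the four table values are placeholders, not the table from VOR_remix_3f.ino, `sineAt` is a made-up helper, and the `#else` branch is a stand-in so the example also builds off the board.

```cpp
#include <assert.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
// Host-side stand-ins (assumption): no separate flash address space here.
#define PROGMEM
#define pgm_read_word(addr) (*(addr))
#endif

// Old style, rejected by newer toolchains:
//   prog_int16_t Sinewave[] PROGMEM = { ... };
// Current style: const data of a plain type, tagged PROGMEM.
// Placeholder values only (a quarter sine wave scaled to 32767).
const int16_t Sinewave[] PROGMEM = { 0, 12539, 23170, 30273 };

// On AVR, data kept in flash has to be read through pgm_read_word.
int16_t sineAt(uint8_t i) {
    assert(i < sizeof(Sinewave) / sizeof(Sinewave[0]));
    return static_cast<int16_t>(pgm_read_word(&Sinewave[i]));
}
```

Any place in a sketch that indexes such a table directly would need the same `pgm_read_word` treatment if it does not already have it.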

Tempora mutantur, as they say.
https://drive.google.com/file/d/0Bw4tXXvyWtFVWklIVWZoQm03WEU/view?usp=sharing


Try it. You will see the FFT Radix-4 algorithm in the main sketch; since then I have written a library with a faster and better SplitRadix. But try this first, and see if you can get a 90% match. Later on it would make sense to switch to the library.

VOR_remix_3f.ino (22 KB)