Author Topic: Voice recognition library - It works!  (Read 2863 times)
Newbie
*
Karma: 0
Posts: 8

First off, forgive me if this is the wrong place to post about this, but I am relatively new to the forum.
For the past 3 weeks I have been working on a voice recognition library. It has come to a stage where it can more or less be used to detect words from a small vocabulary set (say, about 10 words). It is not perfect yet, but it can differentiate between left, right, up and down.

WARNING! COMPLEX MATH TALK AHEAD:
The library also contains techniques to recognize phonemes. The way it works is simple: 4 bandpass filters are created, and from each of these filters the power is collected using an absolute summation (an absolute integral). Then the complexity of the signal in each of these filters is determined using an absolute differential divided by an absolute integral. After this, to generate a "fingerprint", I take the absolute integral of filter[n]-mean/variance of the powers, and the same for the complexity. This produces a 5x2 matrix, which is compared against a model matrix.
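
A minimal sketch of the two per-filter measurements described above, written as plain desktop C++ so it can be tried without a board. It assumes each filter's output is already available as an array of signed samples. The names bandPower and bandComplexity are hypothetical and are not taken from the uSpeech source, and the fingerprint and model-matrix steps are left out because the post does not pin them down.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper names for illustration; not part of the uSpeech API.

// "Power" of one filter's output: the absolute summation of its samples.
long bandPower(const int *x, std::size_t n) {
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += std::labs(static_cast<long>(x[i]));
    }
    return sum;
}

// "Complexity": absolute differential divided by absolute integral.
// A slowly changing signal scores low, a rapidly changing one scores high.
float bandComplexity(const int *x, std::size_t n) {
    long diff = 0;
    for (std::size_t i = 1; i < n; ++i) {
        diff += std::labs(static_cast<long>(x[i]) - static_cast<long>(x[i - 1]));
    }
    const long power = bandPower(x, n);
    if (power == 0) {
        return 0.0f;  // silent band: avoid dividing by zero
    }
    return static_cast<float>(diff) / static_cast<float>(power);
}
```

For example, a constant block of samples has a complexity of zero, while a block that flips sign on every sample has the same power but a much higher complexity.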

End of math talk
The code is on GitHub. As I stated there, there is very little (no) documentation yet, and I still have to work on training samples.
I have not included my training samples in the GitHub repo yet, but they should be up soon. http://arjo129.github.com/uSpeech/
Logged

Espoo, Finland
God Member
*****
Karma: 7
Posts: 586
"Oops, try again..."

What about the hardware, any hints?

Cheers,
Kari
Logged


The only law for me; Ohms Law: U=R*I       P=U*I
Note to self: "Damn! Why don't you just fix it!!!"

Montreal
Faraday Member
**
Karma: 27
Posts: 2566
Per aspera ad astra.

What is that:
Quote
the amount the complexity of the signal in each of these filters
???
Here is a post on how to do voice recognition based on FFT filtering: http://coolarduino.wordpress.com/2012/01/24/arduino-project-next-in-a-series-fft-and-arduino/
Logged

Belgium
Full Member
***
Karma: 0
Posts: 187

nice job :)

i took a look at the files, and it seems you did not think about using for loops, did you xp

since a lot of bytes can be saved by using for loops instead of writing the same line 20 times.
(just a comment)

but it is fantastic you did this, i love to see things like these coming up, and i can believe you are happy to get it working :)
Logged


Newbie
*
Karma: 0
Posts: 8

Thanks guys! Actually, the point of it was to bypass the FFT completely. I'm working on extending it to phonemes like a, e, i, o, u, allowing for partially accurate text to speech. I will have to implement more filters. About loops: there are places where you will notice the indices are not going up in an orderly fashion. While I could use map(), there is more than enough space on the Flash, so why not save memory and computation time? Sampling needs to be very accurate, and adding a loop adds a couple of microseconds, which is not desirable for the filters. As for hardware, use a condenser mic. I'm also working on a handbook, but I think I've got some cleanup to do on the API first so people can use it. I think I've hit upon something that lets me bypass machine learning and make the recognition more user friendly. Thanks for the positive response!
Logged

Belgium
Full Member
***
Karma: 0
Posts: 187

i think you misunderstood my comment :)
what i meant by replacing with a for loop is that this:

Code:
void uspeech::sample(){
  arr[0] = analogRead(pin)-calib;
  arr[1] = analogRead(pin)-calib;
  arr[2] = analogRead(pin)-calib;
  arr[3] = analogRead(pin)-calib;
  arr[4] = analogRead(pin)-calib;
  arr[5] = analogRead(pin)-calib;
  arr[6] = analogRead(pin)-calib;
  arr[7] = analogRead(pin)-calib;
  arr[8] = analogRead(pin)-calib;
  arr[9] = analogRead(pin)-calib;
  arr[10] = analogRead(pin)-calib;
  arr[11] = analogRead(pin)-calib;
  arr[12] = analogRead(pin)-calib;
  arr[13] = analogRead(pin)-calib;
  arr[14] = analogRead(pin)-calib;
  arr[15] = analogRead(pin)-calib;
  arr[16] = analogRead(pin)-calib;
  arr[17] = analogRead(pin)-calib;
  arr[18] = analogRead(pin)-calib;
  arr[19] = analogRead(pin)-calib;
  arr[20] = analogRead(pin)-calib;
  arr[21] = analogRead(pin)-calib;
  arr[22] = analogRead(pin)-calib;
  arr[23] = analogRead(pin)-calib;
  arr[24] = analogRead(pin)-calib;
  arr[25] = analogRead(pin)-calib;
  arr[26] = analogRead(pin)-calib;
  arr[27] = analogRead(pin)-calib;
  arr[28] = analogRead(pin)-calib;
  arr[29] = analogRead(pin)-calib;
  arr[30] = analogRead(pin)-calib;
  arr[31] = analogRead(pin)-calib;
  arr[32] = analogRead(pin)-calib;
}

can be easily replaced with this:

Code:
void uspeech::sample()
{
  for(uint8_t i=0; i<33; i++)
  {
    arr[i] = analogRead(pin)-calib;
  }
}

that'll save a lot of bytes, and does not use any more RAM. well, it uses one more byte to store 'i', but that gets freed afterwards, so it results in the same :)
Logged


Edison Member
*
Karma: 43
Posts: 1552

@arjo129:
Can you explain what you mean by "It works!"?

Pete
Logged

Where are the Nick Gammons of yesteryear?

Newbie
*
Karma: 0
Posts: 8

I have overhauled the algorithm completely. The new algorithm is described here: http://arjo129.github.com/uSpeech/. In the basic core of the library (the latest commit) there is a phoneme-based recognition system, with some helper functions to convert the phonemes to strings and match them. As stated on the website, correct phoneme recognition is 30%-40%, but with the helper functions up to 80% accuracy is achieved for a vocabulary of 5 words. One catch is that the programmer has to convert the vocabulary strings to the corresponding phoneme characters. Docs are still under way.
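
To illustrate how matching whole strings can lift accuracy above the per-phoneme rate, here is a rough sketch that picks the vocabulary entry closest to a noisy recognized string. It is an assumption for illustration only: it uses Levenshtein edit distance as a stand-in, which is not necessarily what the library's helper functions do, and the names editDistance and closestWord are hypothetical. It is plain desktop C++ (std::string and std::vector are not available on a stock AVR toolchain).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers for illustration; not part of the uSpeech API.

// Levenshtein edit distance between two phoneme strings
// (minimum number of insertions, deletions and substitutions).
std::size_t editDistance(const std::string &a, const std::string &b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) {
        prev[j] = j;  // turning an empty prefix into b[0..j) takes j insertions
    }
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;  // turning a[0..i) into an empty string takes i deletions
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Index of the vocabulary entry closest to the recognized string,
// or -1 if the vocabulary is empty. Ties go to the earliest entry.
int closestWord(const std::string &heard, const std::vector<std::string> &vocab) {
    int best = -1;
    std::size_t bestDist = 0;
    for (std::size_t k = 0; k < vocab.size(); ++k) {
        const std::size_t d = editDistance(heard, vocab[k]);
        if (best < 0 || d < bestDist) {
            best = static_cast<int>(k);
            bestDist = d;
        }
    }
    return best;
}
```

With a vocabulary of a few words that differ in several positions, one or two misrecognized characters still leave the intended word as the nearest entry, which is one plausible reason a word-level match can beat the raw phoneme accuracy.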
Logged

Newbie
*
Karma: 0
Posts: 8

I have uploaded a PDF tutorial to the GitHub downloads section explaining the use of the µSpeech library. Feel free to take a look and report bugs.
Logged
