Voice recognition library - It works!

First off, forgive me if this is the wrong place to post about this, but I am relatively new to the forum.
For the past 3 weeks I have been working on a voice recognition library. It has come to a stage where it can more or less be used to detect words from a small vocabulary set (say, about 10). It is not perfect yet, but it can differentiate between left, right, up and down.

The library also contains techniques to recognize phonemes. The way it works is simple: 4 bandpass filters are created, and from each of these filters the power is collected using an absolute summation (an absolute integral). Then the complexity of the signal in each of these filters is determined using an absolute differential divided by an absolute integral. After this, to generate a “fingerprint”, I take the absolute integral of filter[n]-mean/variance of the powers, and the same for the complexity. This produces a 5x2 matrix which is compared against a model matrix.
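The two per-filter measurements described above can be sketched roughly as follows. This is only an illustration that assumes the samples have already been band-pass filtered; the function names (absIntegral, absDifferential, complexity) are made up for this example and are not the uSpeech API, and the scaling by 100 is an assumption to keep everything in integer arithmetic.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper names, not taken from the uSpeech source.

// "Power": sum of absolute sample values (the absolute integral).
long absIntegral(const int16_t* x, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    sum += std::labs(x[i]);
  }
  return sum;
}

// Sum of absolute differences between neighbouring samples
// (the absolute differential).
long absDifferential(const int16_t* x, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 1; i < n; ++i) {
    sum += std::labs(static_cast<long>(x[i]) - x[i - 1]);
  }
  return sum;
}

// "Complexity": absolute differential divided by absolute integral.
// Scaled by 100 (an assumption) so the ratio survives integer division.
long complexity(const int16_t* x, std::size_t n) {
  long power = absIntegral(x, n);
  if (power == 0) {
    return 0;  // silent window: avoid dividing by zero
  }
  return (absDifferential(x, n) * 100) / power;
}
```

With this definition, a signal that changes sign on every sample scores a much higher complexity than a flat signal of the same power, which is presumably what lets the measure stand in for frequency content without an FFT.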

End of math talk
The code is on GitHub. As I stated, there is very little (no) documentation yet and I still have to work on training samples.
I have not included my training samples in the GitHub repo yet, but they should be up. http://arjo129.github.com/uSpeech/

What about the hardware, any hints?

Cheers, Kari

What is that:

the amount the complexity of the signal in each of these filters

Here is a post how to do voice recognition based on FFT filtering: http://coolarduino.wordpress.com/2012/01/24/arduino-project-next-in-a-series-fft-and-arduino/

nice job :)

i took a look at the files, and it seems you did not think about using for loops, did you xp

since a lot of bytes can be saved by using for-loops instead of writing the same line 20 times. (just a comment)

but it is fantastic you did this, i love to see things like these coming up, and i can believe you are happy to get it working :)

Thanks guys! Actually the point of it was to bypass the FFT completely. I'm working on extending it to phonemes like a, e, i, o, u, allowing for partially accurate Text to speech. I will have to implement more filters. About loops: there are places where you will notice the indices are not going up in an orderly fashion. While I could use map(), there is more than enough space on the Flash, so why not save memory and computation time :astonished:? Sampling needs to be very accurate, and adding a loop adds a couple of microseconds, which is not desirable for the filters. As for hardware, use a condenser mic. I'm also working on a handbook, but I think I've got some cleanup to do on the API first so people can use it. I think I've hit upon something that lets me bypass machine learning and make the recognition more user friendly. Thanks for the positive response!

i think you misunderstood my comment :slight_smile:
what i meant with replacing with for loop is that this:

void uspeech::sample(){
  arr[0] = analogRead(pin)-calib;
  arr[1] = analogRead(pin)-calib;
  arr[2] = analogRead(pin)-calib;
  arr[3] = analogRead(pin)-calib;
  arr[4] = analogRead(pin)-calib;
  arr[5] = analogRead(pin)-calib;
  arr[6] = analogRead(pin)-calib;
  arr[7] = analogRead(pin)-calib;
  arr[8] = analogRead(pin)-calib;
  arr[9] = analogRead(pin)-calib;
  arr[10] = analogRead(pin)-calib;
  arr[11] = analogRead(pin)-calib;
  arr[12] = analogRead(pin)-calib;
  arr[13] = analogRead(pin)-calib;
  arr[14] = analogRead(pin)-calib;
  arr[15] = analogRead(pin)-calib;
  arr[16] = analogRead(pin)-calib;
  arr[17] = analogRead(pin)-calib;
  arr[18] = analogRead(pin)-calib;
  arr[19] = analogRead(pin)-calib;
  arr[20] = analogRead(pin)-calib;
  arr[21] = analogRead(pin)-calib;
  arr[22] = analogRead(pin)-calib;
  arr[23] = analogRead(pin)-calib;
  arr[24] = analogRead(pin)-calib;
  arr[25] = analogRead(pin)-calib;
  arr[26] = analogRead(pin)-calib;
  arr[27] = analogRead(pin)-calib;
  arr[28] = analogRead(pin)-calib;
  arr[29] = analogRead(pin)-calib;
  arr[30] = analogRead(pin)-calib;
  arr[31] = analogRead(pin)-calib;
  arr[32] = analogRead(pin)-calib;
}

can be easily replaced with this:

void uspeech::sample(){
  // fill all 33 slots (arr[0]..arr[32]) with calibrated readings
  for(uint8_t i=0; i<33; i++){
    arr[i] = analogRead(pin)-calib;
  }
}

that’ll save a lot of bytes of flash, and does not use any more ram. well, it uses one byte more to store ‘i’, but that gets freed afterwards, so it comes out the same :slight_smile:

@arjo129: Can you explain what you mean by "It works!".


I have overhauled the algorithm completely. The new algorithm is described here: http://arjo129.github.com/uSpeech/. In the basic core of the library (the latest commit) there is a phoneme-based recognition system, with some helper functions to convert the phonemes to strings and match them. As stated on the website, correct phoneme recognition is 30%-40%, but with the helper functions up to 80% accuracy is achieved for a vocabulary of 5 words. One catch is that the programmer has to convert the words to the corresponding phoneme characters by hand. Docs are still under way.
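To give an idea of why matching whole strings can be far more accurate than the individual phonemes, here is a rough sketch of one way such matching could be done: pick the vocabulary entry with the smallest edit distance to the recognized phoneme string. This is an assumption for illustration only, not how the uSpeech helper functions are known to work; editDistance and closestWord are made-up names, and the phoneme spellings in the usage note are invented too.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein edit distance between two phoneme strings
// (number of single-character insertions, deletions or substitutions).
std::size_t editDistance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) {
    prev[j] = j;
  }
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Index of the vocabulary entry closest to the heard phoneme string,
// or -1 if the vocabulary is empty. Ties go to the earlier entry.
int closestWord(const std::string& heard,
                const std::vector<std::string>& vocab) {
  int best = -1;
  std::size_t bestDist = 0;
  for (std::size_t k = 0; k < vocab.size(); ++k) {
    std::size_t d = editDistance(heard, vocab[k]);
    if (best < 0 || d < bestDist) {
      best = static_cast<int>(k);
      bestDist = d;
    }
  }
  return best;
}
```

For example, with a hand-written vocabulary of {"lft", "rit", "up", "don"}, a heard string of "lfd" has one wrong phoneme but still lands on entry 0, since the other entries are further away. On an actual Arduino you would likely want fixed-size char buffers instead of std::string and std::vector.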

I have uploaded a PDF tutorial to the GitHub downloads section explaining how to use the µSpeech library. Feel free to take a look and report bugs.