I am working on a project that would benefit from simple voice recognition: "yes", "no", and the numbers 1 through 9.
Everything I see in the forum history is either more than four years old or ridiculously difficult to implement. The project can't work if a wake word is needed.
Is anyone aware of a simple-to-implement voice recognition system that might meet my needs?
Not on an Arduino, unless you employ Siri or something like that to help out. Voice recognition is not simple, especially if you don't want to limit it to a single voice.
Not using a "wake word" will give many more false positives and unexpected reactions from whatever voice recognition system you want to employ.
I knew that there would have to be a recognition shield of some sort, and there are a few out there with quite a price range. I have WiFi in the project so using a cloud program would be OK as well.
False positives are not an issue. The project is an escape room game where there is a telephone. The players may dial a phone number to get a recorded clue. The VR would be used to detect the response when the player is asked a series of yes-no questions and possibly a number, "one" to "nine". The next recording heard would be decided by whether the response is "yes", "no" or anything else.
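A minimal sketch of that last step, assuming some recognizer (not specified in this thread) has already returned the player's reply as text. The names `Reply`, `Classified`, `normalize` and `classifyReply` are made up for illustration. It uses `std::string`, which is available on the Wemos D1 Mini (ESP8266) but not on an Uno.

```cpp
#include <cctype>
#include <string>

// Hypothetical categories for what the player said.
enum class Reply { Yes, No, Digit, Other };

struct Classified {
    Reply kind;
    int digit;  // 1..9 when kind == Reply::Digit, otherwise 0
};

// Lower-case the text and keep letters only, so "Yes." and " YES " both become "yes".
std::string normalize(const std::string& text) {
    std::string out;
    for (unsigned char c : text) {
        if (std::isalpha(c)) out.push_back(static_cast<char>(std::tolower(c)));
    }
    return out;
}

// Assumption: the transcript is a single word. Anything else falls through to Other.
Classified classifyReply(const std::string& transcript) {
    static const char* const kDigits[] = {"one", "two", "three", "four", "five",
                                          "six", "seven", "eight", "nine"};
    const std::string word = normalize(transcript);
    if (word == "yes") return {Reply::Yes, 0};
    if (word == "no") return {Reply::No, 0};
    for (int i = 0; i < 9; ++i) {
        if (word == kDigits[i]) return {Reply::Digit, i + 1};
    }
    return {Reply::Other, 0};
}
```

The game logic can then pick the next recording from `kind` (and `digit`), with `Reply::Other` covering the "anything else" case.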
If you are aware of a cloud-based speech-to-text service that has a relatively simple API and doesn't require a wake word, I would appreciate learning about it.
For this I think you'd better look at a Raspberry Pi kind of computer. It doesn't cost that much more than an Arduino but has lots more horsepower. Arduinos are just not suited to that kind of application. There's a good chance that software is available that can do just this, as phone voice response is a very common application. Years ago I heard about open source, Linux-based implementations of this kind of voice response system.
The activation signal could be the phone being taken off the hook, or the correct number dialed.
Or if you want to use the Arduino: no voice, but I think DTMF decoding is within the abilities of Arduinos, as is playing pre-recorded messages (through e.g. the DF Player Mini).
I'll ask on the Raspberry forums, but I really prefer Arduino and C++.
My project is to make an old dial phone work. I already have the dialing, incoming calls, ringing the bell and the audio for dozens of prerecorded responses. My wife asked if I could tell when the person on the phone says "yes" or "no" so that the Y/N response would decide which recording was played next.
Challenge accepted.
I would like to fit everything inside the original phone case. Inside the phone I have the ring generator circuit board, the Wemos D1 Mini that is the main brains of the project, and an Uno which carries the Adafruit Audio Shield. I might be able to get a Pi Zero inside, but that's going to be tight.
I do remember having heard of modules that can do that - but only for a specific voice or voices, comparing the received sound to a known sample. You no doubt want to do the same for different voices, which is a lot harder. Siri and Alexa etc. manage to do quite OK - until you have too much of an accent, then even those supercomputers can't make sense of it any more.
Then think of how many different ways you can actually pronounce "yes" and "no"! Each gives a different meaning to a human listener (surprise, uncertainty, excitement, etc.) but makes it that much harder for a computer to understand.
Good luck making it work, it'd be quite an achievement.
In the meantime, to get your escape room game to work, DTMF is much easier. Use an MT8870 decoder IC and you're in business.
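For reference, a small sketch of turning the MT8870's 4-bit output into a key. It assumes the four data outputs Q1..Q4 have been read into the low bits of a byte (Q1 as the least significant bit) once the chip signals a valid tone. The function name `mt8870Key` is made up, and the table is the decode table as commonly published, so check it against the datasheet for your part.

```cpp
#include <stdint.h>

// Map the MT8870 output code (Q4..Q1, Q1 = bit 0) to the key that was pressed.
// Assumption: standard decode table, where code 0 is 'D' and code 10 is '0'.
char mt8870Key(uint8_t code) {
    static const char kKeys[16] = {'D', '1', '2', '3', '4', '5', '6', '7',
                                   '8', '9', '0', '*', '#', 'A', 'B', 'C'};
    return kKeys[code & 0x0F];  // mask so a stray high bit cannot index out of range
}
```

On an Arduino the code would typically be assembled from four `digitalRead()` calls on whichever pins the outputs are wired to.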
I can't use DTMF since the game is set in the 1960's.
The main problem with online voice recognition is that they all require a wake word, again defeating the realism. I did find an FFT library for the Uno, but because the repository has had no updates for several years, I am reluctant to try it.
I did consider a purpose-built voice recognition chip, but they don't have "yes" or "no" preprogrammed.
My backup plan is to simply detect when the person says anything by routing the microphone to an op amp then to the Arduino analog input, and respond appropriately. We just have to script the questions so that it doesn't matter what the player says.
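A rough sketch of that backup plan, assuming the op amp output is biased to mid-scale and sampled in short windows with `analogRead()` (0..1023 on an Uno). The names `peakToPeak` and `SpeechDetector`, and the threshold values in the usage note, are placeholders to tune on the real hardware.

```cpp
#include <stddef.h>
#include <stdint.h>

// Peak-to-peak amplitude of one window of ADC readings.
uint16_t peakToPeak(const uint16_t* samples, size_t count) {
    if (count == 0) return 0;
    uint16_t lo = samples[0];
    uint16_t hi = samples[0];
    for (size_t i = 1; i < count; ++i) {
        if (samples[i] < lo) lo = samples[i];
        if (samples[i] > hi) hi = samples[i];
    }
    return static_cast<uint16_t>(hi - lo);
}

// Reports speech only after several loud windows in a row,
// so a single click or handset bump does not count as an answer.
class SpeechDetector {
public:
    SpeechDetector(uint16_t threshold, uint8_t windowsNeeded)
        : threshold_(threshold), windowsNeeded_(windowsNeeded), loudRun_(0) {}

    // Feed one window of samples; returns true while speech is considered present.
    bool update(const uint16_t* samples, size_t count) {
        if (peakToPeak(samples, count) >= threshold_) {
            if (loudRun_ < windowsNeeded_) ++loudRun_;
        } else {
            loudRun_ = 0;  // any quiet window resets the run
        }
        return loudRun_ >= windowsNeeded_;
    }

private:
    uint16_t threshold_;
    uint8_t windowsNeeded_;
    uint8_t loudRun_;
};
```

In `loop()` this might be used as `SpeechDetector detector(100, 2);` followed by filling a small buffer from the analog pin and calling `detector.update(buffer, n)` each pass.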
If you come across something that might work, please let me know.
I can't use DTMF since the game is set in the 1960's
DTMF was introduced in 1963. It was the basis of phone phreaking, where people rang up strangers in other countries without charge by playing tones down the line.
An FFT is not going to help you with this.
You can prerecord the wake up word and add it to a recording of the phone user before sending it off to the cloud.
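A sketch of the splicing step only, assuming both clips are already in memory as 16-bit mono PCM at the same sample rate. The function name `prependWakeWord` and the idea of a short silent gap between the clips are assumptions, and whether a given cloud service accepts audio built this way is untested. It uses `std::vector`, so it fits the ESP8266 or a Pi rather than an Uno.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <vector>

// Build: wake word, then gapMs of silence, then the player's recorded reply.
// Assumption: both inputs are 16-bit mono PCM at sampleRateHz.
std::vector<int16_t> prependWakeWord(const std::vector<int16_t>& wakeWord,
                                     const std::vector<int16_t>& reply,
                                     unsigned sampleRateHz, unsigned gapMs) {
    const size_t gapSamples = static_cast<size_t>(sampleRateHz) * gapMs / 1000;
    std::vector<int16_t> out;
    out.reserve(wakeWord.size() + gapSamples + reply.size());
    out.insert(out.end(), wakeWord.begin(), wakeWord.end());
    out.insert(out.end(), gapSamples, static_cast<int16_t>(0));  // silence
    out.insert(out.end(), reply.begin(), reply.end());
    return out;
}
```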
An old-fashioned dial phone can be interfaced with an Arduino without extra hardware. It's just a pulse train.
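To illustrate, a sketch of decoding that pulse train, assuming the common convention where digit N sends N pulses and 0 sends ten, and assuming the time of each pulse has already been captured in milliseconds (for example with `millis()` on each line break). The names `pulseCountToDigit` and `decodePulses` and the gap value are illustrative, and some countries used a different pulse-to-digit mapping.

```cpp
#include <stddef.h>
#include <string>
#include <vector>

// Convert a pulse count to its digit: 1..9 map to '1'..'9', ten pulses mean '0'.
char pulseCountToDigit(int count) {
    if (count < 1 || count > 10) return '?';  // not a valid dial digit
    return static_cast<char>('0' + (count % 10));
}

// Group pulse timestamps into digits: a pause longer than digitGapMs ends a digit.
std::string decodePulses(const std::vector<unsigned long>& pulseTimesMs,
                         unsigned long digitGapMs) {
    std::string digits;
    int count = 0;
    for (size_t i = 0; i < pulseTimesMs.size(); ++i) {
        if (i > 0 && pulseTimesMs[i] - pulseTimesMs[i - 1] > digitGapMs) {
            digits.push_back(pulseCountToDigit(count));
            count = 0;
        }
        ++count;
    }
    if (count > 0) digits.push_back(pulseCountToDigit(count));
    return digits;
}
```

With pulses roughly 100 ms apart, a `digitGapMs` of a few hundred milliseconds separates digits. Contact bounce on the dial would still need debouncing before the timestamps are recorded.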
If your target audience is young, it adds a nice extra puzzle element: figuring out how to use it!
Though I don't think voice response existed back then, or if it did, that it was in common use. I do remember having to use the "correct" phone when dealing with such systems: they would only work with the DTMF ones...