One word speech recognition for any user

I want to make a puzzle box that opens when the password is spoken. All of the stuff I am finding on speech recognition seems to be talking about training the device, and relies on learning a particular person's voice. Is it feasible to teach it to recognize a word (just one word) but spoken by almost anyone?

Maybe. If you're able to just record the sound and offload it to a cloud service to deal with. Even then, look at how Alexa fails to hear what you say from time to time. On a standalone Arduino? Forget it.

EasyVR has "26 built-in speaker-independent commands", so if you can use one of those words it might work.

I've never used it and I have no idea how good it is. General voice recognition is extremely difficult. Dragon Naturally Speaking runs on a PC or Mac. Siri and Alexa run on powerful servers. Humans only really understand about 70% of the words we hear and our brain (usually) subconsciously fills in the missing parts. And sometimes we are consciously aware of a delay before we figure out what was just said.

This is not a job for an Arduino. Get something like a Raspberry Pi. Then you can either use Google speech recognition API which requires an Internet connection or something like SOPARE for offline speech recognition.

It might also be possible with an ESP8266 or ESP32: Audio input and voice recognition on ESP8266 via Google - Everything ESP8266
But don't quote me on that.
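With the cloud route the recognizer hands back text, so the box itself only has to decide whether the password is somewhere in that text. A rough sketch of that last step, assuming the transcript arrives as a plain string. The function name `contains_password`, the example password "mellon" and the 0.8 threshold are all made up for illustration and are not part of any speech API:

```python
import difflib
import re

def contains_password(transcript, password, threshold=0.8):
    """Return True if any word in the transcript is close to the password.

    The similarity threshold is an arbitrary illustrative value; it lets
    near-misses from the recognizer (e.g. "melon" for "mellon") still count.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    target = password.lower()
    for word in words:
        # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones
        if difflib.SequenceMatcher(None, word, target).ratio() >= threshold:
            return True
    return False
```

Something like `contains_password("i think it is melon", "mellon")` would open the box even though the recognizer misspelled the word. A lower threshold is more forgiving but also easier to guess by accident.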

I'll mirror DVDdoung here..

many people have used EasyVR for their movie prop replicas

e.g. a Judge Dredd voice-controlled blaster/gun that changes ammo type, etc.

I think it works pretty decently from the videos I have watched.

That being said.. does 'decent' work for your project? (not sure)

Might just have to test it out..

Keep in mind there is speech recognition (what was said) and voice recognition (who said it)..

Is it feasible to teach it to recognize a word (just one word) but spoken by almost anyone?

Not been done yet. You ask a Scottish person how good speech recognition is. I am from the north of England and it is rubbish. If you want to put on a pretend American accent of the right type it might be better. But any word by anyone is just not on at the moment.

I have used EasyVR, and it is rather good (~90% +),
if you train it for your own voice, use it in the same room, and at a fixed position/distance (e.g. 1 meter).
The built-in universal word-set is absolute rubbish.
Leo..
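To put a number on whether ~90% is "decent" enough for a puzzle box: if each attempt succeeds with probability p, the chance of getting in within n attempts is 1 - (1 - p)^n. A back-of-envelope sketch, assuming every attempt is independent (probably optimistic, since the same voice in the same room tends to fail the same way); the function name is just for illustration:

```python
def chance_of_opening(per_try_accuracy, tries):
    """Probability that at least one of `tries` attempts is recognized.

    Assumes each attempt is independent with the same success rate,
    which is an idealization rather than a measured property of any device.
    """
    return 1 - (1 - per_try_accuracy) ** tries
```

With a trained ~90% recognizer, someone who is allowed to repeat the word a few times will almost certainly get in. This says nothing about false accepts, i.e. the box opening for the wrong word.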

I understand getting something like dragon naturally speaking, or alexa would be far too demanding for this system, but those systems are trying to recognize ALL the words. That's a vocabulary size in the tens of thousands. I want it to recognize several (maybe dozens?) of variations of one word. That's a vocabulary size probably smaller than 100. Would that be within the processing power of some of these chips?

Also, apologies to the brits, aussies, kiwis, etc. in the room, but the vast majority of my friends are american, so if it just worked with their american accents that would probably fly for this project. At a bare minimum, there's a single person it really needs to work for, but I can't very well involve them in the building and voice-training process and still expect the surprise/puzzle aspects to work out.

My real doubts come in because I don't think this is a use case people typically design for, so I doubt I'm going to find any examples where better programmers than I have done the legwork and I can adjust it to my own needs. I have not really delved into speech recognition programming, but I am pretty confident it's beyond my own skills.
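On the small-vocabulary question: one classic approach for a handful of isolated words is template matching with dynamic time warping (DTW), where an incoming sound is compared against a few stored recordings and the nearest one wins. The toy sketch below uses plain lists of numbers; a real system would compare per-frame feature vectors and would also need a distance cutoff so unknown words get rejected. The function names are made up for illustration, and this is not claimed to be how EasyVR or SOPARE work internally:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i items of a with the first j of b
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: stretch a, stretch b, or advance both
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def closest_template(sample, templates):
    """Return the name of the stored template nearest to the sample."""
    return min(templates, key=lambda name: dtw_distance(sample, templates[name]))
```

The work per comparison grows with the product of the two sequence lengths, and the total grows with the number of templates, which is why dozens of variations of one word is a very different problem from tens of thousands of words. Covering "almost anyone" would still mean collecting templates from many different speakers.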

Actually, pretty similar to Alexa. From what I've been told, in their attempts to maintain as much privacy as they could, Alexa does not talk to the cloud until she hears her activation phrase "hey Alexa". It has enough processing power on-board to pick up that phrase, and once activated the rest of it is handled in the cloud. I don't know how alexa compares to an arduino though.

Been looking into some of these suggestions, the raspberry pi with sopare seems like a decent candidate.
Thanks for the replies!

felic:
This is not a job for an Arduino. Get something like a Raspberry Pi. Then you can either use Google speech recognition API which requires an Internet connection or something like SOPARE for offline speech recognition.

It might also be possible with an ESP8266 or ESP32: Audio input and voice recognition on ESP8266 via Google - Everything ESP8266
But don't quote me on that.

There is a new board from Espressif specifically designed for Voice Recognition with a development framework included.

ESP32 LyraT ESP32 NodeMCU LyraT Voice Recognition AI Board (Audio player & Smart Speaker) | eBay

This is their link to the Audio Development Framework (ADF): GitHub - espressif/esp-adf: Espressif Audio Development Framework

It can interface with multiple cloud services (Alexa, Google, etc.), as shown in the uppermost layer.

Also, there is a box in the ADF regarding "keyword recognition". I think you can steer the board to recognize certain keywords using this part of the library.

But I think the ESP32 LyraT board is quite new, having been released around the end of 2018. I believe the EasyVR (EasyVR Shield 3.0 - Voice Recognition Shield for arduino with arduino code | eBay) has been around for quite some time now.

From what I've been told, in their attempts to maintain as much privacy as they could, Alexa does not talk to the cloud until she hears her activation phrase "hey Alexa".

Someone is not telling you the truth here. There have been many reported cases of targeted advertising appearing from people talking about things without officially going through Alexa.

Callipygous:
From what I've been told, in their attempts to maintain as much privacy as they could, Alexa does not talk to the cloud until she hears her activation phrase "hey Alexa".

Or it just 'thinks' it's heard the activation phrase.
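The arrangement being described (a small local detector that gates what gets sent on) can be sketched roughly like this. It is only an illustration of the idea, not how any real product is implemented; `gate_audio` and `is_wake_word` are hypothetical names, and the "frames" here are just strings standing in for chunks of audio:

```python
def gate_audio(frames, is_wake_word):
    """Yield only the frames that come after the local detector fires.

    `is_wake_word` is a stand-in for whatever small on-device detector is used.
    Everything before it returns True is dropped and never leaves the device.
    """
    awake = False
    for frame in frames:
        if awake:
            yield frame  # this is what would be forwarded for full recognition
        elif is_wake_word(frame):
            awake = True
```

Note that the gate is only as good as the detector: if `is_wake_word` returns True for something that merely sounds similar, everything after that point gets forwarded as well. The puzzle box is really just the first half of this, a local detector for one word with nothing behind it.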

I did not know that 100% accurate voice recognition was possible.

I did not know that 100% accurate voice recognition was possible.

It is not, not even when people are used as the processing unit.

Grumpy_Mike:
Someone is not telling you the truth here. There have been many reported cases of targeted advertising appearing from people talking about things without officially going through Alexa.

I'm finding many reports of mistakes one would expect with something as difficult and imperfect as speech recognition, i.e. the device thought it heard its wake word, but humans can tell from the recording that it was wrong. I'm not finding reports of targeted advertising based on recordings that shouldn't have been shared.

I understand, and mostly agree with the sentiment that we should watch these companies and devices with a healthy dose of skepticism, but I am wary of the alarmist accusations that are bound to get tossed around with this kind of technology. In such a delicate topic, I need my accusations to be backed by rock solid evidence because otherwise the deck is stacked against these companies even if they make every good faith effort to maintain privacy. So far, I'm not seeing anything that solid.