Recognizing pre-recorded sound clips from an audio stream.

Hello,

My project is to control a Nintendo 3DS handheld video game system from my Arduino Mega.
I have all the basic controls wired up and working well.

I need to be able to recognize the state of a video game being played from its known audio cues.

Specifically, the game is the Super Smash Bros fighting game. At the end of each match there is a winner, and the winner is announced with a known sound clip.

For example, when Mario wins, a known sound clip plays with the announcer saying "The winner is Mario!".

To my advantage, the "match winner" sound clips are all known, and each one is EXACTLY the same every time it is played.

I need to recognize this specific sound clip and then have the Arduino press the "Start" button on the Nintendo 3DS to move the game to the next screen.

I have done numerous searches on FFT, FHT, Matched Filters, Speech Recognition, etc., and have not come up with a solid way to do this.

I don't think speech recognition techniques will work well, since the audio is not a regular human voice but a video game announcer's "cartoon" voice with lots of background sound effects.

One theoretical approach I have is to somehow "train" the Arduino with pre-recorded sound clips and have it react when it hears them, similar to the EasyVR Shield 3.0.

Any thoughts on how to do this?

Video of the sound clips: Smash Bros 3DS: All Victory Pose Animations (+ Koopalings & Character Alts! - English) - YouTube

I am a software engineer and I am very comfortable around coding! Any help is appreciated.

I support bitcoin tipping for helpful posts!

You could train a neural network or use a Kohonen map (self-organizing map).

With an FFT you should be able to make a spectrogram and do pattern comparison.
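A rough sketch of that idea in Python (for a PC or Raspberry Pi, not the Arduino itself). Everything here is my own assumption for illustration: the function names, the frame size and hop, and the use of cosine similarity as the "pattern comparison". It uses a slow, naive DFT so it needs no libraries; a real FFT routine would replace it in practice.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags

def spectrogram(samples, frame_size=64, hop=32):
    """One magnitude spectrum per overlapping frame of the signal."""
    return [magnitude_spectrum(samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, hop)]

def similarity(a, b):
    """Cosine similarity between two equally sized spectrograms."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    na = math.sqrt(sum(x * x for x in fa))
    nb = math.sqrt(sum(y * y for y in fb))
    return dot / (na * nb) if na and nb else 0.0

def find_clip(stream_spec, clip_spec):
    """Slide the clip over the stream; return (best_frame_index, best_score)."""
    width = len(clip_spec)
    best_index, best_score = None, 0.0
    for i in range(len(stream_spec) - width + 1):
        score = similarity(stream_spec[i:i + width], clip_spec)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score
```

You would compute `spectrogram()` once for each recorded "winner" clip, then call `find_clip()` on the most recent stretch of captured audio and treat a score close to 1.0 as a match. The cutoff would have to be tuned against real recordings, since the live audio will not line up with the frame boundaries the way a test signal does.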

Following the loudest frequency over time will probably be sufficient. In an XML-like notation you would get something like:

<sample, start time, duration, frequency, loudness>
...
<sample, start time, duration, frequency, loudness>

A frequency of zero might be used for pauses.
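To make the record format above concrete, here is a minimal Python sketch that reduces audio to (start time, duration, frequency, loudness) records. The names `loudest_bin` and `frequency_track`, the frame size, and the `silence` threshold are all assumptions of mine and untuned; it again uses a naive DFT rather than a real FFT.

```python
import cmath
import math

def loudest_bin(frame):
    """Return (bin_index, magnitude) of the strongest non-DC DFT bin."""
    n = len(frame)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k, best_mag

def frequency_track(samples, rate, frame_size=64, silence=1.0):
    """Summarise audio as [start_time, duration, frequency, loudness] records.

    Frames whose loudest bin is below `silence` (an assumed threshold)
    are recorded with frequency 0, i.e. as pauses.
    """
    records = []
    frame_time = frame_size / rate
    for i in range(0, len(samples) - frame_size + 1, frame_size):
        k, mag = loudest_bin(samples[i:i + frame_size])
        freq = k * rate / frame_size if mag >= silence else 0.0
        if records and records[-1][2] == freq:
            # Same dominant frequency as the previous frame: extend that record.
            records[-1][1] += frame_time
            records[-1][3] = max(records[-1][3], mag)
        else:
            records.append([i / rate, frame_time, freq, mag])
    return records
```

Comparing two such record lists (one from the stored clip, one from the live audio) is much cheaper than comparing full spectrograms, at the cost of throwing away everything except the single loudest frequency in each frame.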

An Arduino is probably not powerful enough for this; a Raspberry Pi might be a better choice for the processing.

Bump!

You have unrealistic expectations. Why else do you suppose that no one has told you how to do what you want?

Thank you!