I'd like to be able to make a UFO drop down from a box when someone sings or whistles a tune, specifically the one from Close Encounters of the Third Kind.
Now, I could have a xylophone nearby to play the tune more accurately... but I'd prefer human-made sounds.
Your idea makes me smile, but man... that's tricky stuff!!! Song recognition is usually done on powerful servers on the Internet. You only need to recognize one song, which makes it easier, but I'm not sure an Arduino can do it.
First off, you need a microphone & preamp with a biased output (because the Arduino can't read the negative half of a normal, non-biased waveform). Or you can get a microphone board with the microphone and all of that built in. That's the easy part...
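To illustrate what the bias buys you: a biased microphone board centres silence at roughly half the ADC range (around 512 counts on a 10-bit ADC like the Uno's), so the software subtracts that offset to get a signed waveform back. This is a minimal sketch in plain C++ so it can be tested off the board; the name removeBias is hypothetical, and it assumes the raw counts were already collected (e.g. with analogRead()). It estimates the bias as the frame average rather than assuming exactly 512, since real boards are rarely perfectly centred.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Converts raw 10-bit ADC counts (0..1023) from a biased microphone board
// into signed samples centred on zero. The bias is estimated as the frame
// average instead of assuming exactly 512. (removeBias is a hypothetical name.)
std::vector<std::int16_t> removeBias(const std::vector<std::uint16_t>& raw) {
    assert(!raw.empty());
    long sum = 0;
    for (std::uint16_t r : raw) sum += r;
    const int bias = static_cast<int>(sum / static_cast<long>(raw.size()));

    std::vector<std::int16_t> out;
    out.reserve(raw.size());
    for (std::uint16_t r : raw) {
        // Silence now reads as about 0; the "negative half" becomes negative numbers.
        out.push_back(static_cast<std::int16_t>(static_cast<int>(r) - bias));
    }
    return out;
}
```

With a non-biased microphone, the negative half of the wave would simply read as 0 and be lost before this step could help.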
Then you'd need an FFT or FHT (Fast Fourier or Fast Hartley Transform) to analyze the frequency. This is "difficult" because real-world sounds contain harmonics; they are not one single frequency. And the Arduino can't sample and calculate the FFT continuously at the same time, so there may be gaps and you might miss half the notes.
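If you only care about one song, you may not need a full FFT at all. A swapped-in alternative (not something mentioned above) is the Goertzel algorithm, which measures the power in one chosen frequency bin per pass, so you only test the handful of notes in the tune. This is a hypothetical plain C++ sketch: goertzelPower, strongestNote and the test-tone helper sineFrame are illustrative names, and any candidate frequencies you feed it are placeholders. It does nothing about harmonics; a voice's overtones can still land on another candidate's bin, especially for notes an octave apart.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const double kPi = 3.14159265358979323846;

// Goertzel algorithm: returns the squared magnitude of the DFT bin nearest
// targetHz for one frame of samples. Cheaper than a full FFT when only a
// few note frequencies matter. (Hypothetical names throughout.)
double goertzelPower(const std::vector<double>& samples, double targetHz,
                     double sampleRateHz) {
    assert(!samples.empty() && sampleRateHz > 0.0);
    const double n = static_cast<double>(samples.size());
    const double k = std::round(n * targetHz / sampleRateHz);  // nearest bin
    const double coeff = 2.0 * std::cos(2.0 * kPi * k / n);
    double s1 = 0.0;  // previous state, s[i-1]
    double s2 = 0.0;  // state before that, s[i-2]
    for (double x : samples) {
        const double s0 = x + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Returns the index of the candidate frequency with the most power.
std::size_t strongestNote(const std::vector<double>& samples,
                          const std::vector<double>& noteHz,
                          double sampleRateHz) {
    assert(!noteHz.empty());
    std::size_t best = 0;
    double bestPower = -1.0;
    for (std::size_t i = 0; i < noteHz.size(); ++i) {
        const double p = goertzelPower(samples, noteHz[i], sampleRateHz);
        if (p > bestPower) {
            bestPower = p;
            best = i;
        }
    }
    return best;
}

// Test helper: n samples of a unit-amplitude sine at hz.
std::vector<double> sineFrame(double hz, double sampleRateHz, std::size_t n) {
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = std::sin(2.0 * kPi * hz * static_cast<double>(i) / sampleRateHz);
    return out;
}
```

Frequency resolution is roughly the sample rate divided by the frame length (for example, 8000 Hz / 256 samples = 31.25 Hz per bin), so two notes closer together than that can land in the same bin.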
Most singers can't hit the notes on pitch without a starting reference (very few people have "perfect pitch"), and non-singers could be completely out of tune and off pitch. The xylophone would be repeatable and on pitch.
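One way around the missing reference pitch, offered as an assumption rather than a tested recipe, is to ignore absolute pitch and compare the intervals between successive notes in semitones (12 × log2 of the frequency ratio). Someone who starts in the wrong key then still matches. The names below are hypothetical, and the interval pattern +2, -4, -12, +7 (re, mi, do, low do, so) is my reading of the film's five-note motif, so check it against a score.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Converts a frequency to a fractional semitone number relative to A4 = 440 Hz.
double toSemitones(double hz) {
    assert(hz > 0.0);
    return 12.0 * std::log2(hz / 440.0);
}

// True if the sung notes have the expected intervals, whatever key they start in.
// toleranceSemis allows each step to be a little sharp or flat; because steps
// are compared one at a time, a singer's slow drift does not accumulate.
bool matchesIntervals(const std::vector<double>& sungHz,
                      const std::vector<int>& expectedSteps,
                      double toleranceSemis) {
    if (sungHz.size() != expectedSteps.size() + 1) return false;
    for (std::size_t i = 0; i < expectedSteps.size(); ++i) {
        const double step = toSemitones(sungHz[i + 1]) - toSemitones(sungHz[i]);
        if (std::fabs(step - expectedSteps[i]) > toleranceSemis) return false;
    }
    return true;
}
```

For example, matchesIntervals(sung, {2, -4, -12, 7}, 0.5) accepts five notes sung in any key, as long as every step is within half a semitone of the assumed pattern.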
The levels (loudness) will vary with the voice or the xylophone, and that makes frequency detection more difficult.
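A simple way to take loudness out of the picture, sketched here as an assumption, is to scale each frame to unit RMS before any frequency analysis, and to skip frames below a noise floor as probable silence. The name normalizeRms and any noise-floor value are hypothetical and would need tuning on real hardware.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scales a frame to unit RMS so later thresholds don't depend on loudness.
// Returns false, leaving the frame unchanged, if its RMS is below
// noiseFloorRms (probably silence). (normalizeRms is a hypothetical name.)
bool normalizeRms(std::vector<double>& frame, double noiseFloorRms) {
    assert(!frame.empty());
    double sumSq = 0.0;
    for (double x : frame) sumSq += x * x;
    const double rms = std::sqrt(sumSq / static_cast<double>(frame.size()));
    if (rms < noiseFloorRms) return false;
    for (double& x : frame) x /= rms;
    return true;
}
```

After this step, a soft and a loud rendition of the same waveform produce the same numbers, so one detection threshold can serve both.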
The timing/tempo won't be precise either. You're looking for the right note at the right time, so you'd need some "smarts" in your software to find the tempo being played or sung and adjust it to match the tempo of the original song...
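For the tempo problem, one hedged approach is to compare rhythm rather than absolute timing: divide each note's duration by the length of the whole phrase, so a slow and a fast rendition give the same proportions. This assumes note boundaries have already been detected. relativeDurations and rhythmMatches are hypothetical names, and any reference durations you pass in are placeholders, not the actual rhythm of the film's motif.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Divides each duration by the total so only the relative rhythm is left.
std::vector<double> relativeDurations(const std::vector<double>& durationsMs) {
    double total = 0.0;
    for (double d : durationsMs) total += d;
    assert(total > 0.0);
    std::vector<double> out;
    out.reserve(durationsMs.size());
    for (double d : durationsMs) out.push_back(d / total);
    return out;
}

// True if the sung rhythm matches the reference rhythm at any tempo.
// tolerance is the largest allowed difference in each note's share of the phrase.
bool rhythmMatches(const std::vector<double>& sungMs,
                   const std::vector<double>& referenceMs, double tolerance) {
    if (sungMs.empty() || sungMs.size() != referenceMs.size()) return false;
    const std::vector<double> a = relativeDurations(sungMs);
    const std::vector<double> b = relativeDurations(referenceMs);
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > tolerance) return false;
    return true;
}
```

For example, rhythmMatches({400, 400, 400, 400, 800}, {1, 1, 1, 1, 2}, 0.05) treats a phrase sung with those millisecond durations as matching a reference whose last note is twice as long as the others, whatever the tempo.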