Help with sound interactivity

I'm still new to Arduino and this forum. I have been building my full-scale R2D2 for almost a year now and have incorporated 3 Arduinos into it. One controls random functions for lighting and HPS or projector movements. The second controls remote functions like the utility arms and so on. The 3rd controls interactive things. He can track movement through the room, i.e. look in the direction where he saw motion (any motion at this time; I'm still working on that).

At any rate, I'm working on getting him to talk to me when he hears sound. The idea is to give the illusion that he is talking to or with you. I'm using a "Detection Sensor Module Sound Sensor" that can be seen at: http://www.ebay.com/itm/New-1pc-Sound-Detection-Sensor-Module-Sound-Sensor-Intelligent-Vehicle-F-Arduino-/130814345964?pt=LH_DefaultDomain_0&hash=item1e752482ec . So far it works great. Too good, to be precise. R2 talks to the TV, to the dishes being clanked in the other room, even to the cat. He even talks to himself: he hears his own head spin, or a relay snap closed, or one of the other various sounds he makes, and happily talks to himself. I have adjusted the sound detector so that it hears me talk to him, but he still hears all kinds of things and is happy to talk to them all. Any ideas on how to help him decipher the noises he is hearing?

That sensor only detects sound, and it has only a digital output. That is very simple: it tells you that something was loud enough, not what it was.

I'm thinking in the direction of this: EasyVR 3 Plus Shield for Arduino (COM-15453, SparkFun Electronics).

You could perhaps use a microphone with an amplifier (so you get the analog signal) and run an FFT on the Arduino. From the resulting spectrum you might be able to recognize someone talking.
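As a rough sketch of the spectrum idea, the function below takes a block of samples (assumed to come from an amplified mic on an analog pin) and returns what fraction of the energy falls in a "voice" band. The 300 to 3400 Hz band (the classic telephone speech band), the 8000 Hz sample rate, and the name `voiceBandFraction` are all assumptions for illustration. A naive DFT stands in for an FFT library so the math is visible; it is slow on an 8-bit board, where an FFT library or the Goertzel algorithm on a few bins would be the faster choice. It is written as plain C++ so it can be tested on a PC; on an AVR board you would include `<math.h>`, `<stdint.h>` and `<assert.h>` instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: fraction of signal energy (DC removed) between lowHz and
// highHz. Naive DFT, O(n^2): fine for ~64 samples, but an FFT would be faster.
double voiceBandFraction(const int16_t* samples, std::size_t n,
                         double sampleRateHz,
                         double lowHz = 300.0, double highHz = 3400.0) {
  assert(sampleRateHz > 0.0);
  if (n < 2) return 0.0;

  // Remove the DC offset (an amplified mic on an ADC usually sits at mid-scale).
  double mean = 0.0;
  for (std::size_t i = 0; i < n; ++i) mean += samples[i];
  mean /= static_cast<double>(n);

  const double kPi = 3.14159265358979323846;
  double total = 0.0;
  double inBand = 0.0;

  // Bins 1..n/2 cover sampleRate/n up to the Nyquist frequency.
  for (std::size_t k = 1; k <= n / 2; ++k) {
    double re = 0.0;
    double im = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
      const double angle =
          2.0 * kPi * static_cast<double>(k * i) / static_cast<double>(n);
      const double x = samples[i] - mean;
      re += x * std::cos(angle);
      im -= x * std::sin(angle);
    }
    const double power = re * re + im * im;
    const double freqHz =
        static_cast<double>(k) * sampleRateHz / static_cast<double>(n);
    total += power;
    if (freqHz >= lowHz && freqHz <= highHz) inBand += power;
  }
  return total > 0.0 ? inBand / total : 0.0;
}
```

A high fraction alone does not prove someone is talking (a TV is also in that band), but combined with a loudness threshold it can reject clanks, thumps and motor hum that sit mostly outside the band.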

I will look into the FFT. I was thinking last night while at work, and I remembered how the LED flashed on the sound sensor as it detected noise. This got me thinking. When we talk, we talk in sentences and words. Sentences break down into words, and words break down into syllables. I wonder if I should only let R2 say something when the sound sensor has counted 10 or more individual sounds, almost like counting syllables. I was also thinking of preventing R2 from saying something if a certain number of seconds (x = ??) has not passed since the last time he spoke.
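The two ideas above (count separate sound bursts, and enforce a quiet time after R2 speaks) can be sketched as a small state machine fed with the sensor's digital output. This is a minimal sketch under assumptions: the class name `ChatTrigger`, the counting window, the "same burst" gap, and the cooldown length are hypothetical values to tune by experiment. The cooldown is measured from when R2 starts speaking, so it should be longer than his typical chatter plus any head spin or relay noise that follows, which also helps with him answering himself. It is plain C++ so it can be tested off the board; on an AVR board use `<stdint.h>` and `<assert.h>` instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical "chat trigger": counts separate bursts (rising edges of the
// sensor's digital output) and fires only when enough arrive within a window,
// and not during the cooldown after R2 last spoke.
class ChatTrigger {
 public:
  ChatTrigger(uint8_t minBursts, uint32_t windowMs, uint32_t minGapMs,
              uint32_t cooldownMs)
      : minBursts_(minBursts), windowMs_(windowMs), minGapMs_(minGapMs),
        cooldownMs_(cooldownMs) {
    assert(minBursts > 0);
  }

  // Call every pass through loop(). Returns true when R2 should say something.
  bool update(uint32_t nowMs, bool soundHigh) {
    const bool risingEdge = soundHigh && !lastLevel_;
    lastLevel_ = soundHigh;

    // Stay quiet (and forget partial counts) during the cooldown.
    // Unsigned subtraction keeps working when millis() wraps around.
    if (hasSpoken_ && nowMs - lastSpokeMs_ < cooldownMs_) {
      count_ = 0;
      return false;
    }
    if (!risingEdge) return false;

    // Edges closer than minGapMs are sensor chatter within one burst.
    if (count_ > 0 && nowMs - lastEdgeMs_ < minGapMs_) {
      lastEdgeMs_ = nowMs;
      return false;
    }
    // Start counting again if the earlier bursts are too old.
    if (count_ == 0 || nowMs - firstBurstMs_ > windowMs_) {
      count_ = 0;
      firstBurstMs_ = nowMs;
    }
    lastEdgeMs_ = nowMs;
    ++count_;

    if (count_ >= minBursts_) {
      count_ = 0;
      hasSpoken_ = true;
      lastSpokeMs_ = nowMs;
      return true;
    }
    return false;
  }

 private:
  uint8_t minBursts_;
  uint32_t windowMs_, minGapMs_, cooldownMs_;
  bool lastLevel_ = false;
  bool hasSpoken_ = false;
  uint8_t count_ = 0;
  uint32_t firstBurstMs_ = 0, lastEdgeMs_ = 0, lastSpokeMs_ = 0;
};
```

On the Arduino you would create it with something like `ChatTrigger trigger(10, 3000, 80, 5000);` and call `trigger.update(millis(), digitalRead(soundPin) == HIGH)` from `loop()`, where `soundPin` is whatever pin the sensor's digital output is wired to. A single clank produces one burst and is ignored; a run of ten or more separate bursts within three seconds looks more like speech.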

I have a voice recognition board for the Arduino that I plan on installing later. But I want to use that for specific commands, whereas the sound detector will give the illusion that R2 is chatting with someone. When R2 is in chat mode, he does not have to understand what is being said, but he does need to talk back to sound.

Does anyone have any other ideas on how we can differentiate sounds from voices?