OK, so that's not so bad. Don't make it more complicated than it needs to be.
The mouth can just move randomly whenever one of the responses is being played. No one says it has to realistically mimic human speech patterns, which would be very tricky.
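To give an idea of how little is needed for the mouth, here is a rough sketch in plain C++ (so it can be tried on a PC first). The class name, the jaw angle limits and the 60-200 ms hold times are all my own assumptions, not anything from your build. On an Arduino you would probably swap the <random> parts for random() and pass in millis() as the time.

```cpp
#include <cstdint>
#include <random>

// Hypothetical jaw limits in servo degrees; tune these for the real mechanism.
constexpr int kJawClosed = 0;
constexpr int kJawOpenMax = 40;

// Picks a new random jaw angle every so often while audio is playing,
// and closes the jaw as soon as playback stops.
class MouthFlapper {
public:
    explicit MouthFlapper(uint32_t seed) : rng_(seed) {}

    // Call once per loop iteration. Returns the angle to send to the jaw servo.
    int update(bool audioPlaying, uint32_t nowMs) {
        if (!audioPlaying) {
            angle_ = kJawClosed;
            return angle_;
        }
        // Only pick a new position once the current one has been held long enough.
        if (nowMs - lastChangeMs_ >= holdMs_) {
            std::uniform_int_distribution<int> angleDist(kJawClosed, kJawOpenMax);
            std::uniform_int_distribution<uint32_t> holdDist(60, 200); // assumed hold time, ms
            angle_ = angleDist(rng_);
            holdMs_ = holdDist(rng_);
            lastChangeMs_ = nowMs;
        }
        return angle_;
    }

private:
    std::mt19937 rng_;
    int angle_ = kJawClosed;
    uint32_t lastChangeMs_ = 0;
    uint32_t holdMs_ = 0;
};
```

The only input it needs is a "response is playing" flag, so it does not care how the audio is produced.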
You will have to identify the direction of the person speaking. You could then simply:
- move eyes first in the direction of the speaker
- rotate head in the direction of the speaker
- once head has reached end of travel, straighten eyes again
That would meet the requirements.
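The three steps above could be sketched as a small state machine. This is only an illustration in plain C++: the struct name, the travel limits (30 degrees for the eyes, 80 for the head) and the step sizes are assumptions I made up, and the angles would still need mapping to your actual servos.

```cpp
#include <algorithm>

// Hypothetical travel limits in degrees either side of straight ahead.
constexpr int kEyeMax = 30;
constexpr int kHeadMax = 80;
// Hypothetical step sizes per tick; the eyes move faster than the head.
constexpr int kEyeStep = 5;
constexpr int kHeadStep = 2;

enum class GazeState { Idle, EyesLead, HeadTurn, EyesRecentre };

inline int clampInt(int v, int lo, int hi) {
    return std::max(lo, std::min(v, hi));
}

struct Gaze {
    int head = 0;    // head angle, degrees from straight ahead
    int eyes = 0;    // eye angle relative to the head
    int target = 0;  // bearing of the speaker, limited to the head's travel
    GazeState state = GazeState::Idle;

    // Start a new look towards the given bearing.
    void lookAt(int bearingDeg) {
        target = clampInt(bearingDeg, -kHeadMax, kHeadMax);
        state = GazeState::EyesLead;
    }

    // Move value one step towards goal; returns true once it has arrived.
    static bool stepToward(int& value, int goal, int step) {
        if (value < goal) value = std::min(value + step, goal);
        else if (value > goal) value = std::max(value - step, goal);
        return value == goal;
    }

    // Call at a fixed rate, then write head and eyes to the servos.
    void tick() {
        switch (state) {
        case GazeState::EyesLead: {
            // 1. Eyes go first, as far towards the speaker as they can.
            int eyeGoal = clampInt(target - head, -kEyeMax, kEyeMax);
            if (stepToward(eyes, eyeGoal, kEyeStep)) state = GazeState::HeadTurn;
            break;
        }
        case GazeState::HeadTurn:
            // 2. Head follows until it reaches the end of its travel.
            if (stepToward(head, target, kHeadStep)) state = GazeState::EyesRecentre;
            break;
        case GazeState::EyesRecentre:
            // 3. Eyes straighten again once the head has arrived.
            if (stepToward(eyes, 0, kEyeStep)) state = GazeState::Idle;
            break;
        case GazeState::Idle:
            break;
        }
    }
};
```

Calling lookAt() with the speaker's bearing and then tick() on every pass through the loop keeps the movement non-blocking, so the mouth and audio can carry on at the same time.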
It sounds like the speech recognition will be the most sophisticated part. The pre-recorded responses would not be hard and could be handled by one of the Arduinos that can play audio from an SD card, as I suggested in your other thread. No need for external MP3 players or suchlike.