Do you know how to write speech recognition code for a computer?
If so, you can try to port it to Arduino, but I don't think you have enough RAM or processing power for that.
That is kind of what I was saying:
Sound of lengthX, then ~0.3 second pause, then sound of lengthX, then 3 second pause = trigger1
Sound of lengthX, then 3 second pause = trigger2
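For what it's worth, here is a rough sketch of how the two patterns above could be told apart by counting sounds and timing the pauses. It is only an illustration: the names (PatternDetector, SHORT_GAP_MAX_MS, END_SILENCE_MS) and the threshold values are my own assumptions, and it does not check the length of each sound, only how many there were before the long pause.

```cpp
#include <stdint.h>

// Hypothetical thresholds; tune them for the real microphone and sounds.
const uint32_t SHORT_GAP_MAX_MS = 600;   // longest gap still treated as the "~0.3 second pause"
const uint32_t END_SILENCE_MS   = 3000;  // the "3 second pause" that ends a pattern

enum Trigger { NO_TRIGGER = 0, TRIGGER_1 = 1, TRIGGER_2 = 2 };

struct PatternDetector {
    int      soundCount = 0;      // sounds heard so far in the current pattern
    bool     wasLoud = false;     // input state at the previous update
    uint32_t lastSoundEndMs = 0;  // time at which the most recent sound stopped

    // Call regularly with the current time and whether sound is present.
    // Returns a trigger once the closing silence has elapsed, else NO_TRIGGER.
    Trigger update(uint32_t nowMs, bool loud) {
        if (loud && !wasLoud) {
            // A new sound starts. If the gap was too long for a double sound
            // (but shorter than the closing pause), start a fresh pattern.
            if (soundCount > 0 && nowMs - lastSoundEndMs > SHORT_GAP_MAX_MS) {
                soundCount = 0;
            }
            soundCount++;
        } else if (!loud && wasLoud) {
            // The sound just stopped; remember when the silence began.
            lastSoundEndMs = nowMs;
        }
        wasLoud = loud;

        if (!loud && soundCount > 0 && nowMs - lastSoundEndMs >= END_SILENCE_MS) {
            // Two sounds = trigger1, one sound = trigger2, anything else is ignored.
            Trigger result = NO_TRIGGER;
            if (soundCount == 2) result = TRIGGER_1;
            else if (soundCount == 1) result = TRIGGER_2;
            soundCount = 0;
            return result;
        }
        return NO_TRIGGER;
    }
};
```

On an Arduino you would presumably call update(millis(), analogRead(pin) > threshold) from loop(), where the pin and the threshold are whatever suits your microphone circuit.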
That should be possible, but it could result in some false positives. Also, you could implement pitch detection with an FFT and then sing to it (if you can reproduce a note).
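To give an idea of what FFT pitch detection involves, here is a minimal sketch in plain C++: a radix-2 FFT followed by a search for the strongest frequency bin. The function names (fft, dominantFrequency) are made up for illustration. It uses std::vector and std::complex, which the classic AVR Arduino toolchain does not provide, so treat it as something to try on a computer first; on the board itself you would most likely need a small fixed-point FFT and a short sample buffer.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// In-place iterative radix-2 FFT. data.size() must be a power of two.
void fft(std::vector<std::complex<double>>& data) {
    const std::size_t n = data.size();
    // Reorder the samples into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(data[i], data[j]);
    }
    // Combine sub-transforms of length 2, 4, 8, ... up to n.
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const std::complex<double> wLen = std::polar(1.0, -2.0 * pi / len);
        for (std::size_t i = 0; i < n; i += len) {
            std::complex<double> w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const std::complex<double> u = data[i + k];
                const std::complex<double> v = data[i + k + len / 2] * w;
                data[i + k] = u + v;
                data[i + k + len / 2] = u - v;
                w *= wLen;
            }
        }
    }
}

// Returns the centre frequency (Hz) of the strongest bin, skipping the DC bin.
// Returns 0 if the sample count is below 4 or not a power of two.
double dominantFrequency(const std::vector<double>& samples, double sampleRateHz) {
    const std::size_t n = samples.size();
    if (n < 4 || (n & (n - 1)) != 0) return 0.0;
    std::vector<std::complex<double>> spectrum(samples.begin(), samples.end());
    fft(spectrum);
    std::size_t peak = 1;
    // Only the first half of the spectrum is meaningful for real-valued input.
    for (std::size_t k = 1; k < n / 2; ++k) {
        if (std::norm(spectrum[k]) > std::norm(spectrum[peak])) peak = k;
    }
    return peak * sampleRateHz / n;
}
```

The result is only as fine as the bin spacing (sample rate divided by the number of samples), and a sung note has strong harmonics, so the loudest bin is not always the fundamental. That is another possible source of false positives.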