My next step is to perform a fast Fourier transform on that data, transforming it from the (finite) time domain to the frequency domain.
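For reference, here is a minimal radix-2 Cooley-Tukey FFT in plain Python. It is a sketch that assumes the number of samples is a power of two. The function name `fft` and the 8-sample test signal are illustrative and do not come from the original post.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor applied
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

# Example: 8 samples of a sine that completes exactly one cycle over the window
samples = [math.sin(2 * math.pi * i / 8) for i in range(8)]
magnitudes = [abs(c) for c in fft(samples)]
```

With this input, the energy appears in bins 1 and 7 with magnitude 4 (that is, N/2). Every other bin is essentially zero.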
Just a point: this will not be sufficient for you to build a voice recognition system, even if it is only for one word. You need sliding-window FFTs, parameter extraction, and then template-matching software. All in all, that is too much for this tiny but plucky processor.
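To show what those extra stages involve, here is a rough desktop-Python sketch of the chain. It slides Hann-windowed frames over the signal, computes a spectrum per frame, reduces each spectrum to a few log band energies as the extracted parameters, and compares the resulting sequence against stored word templates using dynamic time warping (DTW). Several choices here are assumptions made only for illustration: the frame length, the hop size, the band count, the band-energy features (a crude stand-in for something like MFCCs), and the use of DTW. For brevity, each frame's spectrum is computed with a plain DFT, where a real system would use an FFT.

```python
import cmath
import math

# Illustrative parameters (assumptions, not from the post): 64-sample frames, 50% overlap, 4 bands.
FRAME_LEN, HOP, N_BANDS = 64, 32, 4

def dft_magnitudes(frame):
    """Magnitude spectrum, bins 0..N/2, via a plain DFT (an FFT gives the same values faster)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def frame_features(signal):
    """Slide a Hann window along the signal; return log band energies for each frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (FRAME_LEN - 1)) for i in range(FRAME_LEN)]
    feats = []
    for start in range(0, len(signal) - FRAME_LEN + 1, HOP):
        frame = [s * w for s, w in zip(signal[start:start + FRAME_LEN], window)]
        mags = dft_magnitudes(frame)
        width = len(mags) // N_BANDS  # bins per band; the top (Nyquist) bin is dropped
        feats.append([math.log(1e-9 + sum(m * m for m in mags[b * width:(b + 1) * width]))
                      for b in range(N_BANDS)])
    return feats

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences (Euclidean frame cost)."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

def best_match(signal, templates):
    """Name of the stored template whose feature sequence is closest under DTW."""
    feats = frame_features(signal)
    return min(templates, key=lambda name: dtw_distance(feats, templates[name]))

def tone(freq_bin, length):
    """Hypothetical test 'word': a sinusoid with freq_bin cycles per 64-sample frame."""
    return [math.sin(2 * math.pi * freq_bin * i / FRAME_LEN) for i in range(length)]

# Templates recorded once; a longer utterance of the low tone should still match "low".
templates = {"low": frame_features(tone(4, 320)), "high": frame_features(tone(20, 320))}
print(best_match(tone(4, 400), templates))  # low
```

Even this toy version runs one spectrum per overlapping frame and fills a DTW table of size frames × template frames for every stored word. Real words also need richer features than four band energies. Together these costs are why the full pipeline is heavy for a small processor.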