Can an ESP32 record a 5-second audio clip and compute an FFT for sound recognition?

I want to classify cat and dog sounds with an ESP32. Can it work?

Mic: MAX9814, board: ESP32-WROOM-32
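As a rough sizing check: 5 s of audio at an assumed 8 kHz sample rate with 16-bit samples is 5 × 8000 × 2 = 80,000 bytes. That should fit in the ESP32's internal SRAM (520 KB in total, although less than that is available as heap). The MAX9814 has an analog output, so it would be sampled with the ESP32's ADC. A minimal sketch of the FFT step follows, in portable C++ that should also build for the ESP32 under Arduino. The sample rate, frame size, and the names `fft`, `magnitudeSpectrum`, and `binFrequency` are hypothetical choices, not taken from the question.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical parameters (assumptions, not from the question).
constexpr float kPi = 3.14159265358979f;
constexpr std::size_t kSampleRate = 8000;  // Hz
constexpr std::size_t kFrameSize = 256;    // FFT length, power of two
// 5 s of 16-bit samples: 5 * 8000 * 2 = 80,000 bytes.
constexpr std::size_t kBufferBytes = 5 * kSampleRate * sizeof(std::int16_t);

// In-place iterative radix-2 Cooley-Tukey FFT.
void fft(std::vector<std::complex<float>>& a) {
    const std::size_t n = a.size();
    assert(n > 0 && (n & (n - 1)) == 0 && "FFT length must be a power of two");
    // Reorder the input into bit-reversed index order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    // Combine butterflies of size 2, 4, 8, ..., n.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const float ang = -2.0f * kPi / static_cast<float>(len);
        const std::complex<float> wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            std::complex<float> w(1.0f, 0.0f);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const std::complex<float> u = a[i + k];
                const std::complex<float> v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

// Magnitude spectrum (bins 0..N/2) of one frame of raw ADC readings.
// The frame mean is subtracted first because the MAX9814 output sits
// on a DC bias rather than being centred on zero.
std::vector<float> magnitudeSpectrum(const std::vector<float>& frame) {
    const std::size_t n = frame.size();
    float mean = 0.0f;
    for (float s : frame) mean += s;
    mean /= static_cast<float>(n);

    std::vector<std::complex<float>> buf(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Hann window reduces spectral leakage between bins.
        const float w = 0.5f - 0.5f * std::cos(2.0f * kPi * i / (n - 1));
        buf[i] = std::complex<float>((frame[i] - mean) * w, 0.0f);
    }
    fft(buf);

    std::vector<float> mag(n / 2 + 1);
    for (std::size_t k = 0; k <= n / 2; ++k) mag[k] = std::abs(buf[k]);
    return mag;
}

// Centre frequency in Hz of bin k for the assumed sample rate and frame size.
float binFrequency(std::size_t k) {
    return static_cast<float>(k) * kSampleRate / kFrameSize;
}
```

With these assumed values, a 5 s recording (40,000 samples) splits into about 156 frames of 256 samples (40,000 / 256 = 156.25), and each magnitude spectrum has 129 bins spaced 8000 / 256 = 31.25 Hz apart.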

I found an interesting article: https://eloquentarduino.github.io/2019/12/word-classification-using-arduino/
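For the classification step, one very simple stand-in that fits easily on a microcontroller is a nearest-centroid classifier over spectral features: average the feature vectors of each class, then assign a new clip to the closest average. This is a swapped-in technique for illustration, not the method used in the linked article. The log band-energy feature, the `Centroid` struct, and the `bandEnergies`, `train`, and `classify` names are all hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical feature: log energy in `bands` equal-width slices of a
// magnitude spectrum (e.g. the output of an FFT on one audio frame).
std::vector<float> bandEnergies(const std::vector<float>& mag, std::size_t bands) {
    std::vector<float> out(bands, 0.0f);
    const std::size_t perBand = mag.size() / bands;
    for (std::size_t b = 0; b < bands; ++b) {
        float e = 0.0f;
        for (std::size_t k = b * perBand; k < (b + 1) * perBand; ++k)
            e += mag[k] * mag[k];
        out[b] = std::log(e + 1e-6f);  // small offset avoids log(0)
    }
    return out;
}

// One class ("cat", "dog", ...) represented by its mean feature vector.
struct Centroid {
    std::string label;
    std::vector<float> mean;
};

// Average the training feature vectors of one class (assumes at least one).
Centroid train(const std::string& label,
               const std::vector<std::vector<float>>& examples) {
    Centroid c{label, std::vector<float>(examples.front().size(), 0.0f)};
    for (const auto& ex : examples)
        for (std::size_t i = 0; i < ex.size(); ++i) c.mean[i] += ex[i];
    for (float& m : c.mean) m /= static_cast<float>(examples.size());
    return c;
}

// Return the label whose centroid has the smallest squared Euclidean distance.
std::string classify(const std::vector<Centroid>& model,
                     const std::vector<float>& x) {
    std::string best;
    float bestDist = std::numeric_limits<float>::max();
    for (const auto& c : model) {
        float d = 0.0f;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const float diff = x[i] - c.mean[i];
            d += diff * diff;
        }
        if (d < bestDist) {
            bestDist = d;
            best = c.label;
        }
    }
    return best;
}
```

The centroids could be computed on a PC from recorded clips and copied into the sketch as constants. Distinguishing real cat and dog sounds will probably need many training clips and may need a stronger model than this.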

===================================

Sorry if this is confusing; my English is weak. Thanks for any advice.