Optimized FFT and BLAS on Arduino Giga?

Hello,

I'd like to use the Giga R1 for a real-time, low-latency sound processing project.
I need to run a 1024-point FFT and IFFT 180 times per second, with some additional processing around them, alongside sound capture and output (ADC and DAC at 44.1 kHz).
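To make the timing budget concrete, here is a rough back-of-the-envelope sketch of what those figures imply. The sample rate, frame rate, and FFT size are the ones stated above; the 480 MHz M7 clock is my assumption (the board's nominal maximum), and the function names are only illustrative.

```cpp
#include <cstdint>
#include <cstdio>

// Figures from the post: 44.1 kHz sampling, 180 frames/s, 1024-point FFT.
constexpr uint32_t kSampleRateHz = 44100;
constexpr uint32_t kFramesPerSec = 180;
constexpr uint32_t kFftSize      = 1024;
// Assumption: the M7 core runs at its nominal 480 MHz.
constexpr uint32_t kM7ClockHz    = 480000000;

// Number of new input samples arriving between two consecutive frames.
constexpr uint32_t hopSamples(uint32_t sampleRateHz, uint32_t framesPerSec) {
    return sampleRateHz / framesPerSec;
}

// CPU cycles available to process one frame (FFT + processing + IFFT).
constexpr uint32_t cyclesPerFrame(uint32_t clockHz, uint32_t framesPerSec) {
    return clockHz / framesPerSec;
}

// Fraction of each FFT window that is shared with the previous window.
constexpr double overlapFraction(uint32_t fftSize, uint32_t hop) {
    return 1.0 - static_cast<double>(hop) / static_cast<double>(fftSize);
}

// Print the budget; callable from main() or from an Arduino setup().
void printBudget() {
    const uint32_t hop = hopSamples(kSampleRateHz, kFramesPerSec);
    std::printf("hop size:         %u samples\n", static_cast<unsigned>(hop));
    std::printf("window overlap:   %.1f %%\n",
                100.0 * overlapFraction(kFftSize, hop));
    std::printf("time per frame:   %.3f ms\n", 1000.0 / kFramesPerSec);
    std::printf("cycles per frame: %u\n",
                static_cast<unsigned>(cyclesPerFrame(kM7ClockHz, kFramesPerSec)));
}
```

In other words, each 1024-point window advances by 245 samples (roughly 76% overlap), and the M7 would have about 5.6 ms, or on the order of 2.6 million cycles, per frame for everything.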

I've just discovered the Giga, and its dual-core architecture looks well suited to this: the M4 core could handle the I/O while the M7 handles the computations.

But before starting development, I'd like to be sure that optimized libraries exist, at least for the FFT, which is not easy to get right, especially since I'm new to the Cortex-M family, with its Thumb ISA, DSP instructions, and its own SIMD extensions.

So my question is: do optimized FFT and BLAS libraries exist for the M7 in the Arduino tools? (I mostly use the command-line tools.) I saw that another ecosystem, CMSIS, provides such optimized libraries (the CMSIS DSP Software Library), but it's not clear to me whether they can be used from arduino-cli.

I've also seen an older post here (Really...Really Fast) discussing a high-performance FFT implementation, but it's not clear which library it uses.

Any comment would be very helpful.
Thank you in advance,
D.