Understanding Audio Data Collection and Model Training for Nicla Voice

The Arduino Nicla Voice tutorial on custom ML models relies on the Record_and_stream example to collect training data. However, that example encodes the audio with the G.722 codec, which limits the captured bandwidth to approximately 50–7000 Hz.

This raises an important consideration: any model trained on G.722-encoded audio will inherently learn patterns from this band-limited representation. When the model is later deployed on the Nicla Voice and run on raw, unencoded audio at inference time, the resulting domain mismatch could hurt classification accuracy, especially for signals whose relevant features fall outside the G.722 range.

My primary goal is to classify audio signals with content outside the G.722 bandwidth. With this in mind, I'm looking for guidance on best practices for building models whose training data better matches the raw audio input on the Nicla Voice.

Which approaches would let me access or record raw, uncompressed audio directly from the Nicla Voice? The required throughput is modest (16-bit PCM, mono, 16 kHz: 16,000 samples/s × 16 bits/sample = 256,000 bit/s = 32 kB/s). I'm currently considering streaming the raw audio out over SPI and recording it on the other end; are there simpler alternatives?
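For concreteness, here is a minimal sketch of the SPI idea, assuming the Nicla Voice acts as SPI master and pushes fixed-size PCM chunks to an external receiver. The capture routine `captureAudioChunk()` is a hypothetical placeholder (the real call depends on the Nicla Voice / NDP audio API); here it just synthesizes a test tone so the sketch compiles and runs on its own. The chip-select pin is likewise an assumption about the wiring.

```cpp
#include <SPI.h>
#include <math.h>

// Throughput check: 16-bit PCM, mono, 16 kHz
//   16,000 samples/s * 2 bytes/sample = 32,000 bytes/s of payload.
constexpr size_t   CHUNK_SAMPLES = 256;      // 16 ms of audio per transfer
constexpr uint32_t SPI_CLOCK_HZ  = 1000000;  // 1 MHz ~ 125 kB/s raw, ample headroom
constexpr int      CS_PIN        = 10;       // chip-select for the receiver (assumed wiring)

int16_t pcmChunk[CHUNK_SAMPLES];

// Hypothetical placeholder: the real capture call depends on the
// Nicla Voice / NDP audio API. This stub fills the buffer with a
// 1 kHz test tone so the sketch is self-contained.
bool captureAudioChunk(int16_t *buf, size_t n) {
  static uint32_t t = 0;
  for (size_t i = 0; i < n; i++, t++) {
    buf[i] = (int16_t)(8000.0f * sinf(2.0f * PI * 1000.0f * t / 16000.0f));
  }
  return true;
}

void setup() {
  pinMode(CS_PIN, OUTPUT);
  digitalWrite(CS_PIN, HIGH);
  SPI.begin();
}

void loop() {
  if (!captureAudioChunk(pcmChunk, CHUNK_SAMPLES)) {
    return;  // no fresh audio yet
  }
  // Ship the chunk to the receiver as raw bytes. Note that
  // SPI.transfer(buf, n) overwrites the buffer with received bytes,
  // which is harmless here because each chunk is regenerated.
  SPI.beginTransaction(SPISettings(SPI_CLOCK_HZ, MSBFIRST, SPI_MODE0));
  digitalWrite(CS_PIN, LOW);
  SPI.transfer(pcmChunk, sizeof(pcmChunk));
  digitalWrite(CS_PIN, HIGH);
  SPI.endTransaction();
}
```

The receiving side would need to run as an SPI slave and reassemble the stream, and a real version would want simple framing (chunk counters or sync words) so dropped chunks are detectable. At 32 kB/s of payload, even a 1 MHz SPI clock leaves plenty of margin.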

Any insights would be greatly appreciated!

The 16 kHz sample rate limits the "raw" bandwidth to 8 kHz, the Nyquist limit. Do you have a good reason to believe that this is significantly different from the codec's 7 kHz?
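For reference, that ceiling follows directly from the sampling theorem:

$$ f_{\max} = \frac{f_s}{2} = \frac{16\ \text{kHz}}{2} = 8\ \text{kHz} $$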

There have been very few posts on this forum regarding the Nicla Voice, and none I've seen reporting success with voice recognition, so you may be on your own.

Yes, I need to detect faint signals outside the G.722 range. I'll go with SPI then, thanks.