The Arduino Nicla Voice tutorial on custom ML models relies on the Record_and_stream
example to collect training data. However, this example uses the G.722 codec to encode the audio, which limits the frequency range to approximately 50–7000 Hz.
This raises an important consideration: any model trained on G.722-encoded audio will inherently learn patterns from this band-limited representation. When the model is later deployed and fed raw, unencoded audio during inference on the Nicla Voice's NDP120, there could be a mismatch that hurts classification accuracy, especially for signals with relevant features outside the G.722 range.
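To make the concern concrete, one way to check whether a given recording actually carries meaningful content outside the G.722 passband is to compare in-band versus out-of-band spectral energy on the raw PCM. A minimal host-side sketch follows (naive DFT, nothing Nicla-specific; the sample buffer is a placeholder you would fill with a real recording, and note that at a 16 kHz capture rate "out of band" can only mean below ~50 Hz or between ~7 kHz and the 8 kHz Nyquist limit):

```cpp
// Rough in-band vs. out-of-band energy estimate for a raw PCM buffer.
// Naive DFT; fine as an offline sanity check on short clips.
// Assumes 16 kHz, mono, 16-bit signed PCM already loaded into `samples`.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const double kPi   = 3.14159265358979323846;
    const double fs    = 16000.0;             // sample rate (Hz)
    const double loCut = 50.0, hiCut = 7000.0; // approximate G.722 passband

    // Placeholder: fill with real samples from a raw recording.
    std::vector<int16_t> samples(4096, 0);

    const size_t N = samples.size();
    double inBand = 0.0, outBand = 0.0;

    // Energy per DFT bin up to Nyquist (fs/2).
    for (size_t k = 1; k <= N / 2; ++k) {
        double re = 0.0, im = 0.0;
        for (size_t n = 0; n < N; ++n) {
            double phase = 2.0 * kPi * k * n / N;
            re += samples[n] * std::cos(phase);
            im -= samples[n] * std::sin(phase);
        }
        double f = k * fs / N;               // bin center frequency (Hz)
        double e = re * re + im * im;
        if (f >= loCut && f <= hiCut) inBand += e;
        else                          outBand += e;
    }

    double total = inBand + outBand;
    if (total > 0.0)
        std::printf("Energy outside ~50-7000 Hz: %.1f %%\n",
                    100.0 * outBand / total);
    return 0;
}
```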
My primary goal is to classify audio signals that extend outside the G.722 bandwidth. With this in mind, I'm looking for guidance on best practices for building models that work well with raw audio input on the Nicla Voice.
Which approaches would allow me to access or record raw, uncompressed audio data directly from the Nicla Voice? (For 16-bit PCM, mono, 16 kHz: 16,000 samples/s × 16 bits/sample = 256,000 bits/s = 32 kB/s.) Currently I'm considering streaming the raw audio out over SPI and recording it on the host; are there simpler alternatives?
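To make the question concrete, this is roughly the shape I have in mind, sketched against the structure of Record_and_stream: skip the G.722 encode step and push the raw PCM chunks straight out, with Serial used here only as a stand-in for whatever transport (SPI or otherwise) ends up carrying the data. The NDP.* calls and .synpkg file names are taken from my reading of the example and should be treated as assumptions, not a verified API:

```cpp
// Sketch of the idea: stream raw 16-bit PCM off the board instead of G.722.
// The NDP.* calls and firmware/package names below are assumptions based on
// the Record_and_stream example; adjust to whatever the installed core uses.
#include "NDP.h"

static uint8_t chunk[2048];  // raw PCM chunk pulled from the NDP120

void setup() {
  // USB serial as a placeholder transport; ~256 kbit/s of PCM must fit.
  Serial.begin(1000000);
  while (!Serial) {}

  NDP.begin("mcu_fw_120_v91.synpkg");   // assumed firmware names, as in the example
  NDP.load("dsp_firmware_v91.synpkg");
  NDP.load("ei_model.synpkg");
  NDP.turnOnMicrophone();
}

void loop() {
  unsigned int len = 0;
  // Assumed call: pulls the next block of raw audio from the NDP120.
  NDP.extractData(chunk, &len);
  if (len > 0) {
    Serial.write(chunk, len);            // forward raw PCM bytes to the host
  }
}
```

On the host side the captured byte stream would only need a 16 kHz / mono / 16-bit WAV header to become usable training data, but I'm unsure whether this path can sustain the 32 kB/s continuously, which is why I'm also looking at SPI.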
Any insights would be greatly appreciated!