Topic: Voice recorder: can't create a clear wave file

Hillcres-Hellio

Hi,

I am trying to make a voice recorder with my Arduino Uno.

Here is the hardware I use:

- Adafruit SD card reader
- SD card in FAT32
- Adafruit MAX4466 mic with adjustable gain
- UNO board

I power the circuit from the 5V pin of the UNO board.

I ran the Recording example from the TMRpcm library, and a wave file was created on my SD card, but the sound is really awful (I can barely hear myself whistle and there is a really loud noise that is impossible to remove with Audacity).

I don't understand why, because my wiring is correct and all my components are new except my SD card, which is a couple of years old.

I know this is surely a dumb sampling rate problem or something but I can't figure out which one...


Could someone help me?
Thanks!


DVDdoug

#1
Oct 12, 2018, 04:54 pm Last Edit: Oct 12, 2018, 05:14 pm by DVDdoug
I've never used TMRpcm and I didn't realize it has a recording function…

Quote
I know this is surely a dumb sampling rate problem or something but I can't figure out which one...
Is it 8-bit audio?   That's the most foolproof format.

The bytes might be out of order if it's 16-bit audio, or if the bit depth is getting "miscommunicated" the bytes could be scrambled. 8-bit WAV files use unsigned integers, whereas everything else uses signed integers, so that's another way things can get fouled up. Getting the byte order wrong (mixing up the high & low bytes), or reading 16-bit data as 8-bit or vice versa, can totally foul things up.

(If the sample rate is off, it will simply play at the wrong speed.)

This is a WAV file, right?  Not raw audio data?


You can check the file with MediaInfo to see if the WAV header shows the format that you expect. And/or you can "look at" the WAV header with a hex editor.

The WAV file header might give you a clue, but if the header is valid, there's no easy way to know whether the header information matches the actual audio data... And there's no easy way to check the audio data (other than listening), because those are just bytes and any byte value could be valid audio data.
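For anyone who prefers to inspect the header programmatically instead of with MediaInfo or a hex editor, here is a minimal Python sketch. It assumes the simple canonical 44-byte PCM layout ("RIFF", "WAVE", "fmt ", "data" with no extra chunks in between); the function name `parse_wav_header` and the dictionary keys are made up for illustration. It can only tell you whether the header is self-consistent, not whether the audio bytes actually match it.

```python
import struct

def parse_wav_header(header: bytes) -> dict:
    """Decode a canonical 44-byte PCM WAV header (all fields little-endian)."""
    if len(header) < 44:
        raise ValueError("need at least 44 bytes")
    (riff, riff_size, wave, fmt_id, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits, data_id, data_size) = struct.unpack(
        "<4sI4s4sIHHIIHH4sI", header[:44])
    if riff != b"RIFF" or wave != b"WAVE" or fmt_id != b"fmt " or data_id != b"data":
        raise ValueError("not a canonical 44-byte WAV header")
    return {
        "audio_format": audio_format,   # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits,
        "byte_rate": byte_rate,
        "data_size": data_size,
        # The stored byte rate should equal rate * channels * bytes per sample.
        "consistent": byte_rate == sample_rate * channels * bits // 8,
    }
```

To use it on a real file, read the first 44 bytes in binary mode and pass them to the function.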

Lucario448

I've never used TMRpcm and I didn't realize it has a recording function…
It does, but since it's supposedly an "experimental" feature, it is disabled by default.



Is it 8-bit audio?
Yes. I don't think an AVR microcontroller is capable of stereo or 16-bit recordings with its built-in ADC; even the processing power is not enough for a decent sampling rate at this format. But unsigned 8-bit mono is possible, even at 22050 Hz.



This is a WAV file, right?  Not raw audio data?
It should create the standard 44-byte RIFF/WAVE header (no ID3 tags and other metadata); in fact, the library keeps track of how many samples were recorded, in order to update the "RIFF" and "data" values accordingly during finalization.



And there's no easy way to check the audio data (other than listening)
So we would like to hear what's going on; please share the resulting audio file as it is.
If you are going to attach it to your next post, keep in mind that the file limit is 1 MB, so don't try to upload a long recording. Also, wav files aren't directly allowed here, but you can get around this by compressing the file into a zip archive.

Hillcres-Hellio

DVDdoug, thanks for your response.

Here is what MediaInfo tells me about my file: 16 kHz, 128 kbps, 8 bits, unsigned PCM.

On paper, a really normal file, right?

Also, I checked in a hex editor and I can confirm it has a WAV header.

Is there any reason to think it's the SD card that screwed the whole thing up?


Hillcres-Hellio

So we would like to hear what's going on; please share the resulting audio file as it is.
If you are going to attach it to your next post, keep in mind that the file limit is 1 MB, so don't try to upload a long recording. Also, wav files aren't directly allowed here, but you can get around this by compressing the file into a zip archive.
My last piece of experimental music:

DVDdoug

Quote
Here is what MediaInfo tells me about my file: 16 kHz, 128 kbps, 8 bits, unsigned PCM.

On paper, a really normal file, right?
Yes, that's good if you're making an 8-bit file.    So it's the data that's messed-up.

How about the file size and the playing time? Since it's 8-bit mono audio at 16 kHz, there should be 16,000 bytes per second. And does the playing time match the recording time?
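The arithmetic behind that check can be sketched in a few lines of Python. The function names are hypothetical, and the 44-byte header size is an assumption that holds only for the plain canonical header.

```python
def pcm_byte_rate(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bytes of audio data produced per second of uncompressed PCM."""
    return sample_rate_hz * channels * bits_per_sample // 8

def expected_duration_s(file_size_bytes: int, byte_rate: int, header_bytes: int = 44) -> float:
    """Playing time implied by the file size, assuming a fixed-size header."""
    return (file_size_bytes - header_bytes) / byte_rate

rate = pcm_byte_rate(16000, 8)   # 8-bit mono at 16 kHz
bitrate_kbps = rate * 8 / 1000   # should match the 128 kbps MediaInfo reports
```

If the duration computed from the file size is clearly shorter than the time you actually spent recording, samples were lost somewhere along the way.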

Quote
Is there any reason to think it's the SD card that screwed the whole thing up?
No, I don't think so.   You can read/write the header so you should be able to read/write beyond the header.

Somewhere, I read about write-speed limits with SD cards but I don't think that's true.  I've got a video recorder that records to an SD card.  



Quote
Yes. I don't think an AVR microcontroller is capable of stereo or 16-bit recordings with its built-in ADC
The ADC is 10 bits, so you could write each sample into a 16-bit word and then bit-shift it to get full volume.

Stereo is possible, but since there is one shared ADC you can't sample left & right at exactly the same time. And the ADC speed limitation is also "shared", so you'd have to cut the sample rate in half.

Quote
even the processing power is not enough for a decent sampling rate at this format.
I don't think "processing" is the limitation/bottleneck.  



P.S.
You know...   In theory recording is "easy", especially at 8-bits...    You just read the data at a known sample rate and write it to a file.  (With a 10-bit ADC you have to right-shift two bits before throwing-away the high-byte of a 16-bit integer.)    
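The shift described in the parenthesis looks like this in Python. It is only an illustration of the arithmetic (on the Arduino itself this would be C/C++), and the function name `adc10_to_u8` is made up.

```python
def adc10_to_u8(adc_value: int) -> int:
    """Reduce a 10-bit ADC reading (0..1023) to an unsigned 8-bit sample (0..255).

    Shifting right by two drops the two least significant bits, so the
    result already fits in the low byte and the high byte can be discarded.
    """
    if not 0 <= adc_value <= 1023:
        raise ValueError("expected a 10-bit value")
    return adc_value >> 2
```

A mid-scale reading of 512 maps to 128, which is the "silence" level of unsigned 8-bit WAV audio.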

The only tricky thing is writing the header, but that's not too bad if you're always writing the same format.     And, you do have to go-back and write the data-chunk size when you stop recording.

...Playback is not so easy when you don't have a DAC. ;)

MarkT

Stereo is possible, but since there is one shared ADC you can't sample left & right at exactly the same time.
At typical hifi sample rates this is unimportant as the time discrepancy corresponds to a few mm difference,
most I2S DACs and ADCs sample L and R interleaved I think.
[ I will NOT respond to personal messages, I WILL delete them, use the forum please ]

DVDdoug

Quote
At typical hifi sample rates this is unimportant as the time discrepancy corresponds to a few mm difference,
most I2S DACs and ADCs sample L and R interleaved I think.
True, it's not a BIG deal but normally  the 2 (or more) channels are sampled on the same clock edge.  And, it's more like a couple cm's at "Arduino sample clock rates".

The data in a WAV file is interleaved, and it may be sent serially over I2S (or USB, S/PDIF, HDMI, or whatever), but the channels are re-synchronized and clocked out together by the DAC.
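The "few mm" and "couple cm" figures in this exchange can be sanity-checked by converting a time offset between channels into the distance sound travels in that time. This Python sketch rests on assumptions that are not from the thread: a speed of sound of 343 m/s, a half-sample offset at 44.1 kHz for the hi-fi case, and one ADC conversion of 13 ADC clock cycles at 125 kHz (about 104 µs) for a stock Arduino Uno.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 °C

def offset_distance_mm(time_offset_s: float) -> float:
    """Distance sound travels during the given inter-channel time offset."""
    return SPEED_OF_SOUND_M_S * time_offset_s * 1000.0

# Hi-fi case: channels sampled half a sample period apart at 44.1 kHz.
hifi_mm = offset_distance_mm(0.5 / 44100)

# Arduino case: channels sampled one ADC conversion apart
# (assumed 13 ADC clock cycles at a 125 kHz ADC clock).
arduino_mm = offset_distance_mm(13 / 125000)
```

Under these assumptions the first value comes out at a few millimetres and the second at a few centimetres, in line with both posters' estimates.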

Lucario448

My last piece of experimental music:
Nice whistlings ha ha :D

About the recording: the audio actually sounds the way it is supposed to (as far as the header and encoding are concerned); the annoyance comes from a loud pulsing noise that mixes with the microphone's signal.
That pulsing noise comes from the SCK line of the SPI port, mostly when data blocks have to be updated (by itself, a 4 or 8 MHz clock frequency should not be audible to any non-ultrasonic equipment or human being).

The solution sounds bizarre: you have to kind of isolate the ground line for the microphone's output (or the SD card). I say "kind of" because it does not mean literally cutting off the ground connections.
I'll let someone else give you more details about this solution, since it is hard for me to explain (to be honest, I've never applied it to a similar problem I'm having, because I don't actually understand how to do it either).



Somewhere, I read about write-speed limits with SD cards but I don't think that's true.  I've got a video recorder that records to an SD card.
The limit comes mostly from the microcontroller's available RAM and the SPI clock speed. When using the native SDIO protocol rather than ("legacy") SPI, it is less of an issue.

In a test I made long ago, I noticed that an ATmega328P at 16 MHz can write at approx. 53 KB/s, with a for loop being the only known significant overhead. Maybe I could push 1 or 2 KB/s more if I disabled the timer0 overflow interrupt (the one that makes delay(), millis() and micros() possible).
You may judge if less than 53 KB/s is enough for the application.


The ADC is 10 bits, so you could write each sample into a 16-bit word and then bit-shift it to get full volume.
I know it's the only way to record the whole resolution, but it is not true 16-bit audio nonetheless.


Stereo is possible, but since there is one shared ADC you can't sample left & right at exactly the same time. And the ADC speed limitation is also "shared", so you'd have to cut the sample rate in half.
Aside from the sampling rate, the shifted timing of both channels is for me the deal-breaker in stereo capabilities.


I don't think "processing" is the limitation/bottleneck.
However:
8-bit WAV files use unsigned integers, whereas everything else uses signed integers
Remember that the ADC also spits out unsigned values, and converting to a two's-complement (signed integer) encoding (by subtraction) takes up several CPU clock cycles; even more when it's 8-bits "wide".
Endianness is not a problem; the RIFF/WAVE standard always expects little-endian (LSB first), and the Arduino also stores variables (int and larger) this way.
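To make the unsigned-to-signed conversion and the byte order concrete, here is a small Python sketch of what a 16-bit recording path would have to do with each 10-bit reading. It is an illustration only, not what TMRpcm does (the library records 8-bit); the function names are hypothetical.

```python
import struct

def adc10_to_s16(adc_value: int) -> int:
    """Map an unsigned 10-bit reading (0..1023) to a signed 16-bit sample.

    Subtracting the mid-scale value 512 recentres the signal around zero;
    shifting left by six stretches the 10 bits across the 16-bit range.
    """
    if not 0 <= adc_value <= 1023:
        raise ValueError("expected a 10-bit value")
    return (adc_value - 512) << 6

def s16_to_wav_bytes(sample: int) -> bytes:
    """Encode one signed 16-bit sample little-endian, as WAV expects."""
    return struct.pack("<h", sample)
```

The low six bits of every sample are always zero, which is the sense in which the result is "not true 16-bit audio".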


P.S.
You know...   In theory recording is "easy", especially at 8-bits...    You just read the data at a known sample rate and write it to a file.  (With a 10-bit ADC you have to right-shift two bits before throwing-away the high-byte of a 16-bit integer.)    

The only tricky thing is writing the header, but that's not too bad if you're always writing the same format.     And, you do have to go-back and write the data-chunk size when you stop recording.
This is what the library does. The only aspect you missed is the double buffering needed to keep sampling even while writing to the card.

TMRpcm does even more complicated stuff to keep the process out of the main code: it uses the input capture register (ICR) to define the sampling rate (this triggers the sampler ISR), and one of the output compare registers (OCR) to trigger the ISR that writes to the file when one of the buffers is full.
To avoid delays in the sampling, this ISR re-enables only the overflow interrupt (triggered by the ICR), hoping that the full buffer gets completely written before the other one fills up as well. This is why it is so important not to choose a very high sample rate.

It does a similar thing when playing, although buffer overruns are less likely since reading is faster than writing. 44100 Hz playback is still not possible, but this time you can blame the external oscillator: at a 16 MHz clock and 8-bit PWM resolution, the maximum frequency it can achieve is 62.5 kHz (fast PWM). For 44 kHz playback, you need close to 100 kHz of PWM at least, and so a faster microcontroller or a real DAC (a resistor ladder on pins 0 to 7, aka port D, should work too).
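The 62.5 kHz figure follows from the timer having to count through every PWM step once per cycle. A quick Python check of that arithmetic, with a hypothetical function name and a prescaler of 1 assumed:

```python
def fast_pwm_frequency_hz(clock_hz: float, resolution_bits: int, prescaler: int = 1) -> float:
    """PWM carrier frequency when the timer counts 2**resolution_bits steps per cycle."""
    return clock_hz / (prescaler * 2 ** resolution_bits)

carrier_hz = fast_pwm_frequency_hz(16_000_000, 8)  # 16 MHz clock, 8-bit resolution
ratio_to_44k1 = carrier_hz / 44100                 # carrier cycles per audio sample
```

With these numbers the carrier completes fewer than two cycles per 44.1 kHz sample, which is the shortfall the post is pointing at.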



Yeah, not an intuitive explanation for beginners; but... this is the "magic" behind the library nonetheless.




...Playback is not so easy when you don't have a DAC. ;)
Above a certain sampling rate that's true; otherwise PWM is a feasible alternative.



And, it's more like a couple cm's at "Arduino sample clock rates".
Maybe the "drag" will become larger if you consider how the ADC in an AVR works. You know, successive approximation takes a while.

Hillcres-Hellio

Hey Lucario, not very intuitive for me but thanks anyway, haha. What I've been asking myself is: why does it happen with my specific circuit and components, and not with others'?

What do you mean by isolating the ground pins?

MarkT

#10
Oct 14, 2018, 03:20 pm Last Edit: Oct 14, 2018, 03:22 pm by MarkT
My last piece of experimental music:
Sounds like buffer overruns to me, so regular discontinuities - otherwise the audio is clear (for 8 bit that is).

Anyway, I popped the file into matplotlib in Python to see what the values looked like, and it's interesting:



The samples are regularly abruptly forced to 00, then spend a while wandering in the range FE-FF-00-01
or so, before smoothly slewing back to the actual audio data.

The wandering about 00-FF suggests a small signed value, yet the main signal is clearly an unsigned
sample waveform, yet the abrupt changes suggest simple buffering issues...

Perhaps there's some interaction with the analog input pin from something?

Or the adjustable gain is doing something odd (and has a DC offset)?

Hillcres-Hellio

The wandering about 00-FF suggests a small signed value, yet the main signal is clearly an unsigned
sample waveform, yet the abrupt changes suggest simple buffering issues...
Are you talking about the buffer of the Atmega 328p?

Perhaps there's some interaction with the analog input pin from something?
I'm using only one analog pin. I tried changing it; the 5 analog pins all give me this crappy music.


Or the adjustable gain is doing something odd (and has a DC offset)?
I don't think so because with an other mic (KY-something) the result is the same.......

MarkT

Are you talking about the buffer of the Atmega 328p?
The ATmega328P doesn't have a buffer. There's a software buffer in the SD library for writing 512-byte
blocks to the filesystem, and if you look at that graph it's obvious the periodicity is 512 samples.
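The periodicity can also be measured rather than read off a plot. The Python sketch below is run on synthetic data, not on the actual recording: it fabricates an unsigned 8-bit tone that is forced to zero every 512 samples, then looks for abrupt falls to (near) zero and lists the spacing between them. The function names and the thresholds `jump` and `low` are arbitrary choices for illustration.

```python
import math

def dropout_starts(samples, jump=64, low=2):
    """Indices where the signal falls abruptly to (near) zero."""
    starts = []
    for i in range(1, len(samples)):
        if samples[i] <= low and samples[i - 1] - samples[i] >= jump:
            starts.append(i)
    return starts

def spacings(indices):
    """Distances between consecutive indices."""
    return [b - a for a, b in zip(indices, indices[1:])]

# Synthetic stand-in for the recording: a tone centred on 128 ...
samples = [128 + round(40 * math.sin(2 * math.pi * i / 37)) for i in range(2048)]
# ... forced to 0 for 20 samples once every 512 samples.
for start in range(500, 2048, 512):
    for i in range(start, min(start + 20, 2048)):
        samples[i] = 0

periods = spacings(dropout_starts(samples))
```

On a real file you would first load the data bytes that follow the header into `samples`; a spacing that sits steadily at 512 would point at the SD block writes.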

Quote
I'm using only one analog pin. I tried changing it; the 5 analog pins all give me this crappy music.
Doesn't matter which pin, something may be interacting with the signal that goes to the pin...
[ actually I've figured it out with a little more thinking ]
Quote
I don't think so because with an other mic (KY-something) the result is the same.......
My theory is that when the SD library you are using writes to the SDcard the power supplies are
drooping, causing the signal to go haywire for a bit.

My suspicion is that you are using the 3.3V pin on the Uno to power the SDcard - you may need a
separate 3.3V regulator to supply it, rated at 0.2A or more, to ensure the SDcard gets the power
it needs for writes and erases.

Even if that's not the problem, you may have a ground-loop in your microphone signal handling
(before amplification) which causes the droop on the power supply to couple to the small signal
from the microphone and saturate the preamp.

You need to post your circuit, preferably as a schematic and as a photo, so it's possible to figure
out if this sort of thing is happening.

You could try adding some electrolytic decoupling to the 3.3V supply to the SDcard reader.

Lucario448

why does it happen with my specific circuit and components, and not with others'?
For example?

Even during playback that noise gets mixed in, though not as loudly for some reason.


What do you mean by isolating the ground pins?
As I said, it involves some filtering that frankly I don't understand. But this technique is applied in high-quality PC sound cards, where no switching noise "leaks" out at all.

I see the ground line as more or less the culprit, since the outputs are usually isolated by the DAC chip, but the ground connects everything in the circuit directly.



The wandering about 00-FF suggests a small signed value, yet the main signal is clearly an unsigned
sample waveform, yet the abrupt changes suggest simple buffering issues...
By importing the audio file as "raw data" in Audacity (and so ignoring any headers), I've confirmed that it is indeed encoded as unsigned 8-bit mono at 16 kHz.

To my knowledge of this library, if an overrun ever occurred, the recorded waves would become chopped up rather than swinging between the extreme values. The ISRs are so time-critical that they have no room for validation; everything has to be assumed, because any processing eats up precious CPU time.


If it is a buffer overrun, try lowering the sampling rate (to 8000 Hz) and/or increasing the buffer sizes (up to 254). Keep in mind that increasing the buffer sizes to the maximum will make the sketch take up almost 3/4 of the available RAM; that is not a problem only if the Arduino is 100% devoted to audio recording/playback (with no extra libraries or modules to interact with, e.g. displays).
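Why both suggestions help can be seen from how long one buffer takes to fill, which is the time available to write the other buffer to the card. A rough Python sketch, assuming one byte per sample (8-bit mono) and a hypothetical function name:

```python
def buffer_fill_time_ms(buffer_bytes: int, sample_rate_hz: int) -> float:
    """Time for the sampler to fill one buffer, at one byte per sample."""
    return 1000 * buffer_bytes / sample_rate_hz

at_16k = buffer_fill_time_ms(254, 16000)  # largest buffer at the current rate
at_8k = buffer_fill_time_ms(254, 8000)    # same buffer at half the rate
```

Halving the sample rate doubles the time available for each write, and a larger buffer stretches it proportionally.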

If the problem still persists, then it is definitely signal noise (or what MarkT said).
It's somehow mixing with the microphone's output and so gets picked up by the amplifier, making it clip in the recording.

Hillcres-Hellio

Hi Mark, thanks a lot for your response.

Here is a schematic of my circuit:

By electrolytic decoupling, do you mean a capacitor? If so, how many µF? (Actually, my circuit runs on 5V.)