Feasibility of Arduino project with 2 separate microphone signals

Hi all,

I'm working on a project that records two people speaking at the same time and then runs an algorithm to separate the two voices. Is this feasible on an Arduino?

The algorithm requires the two recordings to be separate, so I need one stereo mic input (or two separate mono inputs), which means connecting two microphones to the Arduino. I have some 3.5mm microphones that I am planning to use and I found a 3.5mm TRRS jack (https://www.sparkfun.com/products/11570). Can I connect my microphones to different analog pins on the Arduino to input the audio signals? I've seen some forum posts about designing an analog circuit to connect electret microphones to the Arduino. Is it possible to connect two electret microphones to the analog pins? My preference is to use 3.5mm microphones, but I'm open to using electret microphones.

I have also looked at some audio shields, but most of the ones I have found so far have only one microphone input, so they won't work for my project. I did find a shield by PlainDSP (http://www.plaindsp.com/product/audio-kit/) that might work for me; I'm waiting for the developer to clarify.

I also plan to connect an LCD screen to display the separated voices after they have been reconstructed by the algorithm, but my focus is on the microphone inputs because I expect it to be more difficult. I am also aware that the memory on the Arduino is fairly small and that I will likely need to attach an SD card to save the recordings.
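
To put rough numbers on the memory concern, here is a back-of-the-envelope sketch. The sample rate, sample size, and board (an Arduino Uno with 2 KB of SRAM) are all assumptions for illustration, not details of the actual project.

```python
# Rough storage estimate for a two-microphone recording.
# All numbers below are assumptions for illustration only.
SAMPLE_RATE_HZ = 8_000      # assumed: telephone-quality speech
BYTES_PER_SAMPLE = 2        # assumed: each sample stored as a 16-bit integer
CHANNELS = 2                # two microphones
SRAM_BYTES = 2_048          # assumed board: Arduino Uno (ATmega328P, 2 KB SRAM)


def bytes_per_second(rate_hz, bytes_per_sample, channels):
    """Raw data rate of uncompressed PCM audio."""
    return rate_hz * bytes_per_sample * channels


def seconds_that_fit(capacity_bytes, rate_bytes_per_s):
    """How long a recording fits in a buffer of the given size."""
    return capacity_bytes / rate_bytes_per_s


rate = bytes_per_second(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)
print(f"data rate: {rate} bytes/s")
print(f"SRAM holds: {seconds_that_fit(SRAM_BYTES, rate) * 1000:.0f} ms of audio")
print(f"10 s recording: {rate * 10 / 1000:.0f} kB")
```

Under these assumptions the on-chip memory holds only a few tens of milliseconds of audio, so samples would have to be streamed to the SD card as they arrive.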

I appreciate any advice you can offer based on your past knowledge. Thanks in advance.


The algorithm requires the two recordings to be separate,

Hmm. So why do you need to separate them if you have separate recordings?


You may be able to do it on a Teensy because the Teensy audio subsystem can process stereo. Look closely at the size of the variables in the algorithm. If it uses 32-bit integers, for example, an 8-bit Arduino needs 4 instructions just to load each one, so the 16MHz Arduino is effectively running at 4MHz. Can your algorithm work at that speed? (The top-end Teensy is 32-bit and 180MHz, so it's roughly 45 times faster for 32-bit operations.)
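
The rule of thumb above can be written out as a small calculation. This is a crude model that only counts how many native-width pieces each operand splits into; real instruction timings differ, and the function name is made up for illustration.

```python
# Crude model of the rule of thumb above: a CPU needs roughly
# (operand bits / native bits) instructions to move one wide integer.
def effective_mhz(clock_mhz, cpu_bits, operand_bits):
    """Clock rate divided by the number of native-width pieces per operand."""
    pieces = max(1, operand_bits // cpu_bits)
    return clock_mhz / pieces


arduino = effective_mhz(16, cpu_bits=8, operand_bits=32)
teensy = effective_mhz(180, cpu_bits=32, operand_bits=32)
print(f"Arduino: {arduino} MHz effective")
print(f"Teensy:  {teensy} MHz effective")
print(f"ratio:   {teensy / arduino:.0f}x")
```

With these inputs the ratio comes out at 45, the same order as the figure quoted above.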

By separate I meant that the two recordings have to be different; the algorithm doesn't work if the inputs are the same recording. Each recording contains a mixture of the two speakers: the people are speaking at the same time, so each microphone picks up both voices. The algorithm requires recordings from two microphones so that they can be compared and the original two voices reconstructed.
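
One common way to write this down is an instantaneous linear mixture. This is an assumed model for illustration, not necessarily the one my algorithm uses. Here s_1 and s_2 are the two voices, x_1 and x_2 are the two microphone recordings, and the a_ij are unknown gains:

```latex
\begin{aligned}
x_1(t) &= a_{11}\, s_1(t) + a_{12}\, s_2(t) \\
x_2(t) &= a_{21}\, s_1(t) + a_{22}\, s_2(t)
\end{aligned}
\qquad \text{separable only if} \qquad
a_{11} a_{22} - a_{12} a_{21} \neq 0
```

If both inputs are the same recording, the two rows are identical and the determinant is zero, which is why two different recordings are needed. In a real room, delays and echoes make the mixture more complicated than this.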

So you're working from recordings and not live microphones? Then the speed is less important. You probably don't care if it takes 10 seconds to process 1 second of audio.

This sounds like a pretty heavy DSP problem. A dedicated DSP processor might be better. Or a faster general-purpose computer like a Raspberry Pi or a Windows laptop. The reason for using an Arduino is easy interfacing to simple devices like buttons and beepers, not microphones and megabytes of data.


OK, there's a lot of complexity to this. I'm an old Broadcast/Recording engineer and there seem to be a lot of questions about this.

What is the final objective of this?

What is the desired "output" of this algorithm?

Is it two audio recordings which closely resemble the two original speakers, with little crosstalk between the two speakers?

Is it two sets of text that accurately represent the content of what the two speakers 'said'?

How different or similar are the two voices? There are very many significant variations in human voices. Should the algorithm/system be able to separate two very similar speakers?

What microphones would be used for the recording process?

Are the two speakers actually speaking at the same real time? In the same or different locations?

Wow.. this "sounds" difficult!

Sounds like trying to do a digital version of the old analog noise cancelling microphone.

Microphone1 records subject1 plus noise (subject2), hopefully at lower volume.

Microphone2 records subject2 plus noise (subject1), hopefully at lower volume.

You subtract an attenuated copy of microphone2 from microphone1, just enough to cancel the subject2 noise; the result is subject1 at a lower volume.
You subtract an attenuated copy of microphone1 from microphone2, just enough to cancel the subject1 noise; the result is subject2 at a lower volume.
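
As a minimal sketch of that idea, assuming the leakage gains are known and there is no delay or echo between the microphones (neither holds in a real room), with sine tones standing in for the voices and made-up gain values:

```python
import math

# Assumed leakage gains (hypothetical values; in practice they are unknown
# and would have to be estimated or tuned by ear).
LEAK_2_INTO_1 = 0.4   # how strongly subject2 shows up in microphone1
LEAK_1_INTO_2 = 0.3   # how strongly subject1 shows up in microphone2


def tone(freq_hz, n=200, rate_hz=8000):
    """Stand-in for a voice: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


subject1 = tone(220)
subject2 = tone(330)

# Each microphone hears its own subject plus the other one at lower volume.
mic1 = [s1 + LEAK_2_INTO_1 * s2 for s1, s2 in zip(subject1, subject2)]
mic2 = [s2 + LEAK_1_INTO_2 * s1 for s1, s2 in zip(subject1, subject2)]

# Subtract an attenuated copy of the other microphone to cancel its subject.
out1 = [m1 - LEAK_2_INTO_1 * m2 for m1, m2 in zip(mic1, mic2)]
out2 = [m2 - LEAK_1_INTO_2 * m1 for m1, m2 in zip(mic1, mic2)]

# What is left is each subject scaled by (1 - 0.4 * 0.3) = 0.88.
scale = 1 - LEAK_2_INTO_1 * LEAK_1_INTO_2
err1 = max(abs(o - scale * s) for o, s in zip(out1, subject1))
err2 = max(abs(o - scale * s) for o, s in zip(out2, subject2))
print(f"residual error: {err1:.2e}, {err2:.2e}")
```

Each output is the wanted subject at slightly reduced volume, as described. The hard part in practice is that the gains are not known in advance.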

Tom… :o

Hi Morgan, I want to record using the microphones, but running the algorithm on the recordings should be fine. I'm not too concerned about the speed at this point; I'd like to get it working first and then potentially make it faster in a future design. But yes, this project is pretty DSP heavy. Thanks for your other device suggestions, I'll take a look at them.

Hi Terry, the desired output of the algorithm is a recognizable recording of each person's voice. Once the voices are reconstructed, I'm running them through a voice-to-text API to display what each person said on a screen. The two audio recordings are, as Tom suggested, each a mixture of the two speakers, because they are both speaking at the same time in the same environment. The two voices I plan on using for testing are a woman's and a man's, but I'd prefer not to have to limit the device because of the voices. The microphones that I have now are lav mics made by Aputure (https://www.aputure.com/products/a-lav-ez-1).

Hi Tom, yup that's a good, brief description of my project. I'm working on Blind Source Separation of two voices.