Audio input filter

I'm looking for a piece of software to filter audio for interpretation by voice recognition software.

Basically I will have background audio generated by the same system receiving the audio input. Audio output will come out near the mic. I need a piece of software that will analyze the incoming audio and remove audio output leaving the voice command.

I've done a fair amount of research and come up empty-handed. A lot of the problems people point out aren't really a concern here. Latency doesn't matter, because the voice recognition response can take as long as it needs to. I do see some potential for issues with keeping the input and output in sync, but hopefully that is correctable.

Anyways, does anyone have any suggestions on how I might get this done?

does anyone have any suggestions on how I might get this done?

Well, sorry, not me, because I don't think you can do this project with an Arduino.

This problem is a lot harder than you might think. Some serious DSP grunt is needed and the results will
only ever be partially successful. The microphone's "background audio" is a severely altered version of
what you output to the speaker(s). The alterations change in real time and need to be modelled
across the spectrum in order to cancel much.
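
To make "modelled" a bit more concrete, here is a minimal sketch of one common way to do it: a normalized LMS (NLMS) adaptive filter that keeps re-estimating how the speaker signal shows up at the mic and subtracts that estimate. The function name `nlms_cancel` and the `taps`, `mu` and `eps` values are illustrative assumptions, not something from this thread, and a real room would need far more taps than shown.

```python
def nlms_cancel(reference, mic, taps=16, mu=0.5, eps=1e-6):
    """Subtract an adaptively filtered copy of `reference` from `mic`.

    reference: samples sent to the speaker (what we want to remove)
    mic:       samples picked up by the microphone, same rate and length
    Returns (residual, weights): the cleaned signal and the final filter.
    All parameter values are illustrative assumptions.
    """
    w = [0.0] * taps      # current estimate of the speaker-to-mic response
    buf = [0.0] * taps    # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, mic):
        buf = [x] + buf[:-1]
        predicted = sum(wi * bi for wi, bi in zip(w, buf))  # estimated echo
        e = d - predicted                                   # what is left over
        power = eps + sum(b * b for b in buf)
        # Nudge the filter towards whatever would have reduced the error.
        w = [wi + mu * e * bi / power for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual, w
```

Because the weights keep updating on every sample, the filter can follow slow changes in the room, which is the part a fixed "invert and mix" cannot do. While someone is speaking the adaptation is usually slowed or frozen, which this sketch does not attempt.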

Get a good noise-cancelling microphone instead (see Wikipedia: Noise-canceling microphone).

Ah, you're right I didn't take that into consideration. If the environment were static would you potentially be able to "tune" the noise cancelling to account for the audio alteration?

Yeah I've looked at some noise cancelling mics but they are usually pricey and my goal is to blanket my whole house with mics so you can control my system from anywhere. I'm afraid that many mics would likely be out of my reach, and that's before getting into the fact that most of them, if not all, are USB; I wouldn't have anything close enough to plug them into. I'm not sure how they would work with something like a USB-to-Ethernet adapter.

Thanks for the info, that helps me think about it in the right direction at least.

Yeah I've looked at some noise cancelling mics but they are usually pricey and my goal is to blanket my whole house with mics so you can control my system from anywhere.

Usually "noise cancelling" mics work by direction and proximity: a headset mic with your lips right up to the mic. It won't help in your application. I believe they were invented for airplane pilots. There may also be some electronic noise-cancelling mics that work like a noise gate (also not useful in your application).

Directional mics (cardioid, supercardioid, and shotgun) can give you a better signal-to-noise ratio, but that is only going to help you under certain marginal conditions. And they typically start at around $100 USD.

If you have arrays of microphones then there are sophisticated ways to do this well, but sophisticated
means complex and very compute-intensive, basically beam-forming (see Wikipedia: Adaptive beamformer).

This is non-trivial tech.
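
For a feel of the basic idea, here is a minimal sketch of the simplest fixed (non-adaptive) beamformer, delay-and-sum: each mic channel is shifted by the delay the wanted source has at that mic, then the channels are averaged so the source adds up coherently and everything else partly averages out. The function name `delay_and_sum` and the assumption of known, whole-sample delays are illustrative simplifications; the adaptive versions estimate and update these things continuously.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average them.

    channels: list of equal-rate sample lists, one per microphone
    delays:   how many samples later the wanted source arrives at each mic
              (assumed known here; in practice it has to be estimated)
    """
    # Only keep the span where every shifted channel still has data.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]
```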

Hmm I'll have to study up on that. I'm certainly not opposed to some complex tech.

My rough plan for the time being is to run some electret mics through LM386s into the analog pin of a WiFi-connected Arduino, stream the samples straight off the pin to a socket on my Node.js server on a big box (16 Xeon cores and 96 GB RAM), assemble them into a .wav, and then comes the tough part. All audio output (whole-home audio system) is routed through that box. It will be recording all the time, both the output and the Arduinos. SoX will invert the waveform and combine the tracks, and maybe, with a whole lot of fiddling and probably luck, it will remove enough of the output for Sphinx to understand my voice.
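
Inverting one track and mixing it into the other is just a subtraction, and it only cancels anything if the two tracks line up in time and level. A minimal sketch of that alignment step, using a brute-force cross-correlation to find the delay, is below. The names `estimate_delay` and `subtract_aligned`, and the single fixed `gain`, are assumptions for illustration; this ignores the room's effect on the sound, so at best it is a starting point.

```python
def estimate_delay(reference, mic, max_lag):
    """Return the lag (in samples) at which `mic` best matches `reference`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(mic) - lag)
        # Correlation between the reference and the mic shifted by `lag`.
        score = sum(reference[i] * mic[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def subtract_aligned(reference, mic, lag, gain=1.0):
    """Subtract the delayed, scaled reference from the mic signal."""
    out = []
    for i, m in enumerate(mic):
        j = i - lag
        if 0 <= j < len(reference):
            out.append(m - gain * reference[j])
        else:
            out.append(m)  # no reference sample available for this position
    return out
```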

Oh if anyone has some hints on turning a stream of analog samples into a wav file that would save me some research.

At least writing to a WAV file is pretty simple. Just write 44 zeros, then start writing the audio samples. When you stop, seek back to the beginning of the file and write the WAV header where you originally put those zeros. You'll need to know how many samples you wrote, and details like the sample rate, to produce a valid header.
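
A minimal sketch of that placeholder-then-patch approach, assuming mono 16-bit PCM and 10-bit readings (0 to 1023) as an Arduino analog pin would give. The function name `write_wav_from_adc`, the 8000 Hz default and the scaling from 10 to 16 bits are assumptions for illustration.

```python
import struct


def write_wav_from_adc(path, chunks, sample_rate=8000):
    """Write chunks of 10-bit ADC readings as a mono 16-bit PCM WAV file.

    The 44-byte header is left as zeros first and patched in at the end,
    once the total amount of audio data is known.
    Returns the number of audio data bytes written.
    """
    channels, bits = 1, 16
    block_align = channels * bits // 8
    data_bytes = 0
    with open(path, "wb") as f:
        f.write(b"\x00" * 44)  # placeholder for the header
        for chunk in chunks:
            # Centre 0..1023 on zero and scale up to the 16-bit range.
            pcm = [(s - 512) * 64 for s in chunk]
            payload = struct.pack("<%dh" % len(pcm), *pcm)
            f.write(payload)
            data_bytes += len(payload)
        f.seek(0)  # go back and fill in the real header
        f.write(struct.pack(
            "<4sI4s4sIHHIIHH4sI",
            b"RIFF", 36 + data_bytes, b"WAVE",
            b"fmt ", 16, 1, channels, sample_rate,
            sample_rate * block_align, block_align, bits,
            b"data", data_bytes,
        ))
    return data_bytes
```

On the server side each chunk would be whatever arrives from the socket; the file only becomes a valid WAV once the header has been patched, so it has to be closed before anything else reads it.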

WAV is fundamentally a file storage format, where you know the total size of the data. It's pretty much meaningless in the context of continuous streaming.

There is an ESP8266 baby monitor project that streams audio in real-time.

https://perso.aquilenet.fr/~sven337/english/2016/07/14/DIY-wifi-baby-monitor.html

Nowadays it might make more sense to use an ESP32 and an I2S MEMS microphone, but I have no experience with I2S microphones. (ESP8266 || ESP32) + I2S microphone = WiFi bug