I thought detecting sound direction would be easier than various projects make it seem. So far it seems I was right. People have used multi-microphone arrays, measured sound pressure, calibrated their sound detectors and whatnot.
All I wanted to do was use two microphones and detect the time shift between their signals. And so far it has worked perfectly. I sample at 16 kHz. The distance between the microphones is 42 mm. This means the time shift will be from -2 to 2 samples. It's like distinguishing 5 sectors of a 180° field. Raising the sampling frequency or moving the mics further apart will improve the resolution.
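To spell out the arithmetic: with 42 mm spacing the largest possible arrival-time difference is about 122 µs, which at 16 kHz is just under two sample periods, hence the five sectors. Here is a small sketch of that mapping (plain C++; the 343 m/s speed of sound and the asin mapping to an angle are assumptions for illustration, not code from the project):

```cpp
// The arithmetic behind the -2..+2 sample range, for two mics 42 mm apart
// sampled at 16 kHz. The 343 m/s speed of sound and the asin mapping to an
// angle are assumptions for illustration.
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979;
    const double c  = 343.0;     // speed of sound, m/s
    const double d  = 0.042;     // microphone spacing, m
    const double fs = 16000.0;   // sampling frequency, Hz

    // Largest possible arrival-time difference and its size in samples.
    const double maxShift = d / c;   // ~122 microseconds
    printf("max shift: %.1f us = %.2f samples\n", maxShift * 1e6, maxShift * fs);

    // Each integer lag corresponds to one angular sector (0 = straight ahead).
    for (int lag = -2; lag <= 2; ++lag) {
        const double s = std::max(-1.0, std::min(1.0, lag / fs * c / d));
        printf("lag %+d -> about %+.0f degrees\n", lag, std::asin(s) * 180.0 / PI);
    }
    return 0;
}
```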
If raising the sampling frequency is not an option, something tells me I could just as well "raise" it by interpolating. It might work, because I record some 256 samples per channel at a time. Matching the two channels and thus finding the time shift might become more precise if I can shift not only by whole samples but also by fractions of a sample.
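One common way to get those fractional shifts without actually resampling is to compute the match at integer lags and then fit a parabola through the best lag and its two neighbours. A minimal sketch of that idea, not the project's code; the lag range and the sum-of-squared-differences metric are assumptions:

```cpp
// Fractional time shift from 256-sample blocks: find the best whole-sample
// lag by sum of squared differences, then refine it with a parabolic fit
// through the best lag and its two neighbours. Sketch only; lag range and
// metric are assumptions.
#include <cmath>
#include <cstdio>
#include <vector>

// Sum of squared differences between the left channel and the right channel
// shifted by "lag" samples (smaller = better match).
double ssd(const std::vector<double>& L, const std::vector<double>& R, int lag) {
    double s = 0.0;
    const int n = (int)L.size();
    for (int i = 0; i < n; ++i) {
        const int j = i + lag;
        if (j < 0 || j >= n) continue;
        const double d = L[i] - R[j];
        s += d * d;
    }
    return s;
}

// Returns the delay in fractional samples; positive means the right channel lags.
double fractionalLag(const std::vector<double>& L, const std::vector<double>& R, int maxLag) {
    int best = -maxLag;
    double bestVal = ssd(L, R, best);
    for (int lag = -maxLag + 1; lag <= maxLag; ++lag) {
        const double v = ssd(L, R, lag);
        if (v < bestVal) { bestVal = v; best = lag; }
    }
    if (best == -maxLag || best == maxLag) return best;   // no neighbours to fit
    const double sm = ssd(L, R, best - 1), s0 = bestVal, sp = ssd(L, R, best + 1);
    const double denom = sm - 2.0 * s0 + sp;
    return best + (denom != 0.0 ? 0.5 * (sm - sp) / denom : 0.0);
}

int main() {
    // Demo: the right channel is the left channel delayed by 1.3 samples.
    const int N = 256;
    std::vector<double> L(N), R(N);
    for (int i = 0; i < N; ++i) {
        L[i] = std::sin(0.23 * i) + 0.5 * std::sin(0.61 * i);
        R[i] = std::sin(0.23 * (i - 1.3)) + 0.5 * std::sin(0.61 * (i - 1.3));
    }
    printf("estimated lag: %.2f samples\n", fractionalLag(L, R, 4));
    return 0;
}
```

With the synthetic 1.3-sample delay in the demo, the parabolic fit lands close to 1.3 even though the channels are only compared at whole-sample offsets.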
Using only two microphones won't distinguish sounds from the front from sounds from the back. But the application will be a robot turning towards the sound source, and in that case it works. It hears the difference between left and right. It might hear the sound coming from 45° to the left when it's actually 135° to the left. It starts turning left, possibly detecting that the sound first "escapes" from 45° to 90° before it comes back to 45° and eventually to 0°, straight ahead.
Your two microphones are two points in space (A and B), meaning they are "aligned" and lie in a plane (as you only have 2 ➜ grey plane). If you take the perpendicular bisector of the line segment connecting the two microphones in that plane (the blue line), it is equidistant from both microphones, so any sound source located on that line will be received at the same time by the 2 microphones.
If you go to 3D, take the plane normal to the plane containing the 2 microphones and passing through that equidistant line: all points in that (red) plane are also equidistant from the microphones.
By turning a bit you can indeed make decisions, but your code needs to take that into account: if you got it wrong, moved the wrong way and then course-corrected, you'll find the situation again where the ∆ is 0 and this time you need to go the other way ➜ some coding and a "memory" of the situation is needed.
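Something along these lines could serve as that "memory". This is only a sketch of the idea; the sign convention, the 0.25-sample deadband and the demo values are placeholders, not a tested implementation:

```cpp
// Minimal sketch of a "memory" of the situation: remember which way we are
// turning and whether |∆| moved toward zero. If it grew instead, the source
// is behind, so keep turning the same way instead of flipping back.
// Sign convention, the 0.25-sample deadband and the demo values are
// placeholders for illustration.
#include <cmath>
#include <cstdio>

struct DirectionMemory {
    double prevAbsLag = 1e9;   // |lag| from the previous measurement
    int    turnDir    = 0;     // -1 = turning left, +1 = turning right, 0 = stopped

    // Call once per analysed block; lag > 0 is assumed to mean "to the right".
    // Returns the direction to keep turning.
    int update(double lag) {
        const double absLag = std::fabs(lag);
        if (absLag < 0.25) {
            turnDir = 0;                      // roughly ahead (or behind): stop and re-measure
        } else if (turnDir == 0) {
            turnDir = (lag > 0) ? +1 : -1;    // first decision: turn toward the apparent side
        } else if (absLag > prevAbsLag + 0.25) {
            // ∆ grew although we were turning toward the sound: the source is
            // behind, so keep the same turn direction and sweep through the side.
        }
        prevAbsLag = absLag;
        return turnDir;
    }
};

int main() {
    DirectionMemory mem;
    // Fake sequence: a source at ~135° left first "escapes" (|lag| grows),
    // then comes back toward zero as the robot keeps turning left.
    const double lags[] = {-1.0, -1.6, -2.0, -1.4, -0.6, -0.1};
    for (double lag : lags)
        printf("lag %+.1f -> keep turning %d\n", lag, mem.update(lag));
    return 0;
}
```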
I need to at least increase the distance between the microphones; I need better resolution. But I count on the robot in most cases first detecting the sound either to the left or to the right. If the first measurement says the sound source is straight ahead, it could just as well be straight behind. In that case the robot has to turn to one side or the other to verify it.
Humans and other animals need only two ears. Dogs wiggle their heads when they listen carefully to strange sounds. If I went 3D with this, I'd place the two mics on a device that could wiggle. Or add a third or fourth mic.
This question comes up a lot on the forum, so people would be interested if you took the time to post the code and describe the hardware in enough detail that a basic example can be reproduced.
I'll do that. So far I do everything on an Infineon CY8CPROTO with integrated mics, in the Eclipse IDE. But I also have a mic module which might be compatible with some of my Arduino boards. And since the Arduino IDE is superior to all other IDEs, I might continue my testing on an Arduino.
And yet, somehow, with only 2 'microphones' (ears) most humans and other animals can determine where a sound comes from, in any direction.
Actually, we have about four thousand very sensitive sound receptors in the inner ear (cochlea), and in your brain you have a dedicated neuronal network with a learning model that has been trained for years to deal with those 8,000 inputs.
It's true that front-back localisation is generally not as precise due to the limited cues available, and the brain relies more on the combination of time differences and spectral cues to make its best estimate.
But they definitely are in different positions along the cochlea, they also track different frequencies, and the two ears are not exactly the same.
Also, the shape of the outer ear gives the brain additional spectral cues to help localize sounds coming from above, below, in front of, or behind the listener.
There is lots of research on auditory perception and sound localization to read about if you are interested.
Whatever sound is the loudest will dictate the outcome. Say I have a sound source to the left and another to the right. I calculate the sum of squared differences for all 5 sectors. Sectors 2 and 5 have small sums, all the others have larger sums. I just go with the least sum.
If I were interested in detecting, for instance, a whistle, I'd run the sampled sounds through narrow-band filters. If I wanted the system to detect claps in a noisy environment, I'd need to include a threshold for sound pressure, because a clap is very short and a softer sound would dominate the least-sum matching by lasting longer.
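A sketch of those two ingredients, in case it's useful: a narrow-band check (here the Goertzel algorithm, a single frequency bin without a full FFT) for a whistle-like tone, and a short-window level check for clap-like bursts. The 2 kHz target, the 256-sample block and both thresholds are placeholders chosen for the example:

```cpp
// Sketch of the two ideas: a narrow-band check (Goertzel, one bin without a
// full FFT) for a whistle-like tone, and a short-window level check for
// clap-like bursts. Target frequency, block size and thresholds are
// placeholders chosen for the example.
#include <cmath>
#include <cstdio>
#include <vector>

// Energy of one frequency bin over a block (Goertzel algorithm).
double goertzelPower(const std::vector<double>& x, double targetHz, double fs) {
    const double coeff = 2.0 * std::cos(2.0 * 3.14159265358979 * targetHz / fs);
    double s1 = 0.0, s2 = 0.0;
    for (double sample : x) {
        double s0 = sample + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Mean square level of the block: "is this loud enough to be a clap?"
double meanSquare(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += v * v;
    return x.empty() ? 0.0 : s / x.size();
}

int main() {
    const double fs = 16000.0;
    std::vector<double> block(256);
    for (size_t i = 0; i < block.size(); ++i)   // fake 2 kHz whistle for the demo
        block[i] = 0.8 * std::sin(2.0 * 3.14159265358979 * 2000.0 * i / fs);

    bool whistleBand = goertzelPower(block, 2000.0, fs) > 1000.0;  // placeholder threshold
    bool loudEnough  = meanSquare(block) > 0.1;                    // placeholder threshold
    printf("whistle band hit: %d, loud enough for a clap: %d\n", whistleBand, loudEnough);
    return 0;
}
```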
I'm not sure. To me, phase shift sounds like shifting a sine wave by pi, or half a wavelength. What I have is sound waves that look arbitrary, with no regular waveform. I just shift them one sample at a time, not by half wavelengths.
That's an image of a sound coming from the left towards my computer, recorded with Audacity. It hits the two microphones of my laptop. I just try to match the two waveforms to see how much they are shifted relative to each other.
That is irrelevant. A microphone tracks from 20 Hz to 20,000 Hz. If the cochlea as a construction were superior to a microphone, we would build microphones like that: different microphones specialised for different frequency bands.
To the brain, a high frequency is a high frequency and a low one is a low one. You can hear them from straight ahead, from the top, from the left or from the right. The ear or the cochlea doesn't distinguish directions. The two cochleas really don't tell the brain anything more than two microphones tell your recording system. But to really get microphones to record sound the way humans sense it, binaural microphones were invented: head-shaped things with ears and the mics inside the ears. To make a robot distinguish sound directions beyond the 1D, 180° thing I have, I guess I would use such a binaural microphone. And a lot of machine learning. And a little wiggling of the head.
By irrelevant I only meant that the structure of the cochlea is meaningless if I want to build a model that distinguishes more sound directions. Instead I need to build ears that affect the frequency spectrum depending on which direction the sound comes from. I have to analyse the sound signal from the microphone. Of course I can divide the sound into a lot of bands, through narrow-band filters, and analyse each band. That might mimic what the cochlea together with the brain does. But I believe that won't give a straight answer without a lot of machine learning first.
So I stick to a very basic functionality, which is detecting the direction within a 180° sector in front of the robot. And if the robot tries to turn towards the sound and the sound appears to first move further away for a while before it starts coming back towards straight ahead, it just means the sound source was slightly behind to begin with.
Not totally - that contributes to "performing the FFT": going to the frequency domain and feeding that information from the specialized hair cells to your brain.
With a microphone you sample and then perform the FFT, in two steps; here it's like having lots of sensors, each tuned to a given frequency band.
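For the "two steps" picture, a toy sketch: first sample a block, then transform it to see which band is active, which is roughly the per-band information the hair cells deliver directly. A naive DFT is used only to keep it self-contained, and the 16 kHz rate, 256-sample block and 440 Hz test tone are arbitrary choices:

```cpp
// Toy version of "sample, then transform": step 1 fills a block of samples,
// step 2 computes per-bin magnitudes, each bin acting like one sensor tuned
// to a narrow band. Naive DFT to stay self-contained; values are arbitrary.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const double PI = 3.14159265358979;
    const double fs = 16000.0;
    const int N = 256;

    // Step 1: one sampled block (here a fake 440 Hz tone).
    std::vector<double> x(N);
    for (int i = 0; i < N; ++i)
        x[i] = std::sin(2.0 * PI * 440.0 * i / fs);

    // Step 2: transform. Bin k corresponds to roughly k * fs / N hertz.
    int bestK = 0;
    double bestMag = 0.0;
    for (int k = 1; k < N / 2; ++k) {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < N; ++n) {
            double ang = 2.0 * PI * k * n / N;
            re += x[n] * std::cos(ang);
            im -= x[n] * std::sin(ang);
        }
        double mag = std::sqrt(re * re + im * im);
        if (mag > bestMag) { bestMag = mag; bestK = k; }
    }
    printf("strongest band: bin %d, about %.0f Hz\n", bestK, bestK * fs / N);
    return 0;
}
```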
Ok, I kind of get it. One mic and one ADC can produce, say, a 44.1 kHz 16-bit signal. With several narrow-band mics, each could produce a 44.1 kHz 16-bit signal. Surely that would be much more information. Dividing one single full-spectrum 44.1 kHz 16-bit signal into several bands (each still being 44.1 kHz 16-bit) can never lead to the same amount of information. Of course, 44.1 kHz is overkill if we are interested in, say, a 100 Hz – 200 Hz band.
Several narrow-band mics should instead mean that each band could somehow optimize its use of resolution. A 100 Hz – 200 Hz band doesn't need a high sampling frequency, maybe a bigger bit depth instead. The 10,000 Hz – 20,000 Hz band again needs a higher sampling frequency.
Yes, the ear filters from roughly 20 Hz to 20 kHz. If you want to get something similar you would indeed need to sample at a rate of at least twice the top frequency (Nyquist theorem), so 44.1 kHz would be fine.
Estimates for the human cochlea range from around 4,000 to 16,000 hair cells (the exact number varies between individuals). These hair cells are sensitive to different frequencies and are connected to the auditory nerve, which transmits electrical signals to the brain for further processing and interpretation.
Say you have 10,000 sensors, so you want 10,000 bins for your FFT, and thus need to collect at least 10,000 samples before processing, which at 44.1 kHz is roughly a quarter of a second. Then you need to perform the FFT on those 10,000 samples, which will add a few tens of milliseconds on a fast Arduino, and then run the AI model.
So at best you'll get a ~3 Hz output rate for an indication of the direction of the sound.
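Spelled out with rough numbers (the 50 ms allowance for the FFT plus the model is an assumption, not a measurement):

```cpp
// The back-of-the-envelope numbers above, spelled out. The 50 ms allowance
// for the FFT plus the model is an assumption, not a measurement.
#include <cstdio>

int main() {
    const double fs        = 44100.0;       // sampling rate, Hz
    const int    samples   = 10000;         // roughly one sample per desired bin
    const double collect_s = samples / fs;  // ~0.23 s to fill the buffer
    const double process_s = 0.05;          // assumed FFT + model time
    printf("collect %.2f s + process %.2f s -> about %.1f direction updates per second\n",
           collect_s, process_s, 1.0 / (collect_s + process_s));
    return 0;
}
```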