I thought detecting sound direction would be easier than various projects make it seem. So far it seems I was right. People have used multi-microphone arrays, measured sound pressure, calibrated their sound detectors and whatnot.
All I wanted to do was use two microphones and detect the time shift between their signals. And so far it has worked perfectly. I sample at 16 kHz. The distance between the microphones is 42 mm. This means the time shift will be from -2 to 2 samples. It's like distinguishing 5 sectors of a 180° field. Raising the sampling frequency or moving the mics further apart will improve the resolution.
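To spell out the arithmetic: with 42 mm spacing the largest possible arrival-time difference is about 122 µs, which at 16 kHz is just under two sample periods, hence the five sectors. Here is a small sketch of that mapping (plain C++; the 343 m/s speed of sound and the asin mapping to an angle are assumptions for illustration, not code from the project):

```cpp
// The arithmetic behind the -2..+2 sample range, for two mics 42 mm apart
// sampled at 16 kHz. The 343 m/s speed of sound and the asin mapping to an
// angle are assumptions for illustration.
#include <algorithm>
#include <cmath>
#include <cstdio>

int main() {
    const double PI = 3.14159265358979;
    const double c  = 343.0;     // speed of sound, m/s
    const double d  = 0.042;     // microphone spacing, m
    const double fs = 16000.0;   // sampling frequency, Hz

    // Largest possible arrival-time difference and its size in samples.
    const double maxShift = d / c;   // ~122 microseconds
    printf("max shift: %.1f us = %.2f samples\n", maxShift * 1e6, maxShift * fs);

    // Each integer lag corresponds to one angular sector (0 = straight ahead).
    for (int lag = -2; lag <= 2; ++lag) {
        const double s = std::max(-1.0, std::min(1.0, lag / fs * c / d));
        printf("lag %+d -> about %+.0f degrees\n", lag, std::asin(s) * 180.0 / PI);
    }
    return 0;
}
```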
If raising the sampling frequency is not an option, something tells me I could just as well "raise" it by interpolating. It might work, because I record some 256 samples per channel at a time. Matching the two channels and thus finding the time shift might become more precise if I can shift not only by whole samples but also by fractions of a sample.
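One common way to get those fractional shifts without actually resampling is to compute the match at integer lags and then fit a parabola through the best lag and its two neighbours. A minimal sketch of that idea, not the project's code; the lag range and the sum-of-squared-differences metric are assumptions:

```cpp
// Fractional time shift from 256-sample blocks: find the best whole-sample
// lag by sum of squared differences, then refine it with a parabolic fit
// through the best lag and its two neighbours. Sketch only; lag range and
// metric are assumptions.
#include <cmath>
#include <cstdio>
#include <vector>

// Sum of squared differences between the left channel and the right channel
// shifted by "lag" samples (smaller = better match).
double ssd(const std::vector<double>& L, const std::vector<double>& R, int lag) {
    double s = 0.0;
    const int n = (int)L.size();
    for (int i = 0; i < n; ++i) {
        const int j = i + lag;
        if (j < 0 || j >= n) continue;
        const double d = L[i] - R[j];
        s += d * d;
    }
    return s;
}

// Returns the delay in fractional samples; positive means the right channel lags.
double fractionalLag(const std::vector<double>& L, const std::vector<double>& R, int maxLag) {
    int best = -maxLag;
    double bestVal = ssd(L, R, best);
    for (int lag = -maxLag + 1; lag <= maxLag; ++lag) {
        const double v = ssd(L, R, lag);
        if (v < bestVal) { bestVal = v; best = lag; }
    }
    if (best == -maxLag || best == maxLag) return best;   // no neighbours to fit
    const double sm = ssd(L, R, best - 1), s0 = bestVal, sp = ssd(L, R, best + 1);
    const double denom = sm - 2.0 * s0 + sp;
    return best + (denom != 0.0 ? 0.5 * (sm - sp) / denom : 0.0);
}

int main() {
    // Demo: the right channel is the left channel delayed by 1.3 samples.
    const int N = 256;
    std::vector<double> L(N), R(N);
    for (int i = 0; i < N; ++i) {
        L[i] = std::sin(0.23 * i) + 0.5 * std::sin(0.61 * i);
        R[i] = std::sin(0.23 * (i - 1.3)) + 0.5 * std::sin(0.61 * (i - 1.3));
    }
    printf("estimated lag: %.2f samples\n", fractionalLag(L, R, 4));
    return 0;
}
```

With the synthetic 1.3-sample delay in the demo, the parabolic fit lands close to 1.3 even though the channels are only compared at whole-sample offsets.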
Using only two microphones won't distinguish sounds from the front from sounds from the back. But the application will be a robot turning towards the sound source, and in that case it works. It hears the difference between left and right. It might hear the sound coming from 45° to the left when it's actually 135° to the left. It starts turning left, possibly detecting that the sound first "escapes" from 45° to 90° before it comes back to 45° and eventually to 0°, straight ahead.
Your two microphones are two points in space (A and B), meaning they are "aligned" and lie in a plane (as you only have 2 ➜ grey plane). If you take the perpendicular bisector of the line segment connecting the two microphones in that plane (the blue line), it is equidistant from both microphones, so any sound source located on that line will be received at the same time by the 2 microphones.
If you go to 3D, take the plane normal to the plane containing the 2 microphones and passing through that equidistant line: all points in that (red) plane are also equidistant from the microphones.
By turning a bit you can indeed make decisions, but your code needs to take that into account: if you got it wrong, moved the wrong way and then course-corrected, you'll find the situation again where the ∆ is 0 and this time you need to go the other way ➜ some coding and a "memory" of the situation is needed.
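Something along these lines could serve as that "memory". This is only a sketch of the idea; the sign convention, the 0.25-sample deadband and the demo values are placeholders, not a tested implementation:

```cpp
// Minimal sketch of a "memory" of the situation: remember which way we are
// turning and whether |∆| moved toward zero. If it grew instead, the source
// is behind, so keep turning the same way instead of flipping back.
// Sign convention, the 0.25-sample deadband and the demo values are
// placeholders for illustration.
#include <cmath>
#include <cstdio>

struct DirectionMemory {
    double prevAbsLag = 1e9;   // |lag| from the previous measurement
    int    turnDir    = 0;     // -1 = turning left, +1 = turning right, 0 = stopped

    // Call once per analysed block; lag > 0 is assumed to mean "to the right".
    // Returns the direction to keep turning.
    int update(double lag) {
        const double absLag = std::fabs(lag);
        if (absLag < 0.25) {
            turnDir = 0;                      // roughly ahead (or behind): stop and re-measure
        } else if (turnDir == 0) {
            turnDir = (lag > 0) ? +1 : -1;    // first decision: turn toward the apparent side
        } else if (absLag > prevAbsLag + 0.25) {
            // ∆ grew although we were turning toward the sound: the source is
            // behind, so keep the same turn direction and sweep through the side.
        }
        prevAbsLag = absLag;
        return turnDir;
    }
};

int main() {
    DirectionMemory mem;
    // Fake sequence: a source at ~135° left first "escapes" (|lag| grows),
    // then comes back toward zero as the robot keeps turning left.
    const double lags[] = {-1.0, -1.6, -2.0, -1.4, -0.6, -0.1};
    for (double lag : lags)
        printf("lag %+.1f -> keep turning %d\n", lag, mem.update(lag));
    return 0;
}
```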
I need to at least increase the distance between the microphones; I need better resolution. But I count on the robot in most cases first detecting the sound either to the left or to the right. If the first measurement says the sound source is straight ahead, it could just as well be straight behind. In that case the robot has to turn to one side or the other to verify it.
Humans and other animals need only two ears. Dogs wiggle their heads when they listen carefully to strange sounds. If I went 3D with this, I'd place the two mics on a device that could wiggle. Or add a third or fourth mic.
This question comes up a lot on the forum, so people would be interested if you took the time to post the code and describe the hardware in enough detail that a basic example can be reproduced.
I'll do that. So far I do everything on an Infineon CY8CPROTO with integrated mics, in the Eclipse IDE. But I also have a mic module which might be compatible with some of my Arduino boards. And since the Arduino IDE is superior to all other IDEs, I might continue my testing on an Arduino.
And yet, somehow, with only 2 'microphones' (ears) most humans and other animals can determine where a sound comes from, in any direction.
Actually, we have about four thousand very sensitive sound receptors in the inner ear (cochlea), and in your brain you have a dedicated neuronal network with a learning model that has been trained for years to deal with those 8,000 inputs.
It's true that front-back localisation is generally not as precise due to the limited cues available, and the brain relies more on the combination of time differences and spectral cues to make its best estimate.
But they definitely are in different positions along the cochlea, they also track different frequencies, and the two ears are not exactly the same.
Also, the shape of the outer ear gives the brain additional spectral cues to help localize sounds coming from above, below, in front of, or behind the listener.
There is lots of research on auditory perception and sound localization to read about if you are interested.
Whatever sound is the loudest will dictate the outcome. Say I have a sound source to the left and another to the right. I calculate the sum of squared differences for all 5 sectors. Sectors 2 and 5 have small sums, all the others have larger sums. I just go with the least sum.
If I were interested in detecting, for instance, a whistle, I'd run the sampled sounds through narrow-band filters. If I wanted the system to detect claps in a noisy environment, I'd need to include a threshold for sound pressure, because a clap is very short and a softer sound would dominate the least-sum matching by lasting longer.
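A sketch of those two ingredients, in case it's useful: a narrow-band check (here the Goertzel algorithm, a single frequency bin without a full FFT) for a whistle-like tone, and a short-window level check for clap-like bursts. The 2 kHz target, the 256-sample block and both thresholds are placeholders chosen for the example:

```cpp
// Sketch of the two ideas: a narrow-band check (Goertzel, one bin without a
// full FFT) for a whistle-like tone, and a short-window level check for
// clap-like bursts. Target frequency, block size and thresholds are
// placeholders chosen for the example.
#include <cmath>
#include <cstdio>
#include <vector>

// Energy of one frequency bin over a block (Goertzel algorithm).
double goertzelPower(const std::vector<double>& x, double targetHz, double fs) {
    const double coeff = 2.0 * std::cos(2.0 * 3.14159265358979 * targetHz / fs);
    double s1 = 0.0, s2 = 0.0;
    for (double sample : x) {
        double s0 = sample + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    return s1 * s1 + s2 * s2 - coeff * s1 * s2;
}

// Mean square level of the block: "is this loud enough to be a clap?"
double meanSquare(const std::vector<double>& x) {
    double s = 0.0;
    for (double v : x) s += v * v;
    return x.empty() ? 0.0 : s / x.size();
}

int main() {
    const double fs = 16000.0;
    std::vector<double> block(256);
    for (size_t i = 0; i < block.size(); ++i)   // fake 2 kHz whistle for the demo
        block[i] = 0.8 * std::sin(2.0 * 3.14159265358979 * 2000.0 * i / fs);

    bool whistleBand = goertzelPower(block, 2000.0, fs) > 1000.0;  // placeholder threshold
    bool loudEnough  = meanSquare(block) > 0.1;                    // placeholder threshold
    printf("whistle band hit: %d, loud enough for a clap: %d\n", whistleBand, loudEnough);
    return 0;
}
```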
I'm not sure. To me, phase shift sounds like shifting a sine wave by pi, or half a wavelength. What I have is sound waves that look arbitrary, with no regular waveform. I just shift them one sample at a time, not by half wavelengths.
That's an image of a sound coming from the left towards my computer, recorded with Audacity. It hits the two microphones of my laptop. I just try to match the two waveforms to see how much they are shifted relative to each other.
That is irrelevant. A microphone tracks from 20 Hz to 20,000 Hz. If the cochlea as a construction were superior to a microphone, we would build microphones like that: different microphones specialised for different frequency bands.
To the brain, a high frequency is a high frequency and a low one is a low one. You can hear them from straight ahead, from the top, from the left or from the right. The ear or the cochlea doesn't distinguish directions. The two cochleas really don't tell the brain anything more than two microphones tell your recording system. But to really get microphones to record sound the way humans sense it, binaural microphones were invented: head-shaped things with ears and the mics inside the ears. To make a robot distinguish sound directions beyond the 1D, 180° thing I have, I guess I would use such a binaural microphone. And a lot of machine learning. And a little wiggling of the head.
By irrelevant I only meant that the structure of the cochlea is meaningless if I want to build a model that distinguishes more sound directions. Instead I need to build ears that affect the frequency spectrum depending on which direction the sound comes from. I have to analyse the sound signal from the microphone. Of course I can divide the sound into a lot of bands, through narrow-band filters, and analyse each band. That might mimic what the cochlea together with the brain does. But I believe that won't give a straight answer without a lot of machine learning first.
So I stick to a very basic functionality, which is detecting the direction within a 180° sector in front of the robot. And if the robot tries to turn towards the sound and the sound appears to first move further away for a while before it starts coming back towards straight ahead, it just means the sound source was slightly behind to begin with.
Not totally - that contributes to "performing the FFT": going to the frequency domain and feeding that information from the specialized hair cells to your brain.
With a microphone you sample and then perform the FFT, in two steps; here it's like having lots of sensors, each tuned to a given frequency band.
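For the "two steps" picture, a toy sketch: first sample a block, then transform it to see which band is active, which is roughly the per-band information the hair cells deliver directly. A naive DFT is used only to keep it self-contained, and the 16 kHz rate, 256-sample block and 440 Hz test tone are arbitrary choices:

```cpp
// Toy version of "sample, then transform": step 1 fills a block of samples,
// step 2 computes per-bin magnitudes, each bin acting like one sensor tuned
// to a narrow band. Naive DFT to stay self-contained; values are arbitrary.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const double PI = 3.14159265358979;
    const double fs = 16000.0;
    const int N = 256;

    // Step 1: one sampled block (here a fake 440 Hz tone).
    std::vector<double> x(N);
    for (int i = 0; i < N; ++i)
        x[i] = std::sin(2.0 * PI * 440.0 * i / fs);

    // Step 2: transform. Bin k corresponds to roughly k * fs / N hertz.
    int bestK = 0;
    double bestMag = 0.0;
    for (int k = 1; k < N / 2; ++k) {
        double re = 0.0, im = 0.0;
        for (int n = 0; n < N; ++n) {
            double ang = 2.0 * PI * k * n / N;
            re += x[n] * std::cos(ang);
            im -= x[n] * std::sin(ang);
        }
        double mag = std::sqrt(re * re + im * im);
        if (mag > bestMag) { bestMag = mag; bestK = k; }
    }
    printf("strongest band: bin %d, about %.0f Hz\n", bestK, bestK * fs / N);
    return 0;
}
```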
Ok, I kind of get it. One mic and one ADC can produce, say, a 44.1 kHz 16-bit signal. With several narrow-band mics, each could produce a 44.1 kHz 16-bit signal. Surely that would be much more information. Dividing one single full-spectrum 44.1 kHz 16-bit signal into several bands (each still being 44.1 kHz 16-bit) can never lead to the same amount of information. Of course, 44.1 kHz is overkill if we are interested in, say, a 100 Hz – 200 Hz band.
Several narrow-band mics should instead mean that each band could somehow optimize its use of resolution. A 100 Hz – 200 Hz band doesn't need a high sampling frequency, maybe a bigger bit depth instead. The 10,000 Hz – 20,000 Hz band again needs a higher sampling frequency.
Yes, the ear filters from roughly 20 Hz to 20 kHz. If you want to get something similar you would indeed need to sample at a rate of at least twice the top frequency (Nyquist theorem), so 44.1 kHz would be fine.
Estimates for the human cochlea range from around 4,000 to 16,000 hair cells (the exact number varies between individuals). These hair cells are sensitive to different frequencies and are connected to the auditory nerve, which transmits electrical signals to the brain for further processing and interpretation.
Say you have 10,000 sensors, so you want 10,000 bins for your FFT, and thus need to collect at least 10,000 samples before processing, which at 44.1 kHz is roughly a quarter of a second. Then you need to perform the FFT on those 10,000 samples, which will add a few tens of milliseconds on a fast Arduino, and then run the AI model.
So at best you'll get a ~3 Hz output rate for an indication of the direction of the sound.
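Spelled out with rough numbers (the 50 ms allowance for the FFT plus the model is an assumption, not a measurement):

```cpp
// The back-of-the-envelope numbers above, spelled out. The 50 ms allowance
// for the FFT plus the model is an assumption, not a measurement.
#include <cstdio>

int main() {
    const double fs        = 44100.0;       // sampling rate, Hz
    const int    samples   = 10000;         // roughly one sample per desired bin
    const double collect_s = samples / fs;  // ~0.23 s to fill the buffer
    const double process_s = 0.05;          // assumed FFT + model time
    printf("collect %.2f s + process %.2f s -> about %.1f direction updates per second\n",
           collect_s, process_s, 1.0 / (collect_s + process_s));
    return 0;
}
```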