# Sound Localization using Amplitude, Mic Array

Hey guys, so I'm trying to figure out a way, given an array of 4-6 omnidirectional microphones in ideal conditions, to use the amplitude of a signal to localize a sound source (using an Arduino, but offloading any heavy processing to the computer via serial).

I'm using a timer interrupt, with polling, to get a max sample rate of maybe 1.5kHz (per mic), and I've already looked at TDOA but it seems way too complicated for a rough approximation. The data is all good, and I'm working with the raw byte data of the samples. I just need help with the analysis.

Assuming for a second TDOA is out of the picture, could I use the difference in amplitude from a single sound source (say a 220-Hz sine wave played through a speaker) to find an (approximate) direction of the sound source? Let's say that accuracy can be very approximate, and that we just want to find the angle of the speaker +/- 5 degrees from the center of the mic array.

The microphone array is currently in a line, each mic ~64mm from the next one, in the same plane.

Also, why has no-one ever tried using amplitude analysis versus using tdoa/fdoa for approximate sound localisation? It sounds like an obvious way to do things...

I've looked at the forum articles, but they're all either too approximate or use tdoa, which I'm trying to avoid.

Thanks, and let me know if I forgot anything.

I would think that microprocessor chips with lots of built-in hardware DSP support would be more practical for that kind of application?

http://www.microchip.com/stellent/idcplg?IdcService=SS_GET_PAGE&nodeId=2680&dDocName=en023598

Lefty

I don't think amplitude will work.

We definitely can't tell with our ears based on amplitude. If it worked well, there would probably be some animals that could do it based on amplitude. I think amplitude is too variable and not dependent on location. It can be louder in one ear (mic) than the other but it could have taken longer to get there.

If you hook up some headphones to a mixer, and pan it left to right, you'll see that there is no real gradation (it's either in both or one or the other). For the distance between our ears the delay is tiny: it stays well under a millisecond (roughly +-0.6 to 0.7 msec) from one ear to the other.

I'm guessing that if your mics are all in a plane, you can tell where the speaker is in the plane from the delay between the mics. And, if you can't figure it out algorithmically, you could just move it one degree at a time and measure the delays. Whichever measurement it's closest to, that would be where the speaker is.
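For the "figure it out algorithmically" part, here is a minimal sketch of the usual far-field relation between the delay across one mic pair and the bearing, sin(angle) = c * delay / spacing. The function name, the 343 m/s speed of sound and the clamping are my own assumptions, not something posted in this thread, and it only holds when the source is far away compared to the mic spacing.

```cpp
#include <cassert>
#include <cmath>

const double kSpeedOfSound = 343.0;              // m/s, assumed (dry air, about 20 C)
const double kPi = 3.14159265358979323846;

// Bearing of a distant source in degrees, measured from the direction
// perpendicular to the line joining the two mics (0 = straight ahead).
// delaySeconds: arrival time at mic B minus arrival time at mic A.
// micSpacingM:  distance between the two mics in metres.
double bearingFromDelayDeg(double delaySeconds, double micSpacingM)
{
    assert(micSpacingM > 0.0);
    double s = kSpeedOfSound * delaySeconds / micSpacingM;
    // Measurement noise can push the ratio slightly past +/-1; clamp it.
    if (s > 1.0)  s = 1.0;
    if (s < -1.0) s = -1.0;
    return std::asin(s) * 180.0 / kPi;
}
```

With 64 mm spacing the largest possible delay is 0.064 / 343, about 187 microseconds, so a 1.5 kHz sample rate (one sample every 667 microseconds) cannot resolve it directly.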

You should be able to do it with 2 mics, and 3 would be nice.

Also, why has no-one ever tried using amplitude analysis versus using tdoa/fdoa for approximate sound localisation?

Because of this:

The microphone array is currently in a line, each mic ~64mm from the next one

perhaps? The drop off in amplitude between adjacent microphones would be tiny.
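To put a number on "tiny": a rough sketch, assuming an ideal point source in free field whose pressure amplitude falls off as 1/distance (no reflections, no speaker directivity). The function name and parameters are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Level difference in dB between two omnidirectional mics, assuming the
// amplitude of a point source falls off as 1/r.
// nearDistanceM: distance from the source to the closer mic (metres).
// extraPathM:    how much farther away the other mic is (metres); at most
//                the mic spacing, reached when the source is in line with the array.
double levelDifferenceDb(double nearDistanceM, double extraPathM)
{
    assert(nearDistanceM > 0.0 && extraPathM >= 0.0);
    return 20.0 * std::log10((nearDistanceM + extraPathM) / nearDistanceM);
}
```

For adjacent mics 64 mm apart this gives at most about 0.36 dB (roughly a 4% amplitude change) at 1.5 m and about 1.05 dB at 0.5 m. For a source in front of the array the extra path is much shorter, so the difference is smaller still.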

Hi jigajigajoo,

Playing with this idea for some time now. I chose the time of arrival approach because when speeding up the ad converter you will lose accuracy. Another problem is that with samples coming in at a fairly high speed there isn’t much time left to analyze the signal. I use the ad converter way above speed specification (ad prescaler of /8, only taking 8 bit samples instead of 10 bits) and therefore don’t want to rely too much on an accurate representation of the waveform.

I chose a hand-clap as sound source because i think/hope i can trigger on that ‘easily/accurately’. At the moment a mic. is treated ‘triggered’ if a new ad value is bigger than a running average + fixed value. Got the ad converter in true free running mode (new conversion starts immediately after a conversion is finished, all in hardware. Adc sample freq. becomes f_cpu / ad prescaler / 13.5. Check datasheet) and i’m using that as timebase as i had to kill the isr that is used for maintaining micros(), millis() and delay().
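The trigger rule described here (new value bigger than running average plus a fixed value) could look roughly like this. It is only a sketch: the struct, the divide-by-8 smoothing and the scaled accumulator are my assumptions, not the code actually running on that board.

```cpp
#include <cstdint>

// Running-average clap trigger for 8-bit ADC samples.
// accumulator holds the running average times 8, so the fractional part
// is not thrown away by integer division on every update.
struct ClapTrigger {
    uint16_t accumulator;  // running average * 8
    uint8_t  margin;       // fixed value a sample must exceed the average by

    // Returns true when the sample is larger than average + margin,
    // then folds the sample into the average.
    bool update(uint8_t sample)
    {
        uint8_t average = static_cast<uint8_t>(accumulator / 8);
        bool triggered = sample > average + margin;   // operands promote to int, no overflow
        accumulator = static_cast<uint16_t>(accumulator - average + sample);
        return triggered;
    }
};
```

It would be seeded with something like `ClapTrigger t{128 * 8, 40};` for a signal idling around 128, one instance per mic. Only shifts, adds and a compare are involved, which matters inside a fast ADC interrupt.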

First results are that with the 4 mic’s in a square with sides ~50cm i can fairly reliably say in what segment/direction the sound source was. Although 5 degrees accuracy isn’t there yet. Even the ‘difficult’ positions (on top of a mic or right in the middle of the square) i could fairly reliably point out. Very cool to only look at the serial monitor, dad clapping his hands and me saying from which direction it was.

But the first thing i did was to make a simulator so i could stare at the process to see/understand/get a feeling for what is actually happening. See attached picture.

Also, why has no-one ever tried using amplitude analysis versus using tdoa/fdoa for approximate sound localisation? It sounds like an obvious way to do things…

Could you explain why you think that’s the obvious way to do things? Just curious, this is my first time ever that i’m doing this kind of signal processing and (maybe naively) thought time of arrival was a nice simple to understand approach.

Jeroen.

IMHO, there are two different way to get solution:

1). TDOA. In this case we don't need to know amplitude, so the ADC with its timing limits could be put aside. The signal from the mic goes to a high gain amplifier, is clamped to 0 or +5V and connected to a digital input. By checking the state of the pin periodically, or better, assigning an interrupt on a state change of the digital pin, we will "timestamp" the arrival of the clap sound. Using an iterative algorithm, a solution could be found for a matrix of 4 mics, with specified precision.

2). Amplitude. To get better space/strength resolution, mics have to be "directional", instead of omni-directional. Similarly, like they do it for radio - triangulation with rotating antennas. The issue would be with short duration sounds.
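For option 1, once each mic has a timestamp, one simple way to turn the time differences into a position is a brute-force grid search rather than a proper iterative solver. The sketch below assumes a flat 2-D layout, 343 m/s for the speed of sound, and made-up names (`Point`, `locateByGridSearch`); it is not the iterative algorithm mentioned above, just the easiest thing to get going.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

struct Point { double x; double y; };

const double kSpeedOfSound = 343.0;  // m/s, assumed

// Arrival-time difference (seconds) between a mic and the reference mic
// for a source at p.
static double expectedTdoa(const Point& p, const Point& mic, const Point& refMic)
{
    double d  = std::hypot(p.x - mic.x, p.y - mic.y);
    double d0 = std::hypot(p.x - refMic.x, p.y - refMic.y);
    return (d - d0) / kSpeedOfSound;
}

// Tries every grid point in [minCoord, maxCoord] x [minCoord, maxCoord] and
// returns the one whose predicted time differences best match the measured
// ones (least squares). measuredTdoa[i] is the arrival time at mic i minus
// the arrival time at mic 0; entry 0 is unused.
Point locateByGridSearch(const Point* mics, const double* measuredTdoa,
                         std::size_t micCount,
                         double minCoord, double maxCoord, double step)
{
    assert(micCount >= 3 && step > 0.0 && maxCoord >= minCoord);
    Point best = {minCoord, minCoord};
    double bestError = -1.0;  // negative means "no candidate scored yet"
    int steps = static_cast<int>((maxCoord - minCoord) / step + 0.5);
    for (int ix = 0; ix <= steps; ++ix) {
        for (int iy = 0; iy <= steps; ++iy) {
            Point candidate = {minCoord + ix * step, minCoord + iy * step};
            double error = 0.0;
            for (std::size_t i = 1; i < micCount; ++i) {
                double diff = expectedTdoa(candidate, mics[i], mics[0]) - measuredTdoa[i];
                error += diff * diff;
            }
            if (bestError < 0.0 || error < bestError) {
                bestError = error;
                best = candidate;
            }
        }
    }
    return best;
}
```

This is far too slow for the Arduino itself at fine grid steps, but it is cheap on the PC side of the serial link, and the best grid point can be used as a starting guess for an iterative refinement.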

BTW, animals use both methods, difference in amplitude works as their ears are "directional" devices.

1). TDOA. In this case we don't need to know amplitude, so the ADC with its timing limits could be put aside. The signal from the mic goes to a high gain amplifier, is clamped to 0 or +5V and connected to a digital input.

Correct. We would have to deal with slowly changing background noise but that's a detail. As all of this is one big learning experience in software/hardware/physics for me, i am more afraid of the hardware part in that than the relatively slow speed of the ad converter. In theory the downside should only translate to accuracy of the result. That is the reason why i chose the ad route. I got the other hardware prebuilt.

Jeroen.

Thanks so much for all the great responses! This is why I love the arduino community...

I would think that microprocessor chips with lots of built-in hardware DSP support would be more practical for that kind of application?

The point of my project is to see whether I can do basic DSP, etc without needing specialized hardware for analysis.

I don't think amplitude will work.

You're 90% correct, at distances beyond maybe .5m background noise and random noise kill any directionality I get. Within .5m, though, it actually works surprisingly well (as long as the sound source is max +/- 60 degrees of the center of the board in either direction).

1). TDOA. In this case we don't need to know amplitude, so the ADC with its timing limits could be put aside. The signal from the mic goes to a high gain amplifier, is clamped to 0 or +5V and connected to a digital input.

That's a gorgeous idea! I never thought of that... :(. Unfortunately my EE experience is very basic, but I think I could hypothetically do that. The trick is being able to decide the level where the mic input becomes significant (+5V). I guess I could use a pot or something to decide that...

YOT: Playing with this idea for some time now. I chose the time of arrival approach because when speeding up the ad converter you will lose accuracy.

Yot, I chose the amplitude way because in my head I thought the only way to figure out the time delay was with FFT and cross-correlation, which is almost impossible to do with such a crappy sample rate on the Arduino (not to mention the processing speed!).

Additionally, I'd love to see your code for the timer interrupts/ADC free-running. I'm still new to this whole microcontroller business, and I was using a pre-made library for the timer interrupts on my old code.

As of right now, I'm trying to write the individual values of each mic (via analogRead()) as fast as possible to serial (as bytes), and then doing any analysis on them using Processing (with big beefy speeds :D).

I had an idea about using the difference between the average root mean square amplitudes of each mic to find approximate angles, but past .5m or so I get basically random data.
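For reference, the per-mic RMS amplitude could be computed along these lines on the PC side. A minimal sketch, assuming 8-bit samples that idle around a known midpoint; the function name and the midpoint parameter are placeholders.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Root mean square amplitude of a block of 8-bit samples, measured
// around the idle level (midpoint) of the mic signal.
double rmsAmplitude(const uint8_t* samples, std::size_t count, double midpoint)
{
    assert(count > 0);
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        double deviation = samples[i] - midpoint;
        sumSquares += deviation * deviation;
    }
    return std::sqrt(sumSquares / static_cast<double>(count));
}
```

Comparing mics as a ratio (or a difference in dB) rather than a raw difference keeps the comparison independent of how loud the source is. It still needs each mic and preamp to have the same gain, which is worth calibrating with the source straight ahead.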

FFT and cross-correlation, which is almost impossible to do with such a crappy sample rate on the Arduino

Sample rate has got absolutely nothing to do with a processor’s ability to do the other operations.

Some code snippets. Hobbyist programmer, watch out. Taken from the code i am busy with. By no means final/correct and possibly plain bad in some cases. I assume you read/understand what the datasheet says on this topic.

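``````
// used in setup()
bitClear(TIMSK0, TOIE0);  // no more timer0 overflow interrupt! No more: micros(), millis(), delay()

// Lines marked (restored) were missing from the snippet as posted; they are a best
// guess from the surrounding comments and the ATmega328P register names.
const byte NUMBEROFSENSORS = 4;   // (restored) 4 mics in this setup
volatile byte sensorNumber = 0;   // (restored) input most recently written to ADMUX

void analogInit8Bit(byte preS)  // lowest prescaler value i use is 3 (/8). Resulting ad clock (not samplerate) of 2MHz is way out of spec. btw.
{
  DIDR0 = B00111111;  // disable all digital input buffers

  // AVCC with external capacitor at AREF pin(arduino default)
  ADMUX = (DEFAULT << 6) | (1 << ADLAR);   // (restored) left-adjust: top 8 bits end up in ADCH

  // enable a2d conversions
  // set prescaler
  ADCSRA = (1 << ADEN) | (preS & 0x07);    // (restored)
}

byte analogRead8Bit(byte pin)              // (restored header)
{
  // set the analog reference (high two bits)
  // select the channel (low bits; the 0x07 mask keeps it to inputs 0-7)
  ADMUX = (DEFAULT << 6) | (1 << ADLAR) | (pin & 0x07);

  // start the conversion
  ADCSRA |= (1 << ADSC);                   // (restored)

  // ADSC is cleared when the conversion finishes
  while (bit_is_set(ADCSRA, ADSC));        // (restored)

  return ADCH;                             // (restored) 8-bit result
}

void attachAnalogInterrupt()               // (restored header)
{
  // resetting some variables and arrays here.
  // in this case
  sensorNumber = 0;

  ADMUX = (DEFAULT << 6) | (1 << ADLAR) | (sensorNumber & 0x07);
  ADCSRB = B00000000;   // Auto Trigger Source = Free Running mode. ADTS = 000
  ADCSRA |= (1 << ADATE) | (1 << ADIE) | (1 << ADSC);  // (restored) auto trigger on, interrupt on, start first conversion
}

ISR(ADC_vect)   // (restored header) the "SIGNAL function" of this thread; SIGNAL() is the older avr-libc name for ISR()
{
  PORTB = B00100000;      // led pin 13 on.  note: be careful here as i assume nothing is connected to the other pins represented in PORTB register.

  byte newValue = ADCH;   // (restored) result of the conversion that just finished
  // signal processing on newValue removed here

  sensorNumber++;
  if (sensorNumber == NUMBEROFSENSORS)
  {
    sensorNumber = 0;     // roll over.
  }

  // AVCC with external capacitor at AREF pin(arduino default)
  // input pin
  // Here i change to another analog input. Note that there is already a conversion taking place.
  // Change of input will take effect _after_ the current conversion is complete.
  // Be sure to check which return value belongs to which sensor.
  ADMUX = (DEFAULT << 6) | (1 << ADLAR) | (sensorNumber & 0x07);

  PORTB = B00000000;      // led pin 13 off. note: be careful here as i assume nothing is connected to the other pins represented in PORTB register.
}
``````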

I removed the signal processing part to clarify the adc hardware settings.

Jeroen.

Sample rate has got absolutely nothing to do with a processor’s ability to do the other operations.

Except I can’t get a high enough sample rate to get a complete picture of a fast signal (i.e. a sine wave at 2kHz), and then the cross-correlation would give me wrong results (AFAIK). Granted, the problem here is more processing speed than sample rate, you’re right (:D).

Some code snippets. Hobbyist programmer, watch out.

Yot, I sort of understand your code (after 30 min with the manual :D), so let me make sure I understand what's going on. Basically, you set up the AD for all the sexy prescaling action, and then you can attach an interrupt which calls the SIGNAL function every time an 8-bit analog value is read in.

Then, in the SIGNAL function, you cycle through all the sensors, and do whatever you want with the value.

Do you ever use the analogRead8Bit function? Or can you just use the interrupt...

How much time do you have to do stuff in the interrupt function? At these speeds you don't have a lot of clock cycles to go crazy. Or do you read it to an array or something like that.

So what sorts of sample rates are you getting with this? Ie when the AD is divided amongst the four sensors in your case...

jigajigajoo:

Sample rate has got absolutely nothing to do with a processor's ability to do the other operations.

Except I can't get a high enough sample rate to get a complete picture of a fast signal (i.e. a sine wave at 2kHz), and then the cross-correlation would give me wrong results (AFAIK). Granted, the problem here is more processing speed than sample rate, you're right (:D).

There is the elmchan fft lib that can do a 64-point fft in less than 1.5ms, and the adc of the arduino running in 10-bit mode can do about 16kHz of sample rate; drop to 8 bits and you can roughly double that, or use an external spi one.

Basically, you set up the AD for all the sexy prescaling action, and then you can attach an interrupt which calls the SIGNAL function every time an 8-bit analog value is read in.

I set up the adc hardware, including the prescaler, in analogInit8Bit(byte preS). attachAnalogInterrupt() sets up the retriggering after taking a sample, sets the input pin and starts a conversion. The signal function is the interrupt service routine. It's called after a conversion is complete and, because of auto triggering, the next conversion is already started. Here you get the actual 8bit value.

Then, in the SIGNAL function, you cycle through all the sensors, and do whatever you want with the value.

Correct. I just want to note that at the start the actual order of inputs you get will not be what you think it will be. I can't really explain without using a ton of words. The trick is that when the input pin is changed the next conversion is already in process. That input change will take effect at the beginning of the next conversion , after that beginning you will enter the isr.
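To see the ordering without a ton of words, here is a small host-side simulation (plain C++, not Arduino code). It assumes the behaviour described above: each conversion latches whatever channel is selected at the moment it starts, and the next conversion has already started by the time the interrupt routine runs. The function name is made up.

```cpp
#include <cassert>
#include <vector>

// Simulates free-running conversions with an interrupt routine that
// increments sensorNumber and writes it to the channel select on every call.
// Returns, for each interrupt, the channel whose result is actually read.
std::vector<int> channelsReadInIsr(int numChannels, int isrCalls)
{
    assert(numChannels > 0);
    std::vector<int> readChannels;
    int mux = 0;            // channel currently selected (starts at 0)
    int converting = mux;   // channel latched by the first conversion
    int sensorNumber = 0;
    for (int k = 0; k < isrCalls; ++k) {
        int finished = converting;  // conversion that just completed
        converting = mux;           // hardware starts the next one before the ISR runs
        readChannels.push_back(finished);
        // Body of the interrupt routine: advance and select the next input.
        sensorNumber = (sensorNumber + 1) % numChannels;
        mux = sensorNumber;
    }
    return readChannels;
}
```

With 4 inputs the values arrive as channel 0, 0, 1, 2, 3, 0, ... so, after the first call, the value read in the interrupt belongs to the input two steps behind the one just written to the channel select.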

Do you ever use the analogRead8Bit function?

i use it to set up some variables like the running average. After that it's interrupt only.

How much time do you have to do stuff in the interrupt function?

haven't calculated the actual time yet but i eye-ball the brightness of the led on pin 13 and if i want to know the actual load i attach the logic analyzer.

Or do you read it to an array or something like that.

nope. I calculate a running average with the new reading, compare that to the new reading. Big change = hand clap. (start simple principle here)

So what sorts of sample rates are you getting with this? Ie when the AD is divided amongst the four sensors in your case...

16MHz / prescaler / 13.5 / 4 sensors. With a prescaler setting of 3 (/8), that's about 37kHz per input. Remember, way out of spec. I haven't even started to look at accuracy. I only know that a prescaler setting of 2 (/4) gives readings that are not usable anymore (0xFF, 255).
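The same arithmetic as a couple of helper functions, using the 13.5 ADC clocks per conversion quoted above and an assumed 343 m/s for the speed of sound. The names are just for illustration.

```cpp
#include <cassert>

// Samples per second per input when one ADC is shared round-robin.
double perChannelSampleRateHz(double cpuHz, double prescalerDivider,
                              double adcClocksPerConversion, int channels)
{
    assert(prescalerDivider > 0.0 && adcClocksPerConversion > 0.0 && channels > 0);
    return cpuHz / prescalerDivider / adcClocksPerConversion / channels;
}

// Distance sound travels during one sample period; a rough lower bound
// on the path-length difference that a single sample step can resolve.
double metresPerSample(double sampleRateHz, double speedOfSound = 343.0)
{
    assert(sampleRateHz > 0.0);
    return speedOfSound / sampleRateHz;
}
```

`perChannelSampleRateHz(16e6, 8, 13.5, 4)` comes out at about 37037 Hz, and sound travels roughly 9.3 mm in one sample period at that rate. Because the inputs are sampled in turn, consecutive inputs are also offset by one conversion time (13.5 / 2 MHz = 6.75 microseconds), which is worth correcting for when comparing arrival times.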

regards, Jeroen.

Magician: BTW, animals use both methods, difference in amplitude works as their ears are "directional" devices.

@magician Yes, I think there's additional information there but not nearly as much as the delay. Also, I think you need to rotate your ears/mics to get most of the information from amplitude.

@jigajigajoo If you have a speaker closer than 1.5m but you direct the speaker towards the mic that is farther away, the mic that is farther away will still pick up the largest amplitude. The delay will still be the same whether directed or not.

There is the elmchan fft lib that can do a 64-point fft in less than 1.5ms, and the adc of the arduino running in 10-bit mode can do about 16kHz of sample rate; drop to 8 bits and you can roughly double that, or use an external spi one.

Senso,
I saw the fft implementation, and played with it a little a week or two ago. Although it may be possible, I honestly have no idea how to implement a cross-correlation algorithm on an arduino (or at all for that matter), and it seems a little too computationally expensive when all I/we are looking for is a rough approximation.
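For what it's worth, cross-correlation does not need an FFT when only a handful of lags are possible. A minimal time-domain sketch, assuming two equal-length blocks of samples from two mics with the idle level already subtracted; the function name and argument layout are my own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Slides block b against block a and returns the lag (in samples, within
// -maxLag..+maxLag) where the sum of products a[i] * b[i + lag] is largest.
// A positive result means the sound reached mic b later than mic a.
int bestLag(const int16_t* a, const int16_t* b, std::size_t count, int maxLag)
{
    assert(maxLag >= 0 && static_cast<std::size_t>(maxLag) < count);
    int best = 0;
    bool haveScore = false;
    int64_t bestScore = 0;
    for (int lag = -maxLag; lag <= maxLag; ++lag) {
        int64_t score = 0;
        for (std::size_t i = 0; i < count; ++i) {
            long j = static_cast<long>(i) + lag;
            if (j < 0 || j >= static_cast<long>(count)) continue;  // outside the block
            score += static_cast<int64_t>(a[i]) * b[j];
        }
        if (!haveScore || score > bestScore) {
            bestScore = score;
            best = lag;
            haveScore = true;
        }
    }
    return best;
}
```

The cost is count * (2 * maxLag + 1) multiply-adds. maxLag only needs to cover the largest physically possible delay (mic spacing divided by the speed of sound, times the sample rate), so for closely spaced mics it is a very small number.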

Yot, I hate to be that guy, but do you think you could post your complete sketch, with the interrupts and the dsp and all that? I’m having a lot of trouble implementing the timer interrupts to my satisfaction (my not-working code below in full).

``````
#include <SoftwareSerial.h>

// various consts
const int NUM_SENSORS = 3;                // Does not include 0
const byte preS = 3;
byte midConst = 84;
byte maxConst = 168;
byte threshold = maxConst;

unsigned long cycles = 0;

int sensorNumber = 0;    // -1 because it increments at the beginning
byte newValue;

byte smoothedValues[NUM_SENSORS];
unsigned long timestamps[NUM_SENSORS];
volatile boolean sensorsOff[NUM_SENSORS];        // Sensors start on

void setup()
{
Serial.begin(115200);
// used in setup()
bitClear(TIMSK0, TOIE0);  // no more timer0 overflow interrupt! No more: micros(), millis(), delay()
}

void analogInit8Bit(byte preS)  // lowest prescaler value i use is 3 (/8). Resulting ad clock (not samplerate) of 2MHz is way out of spec. btw.
{
DIDR0 = B00111111;  // disable all digital input buffers

// AVCC with external capacitor at AREF pin(arduino default)

// enable a2d conversions
// set prescaler
}

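// function header and ADCSRA/ADCH lines restored (missing from the post as archived); best guess
byte analogRead8Bit(byte pin)
{
// set the analog reference (high two bits)
// select the channel (low 4 bits).
ADMUX = (DEFAULT << 6) | (1 << ADLAR) | (pin & 0x07);

// start the conversion
ADCSRA |= (1 << ADSC);

// ADSC is cleared when the conversion finishes
while (bit_is_set(ADCSRA, ADSC));

return ADCH;
}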

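// function header and ADCSRA line restored (missing from the post as archived); best guess
void attachAnalogInterrupt()
{
// resetting some variables and arrays here.
// in this case
sensorNumber = 0;

ADMUX = (DEFAULT << 6) | (1 << ADLAR) | (sensorNumber & 0x07);
ADCSRB = B00000000;                                         // Auto Trigger Source = Free Running mode. ADTS = 000
ADCSRA |= (1 << ADATE) | (1 << ADIE) | (1 << ADSC);         // auto trigger on, interrupt on, start first conversion
}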

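// header and ADCH read restored (missing from the post as archived); this is the "SIGNAL function" of the thread
ISR(ADC_vect)
{
PORTB = B00100000;   // led pin 13 on.  note: be careful here as i assume nothing is connected to the other pins represented in PORTB register.
newValue = ADCH;     // 8-bit result of the conversion that just finished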

// increment cycles, for timesheet (but only if first timestamp captured)
if (cycles != 0)
{
cycles++;
}

// rectify value, to <340 back to above 340 (but in 0-255 range)
if (newValue < midConst)
{
newValue = midConst + midConst - newValue;
}

// smoothed val = old smooth - (old smooth) / 8 + newval / 8
smoothedValues[sensorNumber] = smoothedValues[sensorNumber] - (smoothedValues[sensorNumber] / 8) + (newValue / 8);

// print out debug info
if(!sensorsOff[sensorNumber])
{
Serial.print("for sensor ");
Serial.print(sensorNumber);
Serial.print(" val: ");
Serial.println(smoothedValues[sensorNumber], DEC);
}

// if sensor is still active, and above threshold, capture/timestamp it, and turn it off
if(smoothedValues[sensorNumber] >= threshold && !sensorsOff[sensorNumber])
{
timestamps[sensorNumber] = cycles;
sensorsOff[sensorNumber] = true;
Serial.print("sensor ");
Serial.print(sensorNumber, DEC);
Serial.print(" turned off, at cycle ");
Serial.println(cycles, DEC);
if(!cycles) {
cycles++;
}
}

sensorNumber++;                    // increment first, then wrap, so the index never reaches NUM_SENSORS
if (sensorNumber == NUM_SENSORS )
{
sensorNumber = 0;                  // roll over.
}

// AVCC with external capacitor at AREF pin(arduino default)
// input pin
// Here i change to another analog input. Note that there is already a conversion taking place.
// Change of input will take effect _after_ the current conversion is complete.
// Be sure to check which return value belongs to which sensor.
ADMUX = (DEFAULT << 6) | (1 << ADLAR) | (sensorNumber & 0x07);

PORTB = B00000000;      // led pin 13 off. note: be careful here as i assume nothing is connected to the other pins represented in PORTB register.
}

void loop() {
// when all timestamps in, analyze! (or in this case send em over to processing)
if(sensorsOff[0] && sensorsOff[1] && sensorsOff[2])
{
sensorsOff[0] = false;
Serial.println("endinterrupt---");
return;
for(int i=0; i < NUM_SENSORS; i++)
{
sensorsOff[i] = false;
}
//serialWrite();
cycles = 0;
sensorNumber = 0;
}
}

void serialWrite()
{
Serial.write(255);
Serial.println(timestamps[1]);
return;
for(int i=0; i < NUM_SENSORS; i++)
{
Serial.write(timestamps[i]);
}
}
``````

oscarcar: @jigajigajoo If you have a speaker closer than 1.5m but you direct the speaker towards the mic that is farther away, the mic that is farther away will still pick up the largest amplitude. The delay will still be the same whether directed or not.

I'm assuming for a second that this is an ideal situation, where the speaker is pointed directly down the angle I'm trying to find for it. I'll cross that bridge when I come to it :D.

@oscarcar

Also, I think you need to rotate your ears/mics to get most of the information from amplitude.

Well, it's true. Rotation of the directional mic is a serious drawback in this design. But even with TDOA you still have to rotate a "head", so that the two ears aren't situated on a line which is perpendicular to the sound source. :~ With only 64 mm between mics and 1.5 m to source, the angle is really small: about 2.4 degrees (atan(0.064 / 1.5)).

@OP what do you mean cross-correlation algorithm, run sample array against what?

With only 64 mm between mics and 1.5 m to source, the angle is really small: about 2.4 degrees (atan(0.064 / 1.5)).