Capturing Drum Sounds in Music

Looking to distinguish between kick drum, snare, and cymbal in music in real time with a Nano.
Goal is to illuminate a different color LED for each “drum” element, i.e. kick = red, snare = green, cymbal = blue, while music is playing.

The individual frequencies and bands are…
Kick: 50Hz & 140Hz
Snare: 225Hz & 160-200Hz & 1.8kHz-9kHz
Cymbal: 500Hz & 400Hz-1.25kHz & 2.5-9kHz
(Surprisingly, although they sound quite different, snare and cymbal have similar spectrums and may not be easily distinguishable from one another so may have to combine into a single detection that comprises both.)

  1. Can a mix of DFT, Goertzel, IIR filters, & FHT be performed, in the same code, each at different sample rates, to resolve the individual frequencies and frequency bands?

  2. Can sample rates be set for FHT to obtain bin resolutions of a few Hz for the individual frequencies, up to say 500Hz?

  3. Are there external DSP chips that can better handle the A/D conversions and number crunching, and the Nano can just read the DSP results and handle lighting the appropriate LEDs?

  4. Can the desired frequencies and bands be captured and resolved in less than 4msec.?

  5. Is everyone laughing yet?

Is this too ambitious a task for a Nano (and a so-so coder), or is a faster processor (and coder brain) needed?
Can I at least come close to the goal, maybe just reliably detect kick and snare?
Opinions and other ideas?

Although this kind of analysis is fairly simple for the human ear/brain, it turns out to be quite difficult for a computer.

"You can't un-fry an egg or un-bake a cake, and you can't un-mix a song."

...while music is playing.

If you're talking about multiple instruments and maybe vocals, that adds to the difficulty.

Surprisingly, although they sound quite different, snare and cymbal have similar spectrums and may not be easily distinguishable from one another so may have to combine into a single detection that comprises both.

There is a LOT of overlap with LOTS of harmonics & overtones, so it's not surprising to me at all.

  1. Can a mix of DFT, Goertzel, IIR filters, & FHT be performed, in the same code, each at different sample rates, to resolve the individual frequencies and frequency bands?

I'd say doing all of that in real-time is going to be impossible.

  3. Are there external DSP chips that can better handle the A/D conversions and number crunching, and the Nano can just read the DSP results and handle lighting the appropriate LEDs?

Yes, but I'm not up-to-speed on what's available. Of course, any regular computer has an A/D converter (soundcard), although most laptops don't have line-in (if you're working with line-level signals) and regular consumer soundcards don't interface properly with studio/stage microphones (low-impedance balanced).

  2. Can sample rates be set for FHT to obtain bin resolutions of a few Hz for the individual frequencies, up to say 500Hz?

  4. Can the desired frequencies and bands be captured and resolved in less than 4msec.?

I can't answer that either... I don't know the exact limitations of the Arduino/Nano, but obviously a regular computer will have a lot more processing power.

What you want to do is impossible. That is, you can do it, but it will not work the way you think it will. However, this chip gives a nice display with the right code and LEDs:
MSGEQ7

So much of this is impractical. Unlike Grumpy Mike, personally I prefer to use the word "impossible" rather sparingly.

However, one thing that I believe is very likely not possible is detecting the presence of a frequency component by analysis of data spanning less than one of its periods. Your question #4 specifically asked about a 4 ms time window. 4 ms corresponds to one cycle of 250 Hz, whereas one cycle takes 20 ms at 50 Hz and about 7 ms at 140 Hz. I'm pretty sure it's impossible to detect those 50 to 140 Hz tones of the kick drum by analyzing only 4 ms of data.

I do have an interest in DSP for detecting sound patterns. I've spent the last couple years working on a much more advanced audio library with what's probably the very best sound analysis code available for any Arduino compatible board. At this moment, I'm preparing to release a new board which might have enough computational power to attempt some of these very challenging sound recognition projects.

Now that the laughter has faded..., thanks for the opinions. Admittedly was shooting for the moon, but maybe Earth orbit will suffice...

  1. Can a mix of DFT, Goertzel, IIR filters, & FHT be performed, in the same code, each at different sample rates, to resolve the individual frequencies and frequency bands?

A 128-bin FHT to 4.8kHz resolves to 37.5Hz bins. A 64-bin FHT to 19.2kHz resolves to 300Hz bins. That com"bin"ation gets me close to the individual frequencies and bands I'm looking for.
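As a rough check of which bins those target frequencies would land in, the sketch below rounds a frequency to the nearest bin, assuming bin k is centered at k times the bin width. `nearestBin` is a hypothetical helper, not part of the FHT library.

```cpp
#include <cmath>

// Index of the bin whose center lies closest to freqHz,
// taking bin k to be centered at k * binWidthHz.
int nearestBin(double freqHz, double binWidthHz) {
    return static_cast<int>(std::lround(freqHz / binWidthHz));
}
```

With 37.5Hz bins, the 50Hz kick tone rounds to bin 1 and the 140Hz tone to bin 4, but the 160Hz lower edge of the snare band also rounds to bin 4, so that resolution alone may not separate them.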

  2. Can sample rates be set for FHT to obtain bin resolutions of a few Hz for the individual frequencies, up to say 500Hz?

Anyone have an answer on this one?

  3. Are there external DSP chips that can better handle the A/D conversions and number crunching, and the Nano can just read the DSP results and handle lighting the appropriate LEDs?

Guessing there are but coding them may be difficult.

  4. Can the desired frequencies and bands be captured and resolved in less than 4msec.?

Figured this would be tough and not sure 4msec is the right number. 10msec delay between sound and light is noticeable, to me anyway, but may be acceptable to most.

MSGEQ7... tried it, but the bands are so narrow that there are portions of songs (including harmonics), like some solo piano parts, that fall in between the bands and are completely missed.

MSGEQ7... tried it, but the bands are so narrow

What? They are only second order filters and there is a great deal of overlap. This video of mine shows a signal generator doing a manual sweep and you can see there is a great deal of overlap rather than holes.

However if you want the channels in between then simply run another chip with a different frequency oscillator. It is the oscillator that governs the actual frequency detected because it is a switched capacitor filter array.

  2. Can sample rates be set for FHT to obtain bin resolutions of a few Hz for the individual frequencies, up to say 500Hz?

Yes. The top bin is half the sampling rate and the more bins the more resolution at the lower end.

How about using a 1024 point FFT?

They are only second order filters and there is a great deal of overlap.

MSGEQ7 Filter Q is 6, between 1/3 and 1/6 octave, so for the 1kHz band, -3dB points are 920 and 1086Hz… relatively narrow. Looking at the video, to my eye, the level is quite low in the band overlap region compared to the band peaks.
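Those -3dB figures can be reproduced with the standard second-order band-pass relation, assuming Q is defined as center frequency divided by -3dB bandwidth. `BandEdges` and `bandEdges` are names invented for this sketch.

```cpp
#include <cmath>

struct BandEdges {
    double lowHz;   // lower -3dB frequency
    double highHz;  // upper -3dB frequency
};

// -3dB edges of a second-order band-pass with the given center and Q.
// The edges are geometrically symmetric around the center frequency.
BandEdges bandEdges(double centerHz, double q) {
    const double half = 1.0 / (2.0 * q);
    const double root = std::sqrt(1.0 + half * half);
    return { centerHz * (root - half), centerHz * (root + half) };
}
```

For a 1kHz center and Q of 6 this gives edges of roughly 920Hz and 1087Hz, a bandwidth of about 167Hz.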

The top bin is half the sampling rate and the more bins the more resolution at the lower end.

So a top bin of 500Hz and 128 bins is 4Hz per bin. How do you get the 500Hz*2 sampling rate if the lowest rate is 4.8kHz with ADPS2,1,0 all "1"s… is there another way to set it?

Something else I don’t understand…
When trying to measure the FHT conversion time, bracketing it with "StartTime = micros();" and "EndTime = micros();", and printing the difference, I get unexpected values. "ET (us) = 1328" seems too fast, since OpenMusicLabs characterizes an N = 256 run, reorder, and lin at about 4109us. What are all the "4294967144" values from and how can I accurately measure the conversion time?

Start:  21760 436 386 584 306 254 330 306 342 384 386 356 376 410 392 512 6528 264 258 264 284 264 306 286 278 306 306 308 328 300 308 290 346    ET (us) = 1328
Start:  22144  82 203 324  28  70  64  56  39  36  30  60  84 143 148 296 7040 146  87  73  31  97  53  63  65  35  36  63  53  38  12  28  60    ET (us) = 4294967148
Start:  22272  97 125 288  46  42  49  45  56  32  82  26  58  30 142 212 7040 199  96  51  74  44  23  26  36  31  59  31  12  28  16  17  12    ET (us) = 4294967144
Start:  22272 175 194 247  79 105  81  49  15  41  65  82  19  95 139 247 7040 208  99  88  42  28  76  54  31  29   4   4  33   9  16  26  20    ET (us) = 4294967144
Start:  22144  88 264 316  59  76  60  43  21  47  63  63  45  58  98 189 6784 190 118  36  80  47  74  15  48  60  45  22  25  31  48  16  34    ET (us) = 4294967156

My code is here…

/*
fht_adc_serial.pde
guest openmusiclabs.com 7.7.14
example sketch for testing the fht library.
it takes in data on ADC0 (Analog0) and processes them
with the fht. the data is sent out over the serial
port at xxx.xkb.
*/

/* version 1.1


*/

#define LIN_OUT 1 // use the linear output function
#define FHT_N 256 // set to N point fht, 16, 32, 64, 128, 256 (gives N/2 freq bins)
#define cbi(sfr, bit) (_SFR_BYTE(sfr) &= ~_BV(bit))
#define sbi(sfr, bit) (_SFR_BYTE(sfr) |= _BV(bit))

// Libraries in "C:\Users\rick\Documents\Arduino\libraries"
#include <FHT.h> // include the library

unsigned long StartTime = 0;
unsigned long EndTime = 0;

void setup() {

  
/*
Prescale  ADPS2,1,0  Clock(MHz)  S.Rate(kHz)  BW(kHz)
    2       0 0 1      8           615         307
    4       0 1 0      4           307         153
    8       0 1 1      2           153          76.8
   16       1 0 0      1            76.8        38.4
   32       1 0 1      0.5          38.4        19.2    ADCSRA = 0xX5;
   64       1 1 0      0.25         19.2         9.6    ADCSRA = 0xX6;
  128       1 1 1      0.125         9.6         4.8    ADCSRA = 0xX7;

FHT_N  #Bins  BW(Hz)  Res.(Hz/bin)
  128     64  19,200    300.0    ADCSRA = 0xX5;
   64     32   4,800    150.0    ADCSRA = 0xX7;  Global variables use 430 bytes (20%) of dynamic memory
  128     64   9,600    150.0    ADCSRA = 0xX6;
  128     64   4,800     75.0    ADCSRA = 0xX7;  Global variables use 622 bytes (30%) of dynamic memory
  256    128   4,800     37.5    ADCSRA = 0xX7;  Global variables use 1,006 bytes (49%) of dynamic memory

*/
  
  Serial.begin(38400); // use the serial port
  TIMSK0 = 0; // turn off timer0 for lower jitter
  ADCSRA = 0xe7; // set the adc to free running mode
  ADMUX = 0x40; // use adc0
  DIDR0 = 0x01; // turn off the digital input for adc0

} // end Setup

void loop() {
  while(1) { // reduces jitter
    StartTime = micros();
    cli();  // UDRE interrupt slows this way down on arduino1.0
    for (int i = 0 ; i < FHT_N ; i++) { // save 256 samples
      while(!(ADCSRA & 0x10)); // wait for adc to be ready
      ADCSRA = 0xf7; // restart adc
      byte m = ADCL; // fetch adc data
      byte j = ADCH;
      int k = (j << 8) | m; // form into an int
      k -= 0x0200; // form into a signed int
      k <<= 6; // form into a 16b signed int
      fht_input[i] = k; // put real data into bins
    }
//    fht_window(); // window the data for better frequency response
    fht_reorder(); // reorder the data before doing the fht
    fht_run(); // process the data in the fht
    fht_mag_lin(); // take the output of the fht
    sei();
    EndTime = micros();

    Serial.print("Start:  ");
//    for (byte i = 0 ; i < FHT_N/2 ; i++) {
    for (byte i = 0 ; i < 33 ; i++) { // print bins 0 through 32 only
      static char stmp[16];
      sprintf(stmp,"%3u ",fht_lin_out[i]); // send out the data (unsigned 16-bit values)
      Serial.print(stmp);
    }
    Serial.print("   ET (us) = ");
    Serial.print(EndTime - StartTime);

    Serial.println();
  }
}

to my eye, the level is quite low in the band overlap region compared to the band peaks.

OK, so as I said, use two or more chips to fill in the holes. The clock oscillator on pin 8 determines the absolute frequency of the set of peak responses. Put a pot in place of the 200K resistor (or rather a pot and a fixed resistor, to give you finer control) and adjust the pot to fill in the gaps between the peaks.

How do you get the 500Hz*2 sampling rate if the lowest rate is 4.8kHz with ADPS2,1,0 all "1"s.

What processor are we talking about here? However you can change the code to only store one sample every N samples to get your sample rate down.

What are all the " 4294967144" values from

It could be when the micros timer wraps round.

how can I accurately measure the conversion time?

Put a pin high at the start and low at the end using direct port manipulation and measure the time interval on an oscilloscope.

...use two or more chips to fill in the holes

Yes, it would double the resolution, but I'm not a big fan of these chips as their specs are questionable... it appears not to work to its timing specs but does work outside its specs.

It could be when the micros timer wraps round.

I thought that too, but from another post, "According to the documentation, approximately every 70 minutes, the micros timer rolls over...". Here, it's "rolling" in less than a second. Measuring with a scope is better though.

Don't understand the jumping around of the measured values, by several hundred counts.

it appears not to work to its timing specs but does work outside its specs.

Not sure what that means.

I have found that putting an oscilloscope probe on the internal oscillator is enough to pull the frequency from the probe capacitance.

Was this comment about the Nano's crystal oscillator, or about the width and accuracy of the MSGEQ7 filter passbands?

it appears not to work to its timing specs but does work outside its specs.

When I set "Reset to Strobe Delay" and "Reset to Strobe Delay" to spec, as viewed with scope, get very noisy, erratic output vs. not setting these delays. Only the "Output Settling Time" time seems to matter. When I questioned the manufacturer, was told "use whatever works" which does not instill a lot of confidence in the device.

Pulling the clock 20% low does, as expected, put the first filter band at 50Hz as desired, and the other bands likewise at 20% below their spec. frequencies.

When I set "Reset to Strobe Delay" and "Reset to Strobe Delay" to spec, as viewed with scope, get very noisy, erratic output vs. not setting these delays.

Are these delays you used then less than the minimum? This is not what I found; I used 1 ms for the delays and didn't have any problem.

Are these delays you used then less than the minimum?

No. The delays used are somewhat over the minimums. I’ll try the 1ms waits.

Going back to Post #7

ET (us) = 1328
ET (us) = 4294967148
ET (us) = 4294967144
ET (us) = 4294967144

The following link explains the issue. “TIMSK0 = 0;” disables the Timer0 interrupt that micros() depends on, hence the unexpected values; commenting out this line restores the timer, and micros() now reads correctly.
http://forums.openmusiclabs.com/viewtopic.php?f=7&t=312
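
The huge ET values look like unsigned wrap-around: if the end reading comes out slightly smaller than the start reading, a 32-bit unsigned subtraction yields a number just below 2^32. The sketch below only illustrates that arithmetic, with made-up readings; it is my interpretation of the numbers, not something taken from the linked thread.

```cpp
#include <cstdint>

// Elapsed time as computed by "EndTime - StartTime" with 32-bit
// unsigned values, as unsigned long is on the Nano.
// The subtraction wraps modulo 2^32.
std::uint32_t elapsedMicros(std::uint32_t startUs, std::uint32_t endUs) {
    return endUs - startUs;
}
```

An end reading 152 counts below the start reading gives 4294967144, the value seen in the output above. By contrast, a genuine rollover of a working timer still produces the correct small difference, which is why this subtraction idiom is normally safe.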

Back to the FHT: getting noisy values (but now a believable ET (us) value)…

Start:  22144 251  44  67  50  69   9  66  40  59  43  61  44  42  50  33  28  27  33  74  44  60  28  15  38  12  26  35  30  32  26  23  34    ET (us) = 10836
Start:  22400  95  48 163 115  71  81  99  25  48  17  50  29  55  58  23  61   5   4  42  54  46  42  47  20  36  68  22  27  23  31  27  14    ET (us) = 10832
Start:  22016 122 260  29  47  38  71  86  38  68  25  36  44  26  12  68   3  25  29  39  31  30  55  78  16  35  16  38   3  32  33  31  23    ET (us) = 10836
Start:  22016 251  61  27  79  88  99 150  59  77  30  44  20  57  51  41  45  12   7  55  86  40  32  31  38  44  42  32  37  63  23  45  14    ET (us) = 10832
Start:  22016 154  78 195  45 142  64 112  34  32  26  14  52  21  69  39  20  31  34  34  29  40  52  20  25  29   9  54  25  35  43  20  22    ET (us) = 10832
Start:  22272 544 122  41  61  32  94  81  36  48  21   8  34  53  21  52  41  30  34  45   4  20  29  32  22  14  29  56  16  24  32  26  32    ET (us) = 10832
Start:  22016 326 249 205   2  61  61  27  53  84  36  45  65  18  65  12  13  47  11  10  21  34  42  44  20  22  35  32  10   9  26  25  26    ET (us) = 10836

Using this code…

/*
fht_adc_serial.pde
guest openmusiclabs.com 7.7.14
example sketch for testing the fht library.
it takes in data on ADC0 (Analog0) and processes them
with the fht. the data is sent out over the serial
port at xxx.xkb.
*/

/* version 1.1


*/

#define LIN_OUT 1 // use the linear output function
#define FHT_N 256 // set to N point fht, 16, 32, 64, 128, 256 (gives N/2 freq bins)
#define cbi(sfr, bit) (_SFR_BYTE(sfr) &= ~_BV(bit))
#define sbi(sfr, bit) (_SFR_BYTE(sfr) |= _BV(bit))

// Libraries in "C:\Users\rick\Documents\Arduino\libraries"
#include <FHT.h> // include the library

unsigned long StartTime = 0;
unsigned long EndTime = 0;

void setup() {

  int SoftwareMajRev = 1;   // Major Revision
  int SoftwareMinRev = 1;   // Minor Revision

  pinMode(13, OUTPUT);      // pin 13 must be an output for the LED blinks below

  // Blinks for Major Revision
  for (int i = 1; i <= SoftwareMajRev; i += 1) {
    digitalWrite(13, HIGH);  // Turn ON the "Beat Detected" LED
    delay(150);
    digitalWrite(13, LOW);   // Turn OFF the "Beat Detected" LED
    delay(150);
  }

  delay(500);                // Wait between Major and Minor Revision blinks

  // Blinks for Minor Revision
  for (int i = 1; i <= SoftwareMinRev; i += 1) {
    digitalWrite(13, HIGH);  // Turn ON the "Beat Detected" LED
    delay(150);
    digitalWrite(13, LOW);   // Turn OFF the "Beat Detected" LED
    delay(150);
  }
  
/*
Prescale  ADPS2,1,0  Clock(MHz)  S.Rate(kHz)  BW(kHz)
    2       0 0 1      8           615         307
    4       0 1 0      4           307         153
    8       0 1 1      2           153          76.8
   16       1 0 0      1            76.8        38.4
   32       1 0 1      0.5          38.4        19.2    ADCSRA = 0xX5;
   64       1 1 0      0.25         19.2         9.6    ADCSRA = 0xX6;
  128       1 1 1      0.125         9.6         4.8    ADCSRA = 0xX7;

FHT_N  #Bins  BW(Hz)  Res.(Hz/bin)
  128     64  19,200    300.0    ADCSRA = 0xX5;
   64     32   4,800    150.0    ADCSRA = 0xX7;  Global variables use 430 bytes (20%) of dynamic memory
  128     64   9,600    150.0    ADCSRA = 0xX6;
  128     64   4,800     75.0    ADCSRA = 0xX7;  Global variables use 622 bytes (30%) of dynamic memory
  256    128   4,800     37.5    ADCSRA = 0xX7;  Global variables use 1,006 bytes (49%) of dynamic memory

*/
  
  Serial.begin(38400); // use the serial port
//  TIMSK0 = 0; // turn off timer0 for lower jitter, but delay() and micros() WILL NOT WORK.
  ADCSRA = 0xe5; // set the adc to free running mode, and set Prescaler
  ADMUX = 0x40; // use adc0
  DIDR0 = 0x01; // turn off the digital input for adc0

} // end Setup

void loop() {
  while(1) { // reduces jitter
    StartTime = micros();
//    cli();  // UDRE interrupt slows this way down on arduino1.0. This also seems to disrupt the timer.
    for (int i = 0 ; i < FHT_N ; i++) { // save 256 samples
      while(!(ADCSRA & 0x10)); // wait for adc to be ready
      ADCSRA = 0xf5; // restart adc, and set Prescaler
      byte m = ADCL; // fetch adc data
      byte j = ADCH;
      int k = (j << 8) | m; // form into an int
      k -= 0x0200; // form into a signed int
      k <<= 6; // form into a 16b signed int
      fht_input[i] = k; // put real data into bins
    }
//    fht_window(); // window the data for better frequency response
    fht_reorder(); // reorder the data before doing the fht
    fht_run(); // process the data in the fht
    fht_mag_lin(); // take the output of the fht
//    sei(); //  This also seems to disrupt the timer.

//    StartTime = micros();
//    delayMicroseconds(2000);
    EndTime = micros();
    
    Serial.print("Start:  ");
//    for (byte i = 0 ; i < FHT_N/2 ; i++) {
    for (byte i = 0 ; i < 33 ; i++) {  // Just print the first 33 bins (0 through 32).
      static char stmp[16];
      sprintf(stmp,"%3u ",fht_lin_out[i]); // send out the data (unsigned 16-bit values)
      Serial.print(stmp);
    }
    Serial.print("   ET (us) = ");
    Serial.print(EndTime - StartTime);

    Serial.println();
  }
}

Which performs a 256 point FHT at a 38.4kHz sample rate.
Thought the noisy readings might be due to having no antialiasing filter, but the mic preamp is spec’d at 20Hz - 20kHz, so it should already, in a sense, be limiting the audio input to 1/2 the sample rate, or 19.2kHz.

Read that this code scales the input to 512 to account for + & - readings (don’t know where in the code though), but the mic preamp output is already at V+/2 or 2.5V. Could that be causing the noisy readings?
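
For reference, the offset removal appears to be the "k -= 0x0200;" line in the sampling loop: 0x0200 is 512, the mid-scale of the 10-bit ADC. The sketch below mirrors that conversion as plain C++. `adcToSigned16` is a made-up name, and the mapping of 512 to 2.5V assumes a 5V ADC reference.

```cpp
#include <cstdint>

// Convert a 10-bit ADC reading (0..1023) to the signed 16-bit value
// the sketch stores in fht_input[].
std::int16_t adcToSigned16(std::uint16_t adcReading) {
    int k = static_cast<int>(adcReading);  // 0..1023
    k -= 0x0200;                           // remove mid-scale offset: -512..511
    k *= 64;                               // same scaling as "k <<= 6"
    return static_cast<std::int16_t>(k);   // -32768..32704
}
```

A reading of 512 maps to 0, so under these assumptions a preamp output resting at V+/2 is what the code expects rather than an obvious source of the noise.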

Any idea why the noisy readings?