Sound recognition

Hello, I want to make a project that would end up recognizing a certain sound, say a bird call for example. Is that even doable with an Arduino? I do understand how complicated it is to tell one sound apart from another programmatically; the question is whether an Arduino can do it, or would I have to get a Raspberry Pi for that? Is there anything out there I could use as a base to start with? Could something like the EasyVR Shield 2.0 be a good place to start, or is that only going to work with spoken language?

I thank you in advance for taking the time to answer.

I suspect that a VR shield could do it. But the VR systems have problems with accents, so you may find that it can recognise a particular bird but not the same call of different birds of the same species. But yes I think you could get it to pick out the call of a Herring gull and sort it out from say a robin.

Mark

make a project that would end up recognizing a certain sound, say a bird call for example

Are you actually trying to recognize a bird call?

If so, I'm not sure the VR Shield is going to help, even though it has "32 user-defined Speaker Dependent (SD) triggers or commands (any language)", as it's designed to recognize the human voice, which is considerably different from a bird call.

I spoke to some local bird watchers, and there were issues because the same bird doesn't make the same call all the time, and the major issue with automatically recognizing the sound is that birds often chirp when there are other noises e.g. other birds.

Anyway, if you're not trying to recognize bird calls, ignore this :wink:

Out of the box, the EasyVR 2.0 shield does not work well with moderate background noise present, as any noise loud enough will trigger the shield. Once triggered, it takes several seconds to pick up, compare, respond and go back into listening mode. With enough background noise this just becomes an infinite loop.
You would definitely need a pro-quality mic to isolate the area you're monitoring.
With that said, it can be used to respond to different types of whistles.
Of course, the biggest challenge is having the bird patiently sit and sing to the shield while you program it to recognize that bird's song.
It would most likely work best indoors, for example with a parakeet or parrot. As it's always the same bird singing, the pitch won't vary as much as it would between different birds of the same species in the wild.

I'd go with the Raspberry Pi. The ATmega328-based Arduino has 2 KB of SRAM, which is about enough to recognise a few well-defined sounds, such as the "ring-ring" or "engaged" tone on a telephone or the "nee naww nee naww" of a police car from one particular country, but it would leave you struggling to convert most other ambient sounds into well-defined computer states. Try looking up some of the UK birdsongs in a biology book or encyclopedia. Some of them are quite complex patterns.
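To illustrate why a fixed tone is the easy case, here is a minimal sketch of the Goertzel algorithm. Note that this technique is not mentioned anywhere in the thread; it is swapped in here only as one common way to test for a single known frequency using two running state values instead of a buffered recording. The 8 kHz sampling rate, the 200-sample block and the 400 Hz test tone are all assumptions picked for the example, not real telephone or siren parameters.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Return the signal power near target_hz using the Goertzel recurrence."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)       # nearest DFT bin to the target
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:                            # only two state values are kept
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def make_tone(freq_hz, sample_rate, n):
    """Synthetic test signal: a pure sine wave (stand-in for a microphone input)."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

SAMPLE_RATE = 8000   # assumed sampling rate in Hz
BLOCK = 200          # assumed block length (25 ms at 8 kHz)

tone = make_tone(400.0, SAMPLE_RATE, BLOCK)
on_target = goertzel_power(tone, SAMPLE_RATE, 400.0)    # large
off_target = goertzel_power(tone, SAMPLE_RATE, 1000.0)  # close to zero
```

A detector built this way only answers "is this one frequency present right now?", which fits a ring tone or a two-tone siren far better than a birdsong that changes from note to note.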

Are you aware of any "high end", expensive systems capable of doing what you want to do with low-cost hobbyist equipment?
Can you post a single link to a video demonstrating an expensive "no money spared" system doing what you are talking about doing?

holmes4:
I suspect that a VR shield could do it.

And I would suspect that it can't.

There are various apps to recognise music, but I think it's hard to do. Something as variable as birdsong would be a lot harder. Google says that Isoperla created an iOS app for it, but I don't know how well it works. I can't imagine how you'd achieve this on a microcontroller - it seems unrealistic to me.

Let me make this clear. I am not saying that such a thing cannot be made with today's technology. I believe it can. I also believe that either you can't afford it or it requires a powerful computer, laptop or desktop.

I do not have access to the full paper, but it would appear to be possible on an 8-bit uC:

Single-chip speech recognition system based on 8051 microcontroller core

Shi Yuanyuan, Liu Jia, Liu Runsheng
Dept. of Electron. Eng., Tsinghua Univ., Beijing
IEEE Transactions on Consumer Electronics (Impact Factor: 1.09). 03/2001; DOI:10.1109/30.920433
Source: IEEE Xplore
ABSTRACT This paper describes a single-chip speech recognition system. It
contains the speech functions of prompt, playback, speaker-dependent
speech recognition, suitable for the voice activated systems in toys,
games, consumer electronics, office devices, etc. The chip is designed
based on the SOC (system on chip) philosophy and an 8-bit MCU, RAM, ROM,
ADC/DAC, PWM, I/O ports and other peripheral circuits are all embedded
in it. Software modules including control/communication, speech coding
and speech recognition algorithms are implemented in an 8051 compatible
microcontroller core, resulting in the extremely low cost of the chip.
The speech recognition adopts the template matching technique. It
recognizes up to 20 phrases with an average length of 1 second and the
recognition accuracy reaches more than 95% with the background SNR above
10 dB. Speech coding uses continuous variable slope delta modulation
(CVSD) algorithm. The bit rate is 16 kbits/s
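
The abstract only says that recognition "adopts the template matching technique"; it does not give the features or the distance measure. As a rough sketch of the general idea, assuming each sound has already been reduced to a short feature vector, matching can be as simple as picking the nearest stored template and rejecting anything that is too far from all of them. The feature values, labels and threshold below are hypothetical.

```python
def distance(a, b):
    """Sum of squared differences between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_match(features, templates, reject_above):
    """Return the label of the closest template, or None if nothing is close enough."""
    label, best = None, None
    for name, tmpl in templates.items():
        d = distance(features, tmpl)
        if best is None or d < best:
            label, best = name, d
    return label if best is not None and best <= reject_above else None

# Hypothetical templates: coarse energy in four frequency bands, stored at training time.
TEMPLATES = {
    "gull":  [9, 7, 2, 1],
    "robin": [1, 2, 8, 9],
}

heard_gull = best_match([8, 7, 3, 1], TEMPLATES, reject_above=10)   # close to "gull"
heard_noise = best_match([5, 5, 5, 5], TEMPLATES, reject_above=10)  # close to nothing
```

The rejection threshold matters as much as the matching itself: without it, every background noise would be reported as whichever template happens to be least far away.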

I don't think the original poster has replied to this thread yet, have they???

Someone asked me about the possibility of doing this as a phone app about a year ago, but after extensive research I decided that a general-purpose bird call identifier, even on a smartphone with a nice fast CPU and lots of RAM and file storage, was not going to be an easy thing to produce.

I notice there is now an iPhone app that does this and is getting reasonable reviews, but the user has to edit the sound file so that the piece of audio that is to be analysed only contains the sound of a single bird with no background sounds.

So effectively they are using the massive power of the human brain to do a lot of filtering before the app even starts to analyse the sound.

On top of that, you need to have a large number of reference sounds, which you need to pre compile into a sound signature.

The Arduino would need to perform an FFT on the sound, which is doable, but as the normal Arduino only runs at 16 MHz, it's going to take a lot longer than the 1 GHz CPUs in smartphones.
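
For anyone who wants a feel for what the FFT step involves, here is a small sketch in plain Python (not Arduino code). It runs a textbook recursive radix-2 FFT over a synthetic 256-sample block and finds the strongest frequency bin. The block size, the 8 kHz sampling rate and the test tone are assumptions for illustration.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]   # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dominant_bin(samples):
    """Index of the strongest bin between DC and the Nyquist frequency."""
    spectrum = fft(samples)
    half = len(samples) // 2
    mags = [abs(c) for c in spectrum[1:half]]
    return 1 + mags.index(max(mags))

N = 256              # assumed block size
SAMPLE_RATE = 8000   # assumed sampling rate in Hz

# Synthetic input: a sine wave that completes exactly 20 cycles in the block.
samples = [math.sin(2 * math.pi * 20 * i / N) for i in range(N)]

peak = dominant_bin(samples)                 # bin index of the tone
peak_hz = peak * SAMPLE_RATE / N             # bin index converted to Hz
butterflies = (N // 2) * int(math.log2(N))   # butterfly operations per block
```

Each butterfly is a complex multiply plus two complex additions, so the per-block cost scales with that count. On a small microcontroller this would normally be done in fixed-point integer arithmetic rather than with floating-point complex numbers as here.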

For a general-purpose bird call identifier, a large number of bird calls need to be profiled, and they are not easy to source unless you are in that industry.
I.e., you can buy sounds of major species for personal use, but there are loads of variants of each call that will not be included.

Also, I was told by an expert that juvenile birds make different sounds to adults, males and females can make different sounds, mating calls are different, and then there is general variability from bird to bird.

To be honest, I'd be surprised if the App that is on sale, works that well, despite its 5 star reviews;-)

I do not have access to the full paper, but it would appear to be possible on an 8-bit uC:

I submit that it does not appear to be possible with a "toy class" speech recognition chip, because if it were, someone would already have released a product that does EXACTLY what the OP wants, to sell to the many bird watchers in the world. If you truly believe that a toy-class, 20-phrase chip can recognize bird calls, then I submit you should do just that. I am sure you can get investment capital from the bird watcher community because, for reasons I can't put my finger on, I believe many of them are quite wealthy and have money to burn.

I'll contribute my two cents toward the design.
The key to making this work is less about the recognition than it is about audio surveillance and noise cancelling. You need a spy-class audio surveillance parabola package as the input device, and you need to add to that a quality noise cancelling system, preferably contracted to BOSE for that part, that eliminates all the OTHER noises that may be present in a bird call scenario. Once you have a clean, amplified, background-noise-cancelled signal, then you can process it for recognition. Without Part A and Part B, the recognition (Part C) would never work. (I want a commission BTW. XD)

Quote
I do not have access to the full paper, but it would appear to be possible on an 8-bit uC:

I submit that it does not appear to be possible with a "toy class" speech recognition chip, because if it were, someone would already have released a product that does EXACTLY what the OP wants, to sell to the many bird watchers in the world. If you truly believe that a toy-class, 20-phrase chip can recognize bird calls, then I submit you should do just that.

Ummmmm. Methinks you totally missed the point:
In 2001, an 8051 uC, which is 8-bit, was used as a complete speech recognition system. Recognition was based on template matching, and the speech coding used the CVSD algorithm at a bit rate of only 16 kbit/s.

The 20-phrase limit has nothing to do with what is possible in "today" terms.

13 years later, we can easily throw 512 KB or more of external RAM on an Arduino Mega2560, so RAM is not an issue anymore. The reference encoding is not an issue either; it would be done offline, loaded into flash or EEPROM, and then copied to RAM during initialization. Originally the Intel 8051 was a 1 MIPS device (12 MHz clock) for single-cycle instructions and 0.5 MIPS for two-cycle instructions. The AVR can be clocked within spec at 20 MHz, which gives up to 20 MIPS.

Now, I know nothing of the pattern matching algorithm used, so I cannot speculate, but in-ram matching sounds reasonable and do-able.

So, to the OP's question, "...could it be done...?": I believe it probably can. Would I do it? No, but that is because I prioritize my beer over my hobby and everyone knows:
If you program, don't drink.
If you drink, don't program!

Ray

I have nothing to dispute in anything you've said; however, my gut feeling tells me it can't be trivial, or everyone would have one, if for no other reason than to be able to say, "I wonder what kind of birds are around here, let me go check it out with my birdrecapp" ....

"I wonder what kind of birds are around here, let me go check it out with my birdrecapp ....

In this context, I must agree that there is an air of humor in the concept! XD

Ray

No one questions the existence of Voice Identity, but can you post a link to any company that markets a handheld standalone bird call recognition product? Why isn't there an iPhone app for that?
Can you answer that ?

raschemmel:
No one questions the existence of Voice Identity, but can you post a link to any company that markets a handheld standalone bird call recognition product? Why isn't there an iPhone app for that?
Can you answer that ?

Birdcalls can differ throughout the day, among groups just miles apart, and by individual birds.
“When a bird sings, the song itself may have varying amplitudes and frequencies,” Berres says. “It can also speed up a little bit and slow down a little bit. They may throw in a note here or take out a note there.”
WeBIRD dices songs into time-ordered chunks, using data-organization techniques often applied by geneticists to jumbled bits of DNA to “align temporally misaligned data, working around a lot of the variation,” says Berres.
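
The article does not say which alignment method WeBIRD actually uses. As a stand-in, here is a minimal sketch of dynamic time warping (DTW), a different but related technique for comparing two sequences when one is sped up or slowed down relative to the other. The sequences are made-up pitch contours, not real bird data.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of numbers."""
    inf = float("inf")
    rows, cols = len(a) + 1, len(b) + 1
    cost = [[inf] * cols for _ in range(rows)]
    cost[0][0] = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            step = abs(a[i - 1] - b[j - 1])
            # Either sequence may "wait" while the other advances, or both advance.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[-1][-1]

reference = [1, 3, 5, 3, 1]                 # hypothetical stored song contour
slower = [1, 1, 3, 3, 5, 5, 3, 3, 1, 1]     # same contour sung at half speed
other = [5, 1, 5, 1, 5]                     # a different contour

same_song = dtw_distance(reference, slower)      # 0.0: the stretch is absorbed
different_song = dtw_distance(reference, other)  # clearly larger
```

A plain sample-by-sample comparison would score the half-speed version as a poor match, which is the "speed up a little bit and slow down a little bit" problem described in the quote. Note that the table needs one cell per pair of elements, so memory grows with the product of the two lengths unless only the last row is kept.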

Well there you go. Do you still think it's doable with 8-bits?
Even the iPhone couldn't do it in standalone mode. All it does is collect the sample and send it to a server for processing by who knows what kind of supercomputer; it doesn't work without a cell phone signal. It's doable, but it requires more resources than you can hold in your hand.

raschemmel:
Well there you go. Do you still think it's doable with 8-bits?
Even the iPhone couldn't do it in standalone mode. All it does is collect the sample and send it to a server for processing by who knows what kind of supercomputer; it doesn't work without a cell phone signal. It's doable, but it requires more resources than you can hold in your hand.

GeeWhiz... You asked for an iPhone app, I gave you the link. Now you are unhappy the iPhone uses back end processing.

Yes, I still think it can be accomplished. Here is another example:
http://www.researchgate.net/publication/221573753_Microcontroller_implementation_of_melody_recognition_a_prototype

Microcontroller implementation of melody recognition: a prototype.

Jyh-Shing Roger Jang, Yung-Sen Jang
In proceeding of: Proceedings of the Eleventh ACM International Conference on Multimedia, Berkeley, CA, USA, November 2-8, 2003
Source: DBLP
ABSTRACT This demo presents a 16-bit microcontroller implementation of a content-based music retrieval system that can take a user's acoustic input (5-second clip of singing or humming) and then retrieve the intended song from 20 candidate songs. Performance evaluation based on 192 clips shows that the system has a satisfactory top-1 recognition rate of 92%. This system demonstrates the feasibility of microcontroller based melody recognition for music retrieval, which can be used in consumer electronics such as melody-activated interactive toys, query engines for MP3 players or karaoke machines, and so on.

If you pull the PDF, you will note the uC only has 1 KB of RAM. Each song pattern was 256 bytes. The algorithm sampled at 8 kHz and the sample duration was 5 seconds. A/D resolution was 8 bits.
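
Running the numbers quoted above gives a rough idea of how tight that is. This is only back-of-the-envelope arithmetic from the figures in this post and the abstract (1 KB RAM, 256-byte patterns, 8 kHz, 5 seconds, 8-bit samples, 20 candidate songs); how the paper actually lays out its memory is not something I can confirm from here.

```python
SAMPLE_RATE_HZ = 8000     # from the post: 8 kHz sampling
CLIP_SECONDS = 5          # from the post: 5-second clip
BYTES_PER_SAMPLE = 1      # 8-bit A/D resolution
RAM_BYTES = 1024          # from the post: 1 KB of RAM
PATTERN_BYTES = 256       # from the post: one song pattern
CANDIDATE_SONGS = 20      # from the abstract: 20 candidate songs

# Size of the raw clip if it were stored sample by sample.
raw_clip_bytes = SAMPLE_RATE_HZ * CLIP_SECONDS * BYTES_PER_SAMPLE

# Size of the whole song database.
all_patterns_bytes = PATTERN_BYTES * CANDIDATE_SONGS

# How much raw audio would fit if all of the RAM were used as a buffer.
buffer_seconds = RAM_BYTES / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)

print(raw_clip_bytes, all_patterns_bytes, buffer_seconds)
```

So the raw clip (40000 bytes) is roughly 40 times larger than the RAM, and even the 20 patterns together (5120 bytes) do not fit. Presumably the audio is reduced to a compact representation as it arrives and the patterns sit in program memory, but that is an inference from the arithmetic, not something stated in the post.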

In the above, 8-bit vs 16-bit is not of significant concern, unless the algorithm is totally poo-poo in 8-bit implementation, which is doubtful. The low RAM requirements are particularly interesting, half that of the Atmega328...

So, repeating, Yes I do believe it is do-able. The secret would be to build from prior successful algorithms and research.

Ray

Edit

This paper presents practical issues and considerations when implementing melody recognition on 8-bit and 16-bit microcontrollers. The underlying melody recognition system (also known as query-by-singing/humming system) allows the user to sing or hum a segment of a melody to the microphone and the system can retrieve the intended song in a timely manner. Performance evaluation based on 192 clips shows that the system has a satisfactory top-1 recognition rate of 92% out of 20 candidate songs in the database. This system demonstrates the feasibility of microcontroller based melody recognition for music retrieval, which can be used extensively in consumer electronics such as melody-activated interactive toys, query engines for MP3/VCD/DVD players and karaoke machines, and so on.

Face recognition using 8-bit uC???

We created a standalone face recognition system for access control. Users enroll in the system with the push of a button and can then log in with a different button. Face recognition uses an eigenface method. Initial testing indicates an 88% successful login rate with no false positives.

DSP on 8-bit uC?

One of the most demanding applications for fast arithmetic is digital filtering. Atmel application note AVR201 shows how to use the hardware multiplier to make a multiply-and-accumulate operation (MAC).
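
To show what a MAC does in a filter, here is a small sketch in Python that emulates integer multiply-and-accumulate with a wrapped signed 16-bit accumulator. It is not taken from AVR201; the accumulator width, the coefficients and the sample data are assumptions chosen to keep the example short.

```python
def mac(acc, a, b):
    """One multiply-and-accumulate step: acc + a*b, wrapped to signed 16 bits."""
    acc = (acc + a * b) & 0xFFFF
    return acc - 0x10000 if acc & 0x8000 else acc

def fir(samples, coeffs, shift):
    """Integer FIR filter: each output is a chain of MACs followed by a right shift."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:                      # samples before the start count as zero
                acc = mac(acc, c, samples[n - k])
        out.append(acc >> shift)                # divide by the sum of the coefficients
    return out

COEFFS = [1, 2, 1]                       # hypothetical smoothing filter, sums to 4
noisy = [0, 0, 40, 0, 0, 40, 40, 40]     # made-up 8-bit style input
smoothed = fir(noisy, COEFFS, shift=2)   # shift by 2 divides by 4
```

The isolated spike of 40 is spread out and reduced to a peak of 20, while the sustained run at the end climbs back to the full 40. The inner loop is nothing but repeated MACs, which is why a fast hardware multiplier matters so much for this kind of work.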

AVR Speech Playback?

Memory is limited on small controllers, so I wrote compressed speech playback for the AVR architecture.

ok. you win...It appears to be doable with a uC. .... XD