I don't think an FFT is going to help you... I've seen BPM meters built into DJ mixers and DJ software. I don't know how these work, but you might be able to find a "BPM" or "beat detector" algorithm somewhere. But I doubt there is any software that's as good as a human.
I'll tell you what I've done and make some suggestions for improving on it...
I've done some crude beat detection for lighting effects, basically triggering off the "loudness" of the signal. In this application, I find that it's more interesting if the lights respond to the changes in the actual music, rather than just blinking exactly to the "boring" 1-2-3-4 beat.
1. My signal runs through a [u]Peak Detector[/u] with a decay time of somewhere between 0.1 and 0.3 seconds.
2. I take several readings to get a peak and an "average". In my particular application (which has several different lighting-effect modes), my main loop takes a reading once per second and saves the last 20 readings (20 seconds) in an array. After each new reading, I find the new peak and the new average in the array. Since there is an analog peak detector, it's an "average of peaks" rather than a true average. Likewise, the peak in the array isn't necessarily a true peak... It's just the biggest value in my array.
3. I get the maximum from the array, and that becomes my starting peak/beat reference. (For beat detection, you can actually skip the whole array thing and pick any reading for your reference, because it's going to self-adjust anyway.)
4. I wait about 1/4 of a second, and then read the input, looking for a level that's greater than or equal to my reference. I also start reducing the reference, in case the next beat is not quite as strong as the previous one. When the input matches (or exceeds) the reference, I've found a beat.
5. I save the new input-trigger level as my new reference and start over with the ~1/4 second delay before looking for the next beat. (In this mode, I am no longer using my 20-second array.)
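Here's a rough sketch of steps 3-5 in Python, just to show the logic. It's not my actual code. The class name `BeatDetector`, the `decay_per_second` value, and the choice to let the reference sag continuously (including during the 1/4 second wait) are all assumptions made for illustration, and the readings are made up.

```python
class BeatDetector:
    """Adaptive-threshold beat detector fed with peak-detector readings."""

    def __init__(self, reference, holdoff=0.25, decay_per_second=0.5):
        self.reference = reference                # current peak/beat reference
        self.holdoff = holdoff                    # seconds to wait after a beat
        self.decay_per_second = decay_per_second  # assumed: fraction lost per second
        self.last_beat_time = None
        self.last_time = None

    def update(self, level, now):
        """Feed one reading taken at time `now` (seconds). Returns True on a beat."""
        if self.last_time is not None:
            # Reduce the reference so a slightly weaker beat can still trigger.
            dt = now - self.last_time
            self.reference *= (1.0 - self.decay_per_second) ** dt
        self.last_time = now

        # Step 4: ignore the input for about 1/4 second after the last beat.
        if self.last_beat_time is not None and now - self.last_beat_time < self.holdoff:
            return False

        if level >= self.reference:
            # Step 5: the level that triggered becomes the new reference.
            self.reference = level
            self.last_beat_time = now
            return True
        return False


# Step 3: start from the biggest value in the array of recent readings.
recent_readings = [310, 420, 515, 380]  # made-up ADC values
detector = BeatDetector(reference=max(recent_readings))
```

You'd call `detector.update(reading, time_in_seconds)` from the main loop every time you read the ADC. Because the reference keeps dropping until something triggers, a bad starting value corrects itself after a beat or two.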
Some suggestions for improvement...
1. You can try a low-pass filter (hardware). This may or may not help, depending on the type of music... The beat/rhythm is not always dominated by bass.
2. Some kind of filter for the loudness envelope (in hardware or software). That is... You don't want to follow all of the "waves" that make up the audio, but rather you want to follow the ups & downs of the moment-to-moment loudness. For example, if you open an audio file in an audio editor and look at a view of a few seconds of audio, you'll see the envelope. If you zoom in to look at a few milliseconds of audio, you will see the detailed waveform and perhaps the individual samples.
So, you might have a ~100Hz low-pass audio filter to filter out everything except the bass, plus a ~1/2Hz bandpass envelope filter to "tune in" the beat.
The idea would be to "tune" the filter to magnify (or pass) envelope changes near the expected beat frequency (i.e. a few beats per second) and reduce (or block) envelope changes that are too fast or too slow for any reasonable beat.
The peak detector circuit actually makes it impossible to follow the detailed audio waveform, and I think you could make a modification to slow down the attack to "tune" the detector. (The decay is already slow.)
Following the loudness envelope, rather than the actual waveform (and possibly doing FFT analysis), means that there is much less demand on the processor... You can take several ADC readings per second instead of many thousands of readings per second.
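If you wanted to try the envelope idea in software instead of hardware, it might look something like this. This is only a sketch under my own assumptions: the function names, the attack/decay times, and the 0.5 Hz and 4 Hz corner frequencies are placeholders, and the "bandpass" is just the difference of two simple one-pole low-pass filters, not a carefully designed filter.

```python
import math


def envelope_follower(samples, sample_rate, attack=0.01, decay=0.2):
    """Software stand-in for the peak detector: rectify the signal, then
    rise with the `attack` time constant and fall with `decay` (seconds)."""
    up = 1.0 - math.exp(-1.0 / (attack * sample_rate))
    down = 1.0 - math.exp(-1.0 / (decay * sample_rate))
    env, out = 0.0, []
    for s in samples:
        x = abs(s)                         # rectify
        coeff = up if x > env else down    # fast up, slow down
        env += coeff * (x - env)
        out.append(env)
    return out


def band_pass_envelope(envelope, rate, low_hz=0.5, high_hz=4.0):
    """Crude band-pass for envelope readings taken `rate` times per second.
    A fast low-pass removes changes too quick to be a beat; subtracting a
    slow low-pass removes the slowly drifting overall loudness."""
    a_fast = 1.0 - math.exp(-2.0 * math.pi * high_hz / rate)
    a_slow = 1.0 - math.exp(-2.0 * math.pi * low_hz / rate)
    fast = slow = envelope[0] if envelope else 0.0
    out = []
    for e in envelope:
        fast += a_fast * (e - fast)
        slow += a_slow * (e - slow)
        out.append(fast - slow)
    return out
```

A longer `attack` is the software version of slowing down the peak detector's attack. With a hardware peak detector in front of the ADC, you'd skip `envelope_follower` and feed the slow ADC readings (say 20 per second) straight into `band_pass_envelope`.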
3. Some sort of averaging or "fuzzy logic" to keep track of the ongoing beat. For example, if you've been getting 1, 2, 3, 4... 1, 2, 3, 4... and you get 1... 3, 4..., you'd probably want to fill in the "missing" (or weak) beat, just as a human would do.
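One very simple way to do that fill-in, sketched in Python. I haven't built this, so take it as an illustration only: the function name `fill_missing_beats`, the 1.5x and 0.5x thresholds, and the `smoothing` factor are assumptions. It keeps a running average of the time between beats and inserts a predicted beat whenever the gap gets too long.

```python
def fill_missing_beats(beat_times, smoothing=0.2):
    """Given detected beat times (seconds, increasing), return the list with
    predicted beats inserted wherever a beat appears to have been skipped."""
    if len(beat_times) < 2:
        return list(beat_times)
    period = beat_times[1] - beat_times[0]   # first guess at the beat period
    if period <= 0:
        raise ValueError("beat times must be strictly increasing")
    filled = [beat_times[0], beat_times[1]]
    for t in beat_times[2:]:
        gap = t - filled[-1]
        # A gap much longer than one period means a beat was missed (or too
        # weak to trigger), so put one where it was expected.
        while gap > 1.5 * period:
            filled.append(filled[-1] + period)
            gap = t - filled[-1]
        filled.append(t)
        # Nudge the running-average period, ignoring suspiciously short gaps.
        if gap > 0.5 * period:
            period = (1.0 - smoothing) * period + smoothing * gap
    return filled


# Beats at 0.5 s spacing with the one at 1.5 s missing.
print(fill_missing_beats([0.0, 0.5, 1.0, 2.0, 2.5]))
```

In a real-time version you wouldn't have the whole list, so you'd do the same check against the clock: if 1.5 periods go by with no trigger, fire the light anyway.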