it would have to respond to just one loud bang instead of to continuous music
Well, the project is basically just a concept: measure phase -> calculate time difference -> calculate the direction of the source. Localizing a loud bang differs only in the last step, i.e. in the filtering and the direction calculation.
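A minimal sketch of that chain for a continuous tone. It assumes two microphones, a far-field source, a known tone frequency and a speed of sound of 343 m/s. The function names (`phase_at`, `time_difference`, `direction_deg`) are hypothetical and this is not the project's actual code. Phase only gives an unambiguous delay while the mic spacing is below half a wavelength.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degC


def phase_at(signal, freq, fs):
    """Phase (rad) of `signal` at `freq` Hz, taken from a single DFT bin."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / fs)
              for n, x in enumerate(signal))
    return cmath.phase(acc)


def time_difference(phase_a, phase_b, freq):
    """Delay of mic B relative to mic A (s), from the phase difference.

    A delay tau lowers B's phase by 2*pi*freq*tau, so
    tau = (phase_a - phase_b) / (2*pi*freq). The difference is wrapped
    to (-pi, pi], which is only unambiguous if the mic spacing is less
    than half a wavelength.
    """
    dphi = phase_a - phase_b
    dphi = math.atan2(math.sin(dphi), math.cos(dphi))  # wrap to (-pi, pi]
    return dphi / (2 * math.pi * freq)


def direction_deg(tau, mic_spacing, c=SPEED_OF_SOUND):
    """Far-field angle from broadside (deg); positive = source nearer mic A."""
    s = max(-1.0, min(1.0, c * tau / mic_spacing))  # clamp noise outside [-1, 1]
    return math.degrees(math.asin(s))
```

For a quick check, a 1 kHz tone sampled at 48 kHz over 480 samples completes exactly 10 cycles, so the single DFT bin is not smeared by spectral leakage.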
A short impulse gives you less data, but at the same time there is no need for "quasi real-time" speed, so more math can be applied to the recorded samples, which can give even better accuracy. It all depends on how fast you need the result and how much computational power is available.
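For a single bang the recorded buffers can be processed offline. A minimal sketch under the same assumptions (two mics, far-field source, 343 m/s): cross-correlate the whole buffers over every physically possible lag, then refine the peak with parabolic interpolation to get a sub-sample delay. This is a generic time-difference technique swapped in for illustration, not necessarily what the project uses; `cross_correlation` and `bang_direction` are hypothetical names. The brute-force loop costs O(N × lags), which is affordable when no real-time answer is needed.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degC


def cross_correlation(a, b, max_lag):
    """r[k] = sum_n a[n] * b[n + k] for every lag k in [-max_lag, max_lag]."""
    r = {}
    for k in range(-max_lag, max_lag + 1):
        lo = max(0, -k)                # keep a index >= 0 and b index >= 0
        hi = min(len(a), len(b) - k)   # keep both indices inside the buffers
        r[k] = sum(a[i] * b[i + k] for i in range(lo, hi))
    return r


def bang_direction(a, b, fs, mic_spacing, c=SPEED_OF_SOUND):
    """Return (angle_deg, tau_s) for one recorded impulse on mics A and B.

    tau > 0 means the bang reached B later than A.
    """
    # Only lags up to spacing / c are physical; +1 leaves room to interpolate.
    max_lag = int(math.ceil(mic_spacing / c * fs)) + 1
    r = cross_correlation(a, b, max_lag)
    k0 = max(r, key=r.get)  # integer lag of the correlation peak

    # Parabola through the peak and its two neighbours -> sub-sample offset.
    offset = 0.0
    if -max_lag < k0 < max_lag:
        y_m, y_0, y_p = r[k0 - 1], r[k0], r[k0 + 1]
        denom = y_m - 2.0 * y_0 + y_p
        if denom != 0.0:
            offset = 0.5 * (y_m - y_p) / denom

    tau = (k0 + offset) / fs
    s = max(-1.0, min(1.0, c * tau / mic_spacing))  # clamp noise outside [-1, 1]
    return math.degrees(math.asin(s)), tau
```

The parabolic fit is the cheapest refinement; with more time and CPU available, fitting the logarithm of the three peak values or upsampling the correlation around the peak are common ways to squeeze out further accuracy.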