It's very possible to do what you want, and it isn't very difficult.
With regard to latency, the only delays will be the signal propagation down the wires and the number of clock cycles your code needs to detect the flash and then drive the speaker. Even with very inefficient code, there probably won't be a noticeable delay (to a human) between the detection and the sound.
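To put rough numbers on that, here is a back-of-the-envelope sketch in plain C++. The figures are assumptions, not measurements of your setup: a 16 MHz clock (as on an Uno-class board), roughly 100 microseconds per analog read (the figure the Arduino reference quotes for AVR boards), and a made-up budget of 1000 clock cycles for the rest of the loop. The function name `estimateLatencyMicros` is hypothetical.

```cpp
// Rough worst-case delay between the flash arriving and the speaker pin changing,
// for a sketch that polls the sensor once per pass through loop().
// adcMicros:   time for one analog read, in microseconds (assumed ~100 on AVR boards)
// extraCycles: clock cycles spent in the rest of the loop (hypothetical budget)
// clockHz:     CPU clock frequency in Hz (assumed 16 MHz)
double estimateLatencyMicros(double adcMicros, unsigned long extraCycles, double clockHz) {
  // Convert the cycle count to microseconds, then add the ADC conversion time.
  double codeMicros = extraCycles * 1000000.0 / clockHz;
  return adcMicros + codeMicros;
}
```

With those assumed values, `estimateLatencyMicros(100.0, 1000, 16000000.0)` gives 100 + 62.5 = 162.5 microseconds, which is well under a millisecond even with a generous cycle budget.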
You should start by looking at the Playground's example code for light sensors and audio output devices. If you want to do something else with the sketch while waiting for a flash, you should also read about interrupts in the Reference section of the main website.
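As a starting point, here is a minimal polling sketch under some loud assumptions: a photoresistor voltage divider on analog pin A0, a piezo speaker on digital pin 8, and a threshold of 200 ADC counts above ambient light. None of these values come from your project, and the helper names `isFlash` and `updateBaseline` are made up for illustration. The hardware part is wrapped in `#ifdef ARDUINO` so the detection logic can also be compiled and checked on a desktop.

```cpp
// ADC counts above the ambient baseline that count as a flash (assumed; tune for your sensor).
const int FLASH_DELTA = 200;

// Returns true when the reading jumps well above the ambient baseline.
bool isFlash(int reading, int baseline) {
  return reading - baseline > FLASH_DELTA;
}

// Moves the baseline 1/16 of the way toward the latest reading, so slow
// changes in room light are tracked without being mistaken for a flash.
int updateBaseline(int baseline, int reading) {
  return baseline + (reading - baseline) / 16;
}

#ifdef ARDUINO
const int SENSOR_PIN = A0;   // assumed wiring: photoresistor divider on A0
const int SPEAKER_PIN = 8;   // assumed wiring: piezo speaker on pin 8

int baseline = 0;

void setup() {
  baseline = analogRead(SENSOR_PIN);  // take the current light level as ambient
}

void loop() {
  int reading = analogRead(SENSOR_PIN);
  if (isFlash(reading, baseline)) {
    tone(SPEAKER_PIN, 1000, 100);     // 1 kHz beep for 100 ms
  } else {
    baseline = updateBaseline(baseline, reading);
  }
}
#endif
```

This version polls, so the sketch can't do much else while it waits. If your sensor gives a clean digital signal, the interrupt approach described in the Reference section would let the rest of the sketch run freely.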