I'd like the Arduino to recognise that "beep____beep____beep____beep____beep" is a pass and "beep_beep_beep_beep_beep" is a fail.
All you know right now is whether the voltage at the pin is between some lower limit and some upper limit. You need to record when the reading enters that range and when it leaves it, so you can distinguish between beep and beeeeeeeep, and between beep____beep and beepbeep or beep_____________beep.
So, you need to start with two changes. First, you need to record the previous value, so you can detect when the current value is within the thresholds while the previous value was not (the start of the beep), and when the current value is not but the previous value was (the end of the beep).
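Here is a minimal sketch of that first change, assuming the reading is an integer such as the one analogRead() returns. The names LOWER_LIMIT, UPPER_LIMIT, Edge and EdgeDetector are made up for illustration, and the threshold values are placeholders you would have to tune for your microphone.

```cpp
// Hypothetical thresholds; tune them to the readings your microphone produces.
const int LOWER_LIMIT = 300;
const int UPPER_LIMIT = 700;

enum Edge { NO_EDGE, BEEP_START, BEEP_END };

struct EdgeDetector {
  bool previousInWindow = false;  // was the previous reading within the thresholds?

  // Feed in one reading; reports whether a beep just started or just ended.
  Edge update(int reading) {
    bool currentInWindow = (reading >= LOWER_LIMIT && reading <= UPPER_LIMIT);
    Edge edge = NO_EDGE;
    if (currentInWindow && !previousInWindow) {
      edge = BEEP_START;  // current is within the thresholds, previous was not
    } else if (!currentInWindow && previousInWindow) {
      edge = BEEP_END;    // current is not within the thresholds, previous was
    }
    previousInWindow = currentInWindow;  // remember for the next call
    return edge;
  }
};
```

In loop() you would call something like detector.update(analogRead(A0)) on every pass, where A0 is only a guess at the pin you are using.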
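Second, you need to record the time at which each change happens, so you know how long a beep or a pause lasts. Whether you use millis() or micros() depends on the duration of a beep or a pause: millis() is fine if they last tens of milliseconds or more, while micros() only makes sense if they are shorter than a few milliseconds.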
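A rough sketch of the timing part, assuming you already have a true/false "beeping" state for each reading and a millisecond timestamp such as the value of millis(). BeepTimer, isLongPause and LONG_PAUSE_MS are hypothetical names, and the 400 ms cutoff is a placeholder; measure your real pass and fail patterns and pick a value between the two pause lengths.

```cpp
#include <stdint.h>

// Hypothetical cutoff: pauses at least this long are treated as the "pass" pattern.
const uint32_t LONG_PAUSE_MS = 400;

struct BeepTimer {
  bool wasBeeping = false;    // state at the previous call
  bool seenBeep = false;      // ignore the silence before the first beep
  uint32_t lastChangeMs = 0;  // time of the most recent state change
  uint32_t lastBeepMs = 0;    // duration of the most recent beep
  uint32_t lastPauseMs = 0;   // duration of the most recent pause between beeps

  // Call with the current state and timestamp.
  // Returns true when a pause between two beeps has just ended.
  bool update(bool isBeeping, uint32_t nowMs) {
    bool pauseEnded = false;
    if (isBeeping != wasBeeping) {
      // Unsigned subtraction stays correct when the 32-bit counter wraps around.
      uint32_t elapsed = nowMs - lastChangeMs;
      if (wasBeeping) {
        lastBeepMs = elapsed;   // a beep just ended
        seenBeep = true;
      } else if (seenBeep) {
        lastPauseMs = elapsed;  // a pause just ended
        pauseEnded = true;
      }
      wasBeeping = isBeeping;
      lastChangeMs = nowMs;
    }
    return pauseEnded;
  }
};

// True for the long pauses of the pass pattern, false for the short ones.
bool isLongPause(uint32_t pauseMs) {
  return pauseMs >= LONG_PAUSE_MS;
}
```

Each time update() returns true, check isLongPause(timer.lastPauseMs). Four long pauses in a row between five beeps would then look like the pass pattern, and four short ones like the fail pattern.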
Then, you need to recognize that you have no way of knowing what sound the microphone picked up. It may, or may not, have been that of the beeper. THAT is a much more difficult problem to deal with.