I can't really tell whether these guys are using multiple speakers on different pins or what. Are they? If so, why? Why not a single speaker and a simple audio mixer?
If tone generation is all you want the sketch to do, and you don't need to generate very high frequencies, you might even be able to manage it just by polling micros() to check when it is time to toggle each output. This would use more processing power and would not cope with as many outputs, or with frequencies as high, as the interrupt-based approach, but it would be much easier to understand and get working. Essentially you'd use 'blink without delay', but with micros() instead of millis() to measure really short intervals.
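To make the polling idea concrete, here is a rough sketch of the timing logic only. The names ToneChannel, halfPeriodForHz and updateChannel are made up for illustration and are not from any Arduino library. The pin numbers and frequencies are up to you. The time is passed in as a parameter so the logic can be tried off the board; on an Arduino you would pass in micros().

```cpp
#include <cstdint>

// One square-wave output. All names here are illustrative (hypothetical).
struct ToneChannel {
    uint8_t  pin;           // output pin driving this speaker
    uint32_t halfPeriodUs;  // microseconds between toggles
    uint32_t lastToggleUs;  // timestamp of the previous scheduled toggle
    bool     level;         // current output level (true = HIGH)
};

// A square wave toggles twice per cycle, so the half period in
// microseconds is 1,000,000 / (2 * hz) = 500,000 / hz (integer division).
uint32_t halfPeriodForHz(uint32_t hz) {
    return 500000u / hz;
}

// Call this as often as possible with the current time in microseconds.
// Returns true if the channel toggled on this call.
// The unsigned subtraction stays correct when the 32-bit counter wraps
// (micros() wraps after roughly 70 minutes).
bool updateChannel(ToneChannel &ch, uint32_t nowUs) {
    if (nowUs - ch.lastToggleUs >= ch.halfPeriodUs) {
        // Step forward by exactly one half period rather than jumping to
        // nowUs, so late polls do not accumulate as pitch drift.
        ch.lastToggleUs += ch.halfPeriodUs;
        ch.level = !ch.level;
        return true;
    }
    return false;
}

// On the Arduino itself, loop() would look roughly like this
// (assuming an array `channels` of ToneChannel set up in setup()):
//
//   uint32_t now = micros();
//   for (ToneChannel &ch : channels) {
//       if (updateChannel(ch, now)) {
//           digitalWrite(ch.pin, ch.level ? HIGH : LOW);
//       }
//   }
```

Anything else in loop() that takes a long time will delay the toggles and make the tones sound rough, which is the main limitation compared with the interrupt-based approach.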
I am using three different speakers on three different pins. What's an audio mixer? (Sorry, I'm new)