It won't work because:
1) The Arduino's A/D is only 10 bits, which does not give very good audio quality.
2) The Arduino's only D/A is 8 bits through the PWM, so that is even worse.
3) There is not much memory space to buffer any sound
4) The XBee transmits data in packets, not continuously, so you would have to time stamp the packets, collect them in a buffer and feed the buffer out at the sample rate. See 3)
5) The XBee is a bit slow for this sort of thing. It has a top raw speed of 250K baud, which translates to about 25K bytes per second. Handshaking cuts the speed down even further, so you would only get a sample rate in the order of 8K samples per second.
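To put rough numbers on points 3) to 5), here is a small Python sketch you can run on a PC. It is not Arduino code, only an illustration. The 10 bits per byte on the wire, the 2K of SRAM (true for an ATmega328 board, but the board was never named), the one byte per sample and the half-of-memory buffer are all my assumptions, and `JitterBuffer` is a made-up name for the time stamped buffer from point 4), not part of any XBee library.

```python
import heapq

# Figures from the points above, plus labelled assumptions.
RAW_BAUD = 250_000              # XBee top raw speed quoted in 5)
BITS_PER_BYTE_ON_WIRE = 10      # assumed: 8 data bits plus start and stop bits
SAMPLE_RATE = 8_000             # samples per second, the estimate in 5)
BYTES_PER_SAMPLE = 1            # assumed: 10 bit readings cut down to 8 bits
SRAM_BYTES = 2048               # assumed: ATmega328 based board
BUFFER_BYTES = SRAM_BYTES // 2  # assumed: half the memory given to audio

raw_bytes_per_s = RAW_BAUD // BITS_PER_BYTE_ON_WIRE
buffer_ms = 1000 * BUFFER_BYTES / (SAMPLE_RATE * BYTES_PER_SAMPLE)


class JitterBuffer:
    """Hypothetical buffer that hands packets back in time stamp order."""

    def __init__(self, capacity_packets):
        self.capacity = capacity_packets
        self._heap = []          # min-heap ordered by time stamp
        self._last_played = -1   # time stamp of the last packet fed out

    def push(self, timestamp, samples):
        """Store a packet; return False if it is too late or there is no room."""
        if timestamp <= self._last_played or len(self._heap) >= self.capacity:
            return False
        heapq.heappush(self._heap, (timestamp, bytes(samples)))
        return True

    def pop(self):
        """Return the oldest stored packet, or None if the buffer has run dry."""
        if not self._heap:
            return None
        timestamp, samples = heapq.heappop(self._heap)
        self._last_played = timestamp
        return samples


if __name__ == "__main__":
    print("raw bytes per second:", raw_bytes_per_s)
    print("audio held in buffer (ms):", buffer_ms)
    jb = JitterBuffer(capacity_packets=4)
    jb.push(2, b"cd")   # arrives first but belongs second
    jb.push(1, b"ab")
    print(jb.pop(), jb.pop())
```

With those assumptions the buffer holds only a fraction of a second of sound, so any packet that arrives late or out of order has very little slack before the output runs dry.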
Are there any alternatives I could look into?
I bought a wireless phone for my land line and that acts as a wireless intercom, so there is no need to plug it into a land line. It is a Logik and was cheaper than three Arduinos and three XBees with shields.