You will get lousy quality, only slightly better than two tin cans and a string.
The Arduino has a 10-bit ADC with an input range of 0-5 V, so you need to amplify the mike signal a bit. The sample frequency is at most about 10 kHz; in practice you will get maybe 3000-5000 samples per second at 2 bytes each.
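To show why the mike needs amplifying, here is a rough sketch of how an input voltage maps onto the 10-bit ADC, assuming the default 5 V reference. The helper name `adcCounts` and the 20 mV example level are my own assumptions, not measured values.

```cpp
#include <cstdint>

// Ideal 10-bit ADC reading for an input voltage, assuming a 5 V reference.
// Hypothetical helper for illustration; a real reading also includes noise.
uint16_t adcCounts(double volts, double vref = 5.0) {
    if (volts <= 0.0) return 0;
    double counts = volts * 1024.0 / vref;   // 1024 steps across 0..vref
    if (counts > 1023.0) counts = 1023.0;    // highest code a 10-bit ADC can return
    return static_cast<uint16_t>(counts);    // truncate to a whole step
}
```

One step is about 4.9 mV, so under these assumptions an unamplified 20 mV signal spans only about 4 of the 1024 steps.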
Not measured, but as an estimate: 5 KB per second = 2500 samples per second of 2 bytes, or 5000 samples per second of 1 byte (8 bit).
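The arithmetic behind that estimate, as a minimal sketch. The function names are hypothetical, and dropping the two lowest bits is just one simple way to squeeze a 10-bit reading into one byte.

```cpp
#include <cstdint>

// Samples per second that fit in a given byte budget.
// bytesPerSample must be non-zero.
constexpr uint32_t samplesPerSecond(uint32_t bytesPerSecond, uint32_t bytesPerSample) {
    return bytesPerSecond / bytesPerSample;
}

// Drop the two least significant bits of a 10-bit reading so it fits one byte.
constexpr uint8_t toEightBit(uint16_t tenBitSample) {
    return static_cast<uint8_t>((tenBitSample & 0x3FF) >> 2);
}
```

With a 5000 byte per second budget, `samplesPerSecond(5000, 2)` gives 2500 and `samplesPerSecond(5000, 1)` gives 5000, at the cost of 2 bits of resolution per sample.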
On the output side the Arduino can only do PWM, so there is not even a real DAC, though I suppose it could be made to work.
So the quality ends up worse than an old analog phone line.
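A quick way to check the comparison is the Nyquist limit: a sample rate can only represent audio up to half that rate. The sketch below assumes the commonly quoted upper edge of roughly 3400 Hz for an analog phone line; that figure and the names are my assumptions, not something measured here.

```cpp
#include <cstdint>

// Assumption: commonly quoted upper edge of the analog telephone voice band.
constexpr uint32_t kPhoneBandTopHz = 3400;

// Highest audio frequency a given sample rate can represent (Nyquist limit).
constexpr uint32_t nyquistHz(uint32_t sampleRateHz) {
    return sampleRateHz / 2;
}

// True if the sample rate covers the assumed phone line voice band.
constexpr bool coversPhoneBand(uint32_t sampleRateHz) {
    return nyquistHz(sampleRateHz) >= kPhoneBandTopHz;
}
```

At 2500 samples per second the limit is 1250 Hz, and at 5000 it is 2500 Hz, both below the assumed phone band.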
I think you need to consider another architecture: use the Arduino to control who communicates with whom based on the keypresses, but use analog lines for the sound.
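A minimal sketch of what that control side might look like, with the Arduino only keeping track of connections. Everything here is hypothetical: the node count, the `Switchboard` name, and the idea that each table entry would drive an analog switch or relay are assumptions for illustration, not a tested design.

```cpp
#include <cstdint>

constexpr uint8_t kNodeCount = 4;   // hypothetical number of stations
constexpr uint8_t kNoPeer = 0xFF;   // marks a node with no active connection

// Tracks which node is connected to which. In hardware each entry would
// drive an analog switch or relay; the audio itself never passes through
// the microcontroller.
struct Switchboard {
    uint8_t peer[kNodeCount];

    Switchboard() {
        for (uint8_t i = 0; i < kNodeCount; ++i) peer[i] = kNoPeer;
    }

    // Called when `caller` presses the key for `callee`.
    // Returns false if the request is invalid or either side is busy.
    bool connect(uint8_t caller, uint8_t callee) {
        if (caller >= kNodeCount || callee >= kNodeCount || caller == callee) return false;
        if (peer[caller] != kNoPeer || peer[callee] != kNoPeer) return false;
        peer[caller] = callee;
        peer[callee] = caller;
        return true;
    }

    // Called when `node` hangs up; frees both ends of the connection.
    void disconnect(uint8_t node) {
        if (node >= kNodeCount || peer[node] == kNoPeer) return;
        peer[peer[node]] = kNoPeer;
        peer[node] = kNoPeer;
    }
};
```

The point of the design is that the microcontroller only handles a few bytes of state per keypress, so the ADC and PWM limits no longer affect the sound.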
What is the distance between the nodes?