Bad audio - using ESP8266 to receive multicast PCM audio and forward to i2s

Hi, I am trying to build a multicast audio receiver based on an ESP8266 (Adafruit Huzzah breakout). The setup is designed like this:

PCM Audio -> Multicast -> Wifi AP -> ESP8266 -> DAC -> Speakers.

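#define INBUF_LEN 1500
char incomingPacket[INBUF_LEN];
char audiobuffer[INBUF_LEN];

// Copy the audio payload into audiobuffer, swapping the two bytes of
// each 16-bit sample. The header bytes are skipped.
void swap_bytes(size_t len)
{
  // i + 1 < len keeps the second write in bounds if len is odd.
  for (size_t i = sizeof(PA_hdr); i + 1 < len; i += 2) {
    audiobuffer[i] = incomingPacket[i + 1];
    audiobuffer[i + 1] = incomingPacket[i];
  }
}

void loop() {
  size_t packetLength = Udp.parsePacket();
  if (packetLength) {
    size_t len = Udp.read(incomingPacket, INBUF_LEN);
    if (len > sizeof(PA_hdr)) {
      // One stereo frame = 2 samples x 2 bytes = 4 bytes.
      size_t frames_to_send = (len - sizeof(PA_hdr)) / 4;
      size_t sent = 0;
      swap_bytes(len);
      while (sent < frames_to_send) {
        // Each frame is 4 bytes, so skip 4 bytes per frame already written.
        int16_t *p = (int16_t *)(audiobuffer + sizeof(PA_hdr) + sent * 4);
        size_t result = i2s_write_buffer_nb(p, frames_to_send - sent);
        sent += result;
        yield();
      }
    }
  }
  yield();
}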

I am testing the setup using bursts of 3 seconds of audio data followed by 5 s of silence. That way I can detect the silence and Serial.print some statistics during the silent period without causing dropouts due to slow Serial.print calls.

The multicast stream is 16-bit signed PCM (44.1 kHz stereo), sent as 172 packets per second, each containing 1024 bytes of audio (= 512 samples = 256 frames). The I2S DMA can accept up to 512 samples at a time (or fewer, in steps of 64).
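As a quick sanity check of those figures, here is a minimal sketch of the packet arithmetic. The constant and function names are made up for illustration; only the numbers come from the description above.

```cpp
#include <cstdint>

// Stream parameters as described in the post (names are illustrative).
constexpr uint32_t kSampleRateHz  = 44100; // stereo frames per second
constexpr uint32_t kBytesPerFrame = 4;     // 16-bit left + 16-bit right
constexpr uint32_t kPayloadBytes  = 1024;  // audio bytes per packet

// Stereo frames carried by one packet.
constexpr uint32_t framesPerPacket() { return kPayloadBytes / kBytesPerFrame; }

// Whole packets needed per second (rounded down).
constexpr uint32_t packetsPerSecond() { return kSampleRateHz / framesPerPacket(); }

// How long one packet's worth of audio lasts, in microseconds (rounded down).
constexpr uint32_t packetDurationUs() {
  return static_cast<uint32_t>(1000000ULL * framesPerPacket() / kSampleRateHz);
}

static_assert(framesPerPacket() == 256, "1024 bytes is 256 stereo frames");
static_assert(packetsPerSecond() == 172, "matches the 172 packets/s quoted");
```

In other words, each packet holds roughly 5.8 ms of audio, so the receiver has to drain one packet into I2S about every 5.8 ms on average to keep up.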

Known to work:

  • There is no packet loss on the Wi-Fi link. All packets sent during a burst are received, and in order. (Each packet carries a packet number field to verify this.)
  • The ESP8266 can send perfect audio out to the DAC. Tested using a locally generated sine wave.
  • It makes no difference whether I use i2s_write_buffer, i2s_write_sample, or their non-blocking variants.
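The packet-number check mentioned in the first bullet could look roughly like the sketch below. The actual PA_hdr layout is not shown in the post, so SeqTracker and its fields are hypothetical, and the sketch assumes a 32-bit packet number that increases by one per packet.

```cpp
#include <cstdint>

// Hypothetical tracker: feed it the packet number from each received header.
struct SeqTracker {
  bool     started  = false;
  uint32_t expected = 0; // next packet number we expect to see
  uint32_t lost     = 0; // packet numbers skipped over (gaps)
  uint32_t late     = 0; // packets that arrived out of order or twice

  void onPacket(uint32_t seq) {
    if (!started) {            // first packet just sets the baseline
      started = true;
      expected = seq + 1;
      return;
    }
    // Signed distance from the expected number; works across wrap-around.
    int32_t diff = static_cast<int32_t>(seq - expected);
    if (diff == 0) {
      expected = seq + 1;      // in order
    } else if (diff > 0) {
      lost += static_cast<uint32_t>(diff); // a gap: count skipped numbers
      expected = seq + 1;
    } else {
      ++late;                  // older than expected; it was already counted
                               // as lost when the gap was first seen
    }
  }
};
```

Printing lost and late during the silent period would confirm that nothing is dropped or reordered during a burst.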

However, I am not able to get acceptable audio when receiving UDP packets. The sound stutters and seems to play slower than the input (i.e. it gets gradually more delayed relative to the source during the 3-second burst).

Any ideas? Can Wi-Fi packet arrival processing (in ISR context or afterwards) take so long that it is not possible to feed the I2S DMA fast enough?

Any ideas on where to look are much appreciated!
Thanks

I was intentionally not posting in the audio forum since I believe the problem is generic (UDP processing vs i2s) rather than audio specific.

I don't know the answer to your question, but here are some things I would investigate, at the risk of telling you what you already know:

UDP is used for real-time bidirectional services such as telephone speech (VoIP, Voice over IP). The crucial thing is that it is both real time and bidirectional. If what I say to you on the phone is delayed by more than maybe 100 ms or so, you are going to notice. This means that speech packets are sent and forgotten, which is what UDP does. If they don't arrive, or they arrive late, or they arrive with errors, tough: there is no time to do anything about it and no time to ask for them to be re-sent. I don't know about multicasting, but since it is one way only, it is possible (though I have no idea whether it is generally done in practice) to allow some delay between reception and playback so that data can be re-sent.

I am surprised you are not getting any errors or lost packets. When I first got 2 ESP8266s to play with the first thing I did after working out how to connect them to Wi-Fi and each other was to send test packets between them and log any errors. I definitely got errors and the odd lost packet. Not many, but not 0. I'm sorry I can't remember how many, but maybe 0.1% or something like that.

I think if I were doing what you are doing, I'd be investigating whether the data really does arrive complete and without errors, and I'd be thinking about buffering the data to ensure it goes to the D2A in a steady stream without jitter.

I am assuming the ESP8266 really can deal with the data rates you need (I have no idea!).

I hope somewhere in the above is something helpful, and I hope I've not just repeated everything you already know.

I'll be interested to know how you get on.

Thanks for your reply.

UDP is unidirectional in itself, but may be used bidirectionally. In this application it is going to be unidirectional only. When I get the basics to work I will add a small FIFO buffer to handle some jitter.
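A small FIFO of that kind could be sketched as a fixed-size ring buffer of stereo frames. This is only an illustration under my own assumptions: the class name FrameFifo is invented, each frame is stored as one 32-bit word (left and right sample together), and both ends are used from the same loop() context, so no locking is shown.

```cpp
#include <cstddef>
#include <cstdint>

// Ring buffer holding N stereo frames; N must be a power of two so that
// indices can be wrapped with a cheap bit mask.
template <size_t N>
class FrameFifo {
  static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

 public:
  size_t size() const  { return head_ - tail_; }  // frames stored
  size_t space() const { return N - size(); }     // frames that still fit

  // Copy up to count frames in; returns how many were actually stored.
  size_t push(const uint32_t* frames, size_t count) {
    if (count > space()) count = space();
    for (size_t i = 0; i < count; ++i) buf_[(head_ + i) & (N - 1)] = frames[i];
    head_ += count;
    return count;
  }

  // Copy up to count frames out; returns how many were actually read.
  size_t pop(uint32_t* out, size_t count) {
    if (count > size()) count = size();
    for (size_t i = 0; i < count; ++i) out[i] = buf_[(tail_ + i) & (N - 1)];
    tail_ += count;
    return count;
  }

 private:
  uint32_t buf_[N] = {};
  size_t head_ = 0;  // total frames ever pushed
  size_t tail_ = 0;  // total frames ever popped
};
```

The idea would be to push each received payload as it arrives and to pop into the I2S write call only once a few packets are queued, so that a late packet does not immediately starve the DAC.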

Good suggestion: I will add some logic to verify packet content. I have assumed that the UDP layer verifies the UDP checksum, but that may not be the case.

As for data rates, we are talking about 44 100 Hz * 2 channels * 2 bytes per sample * 8 bits/byte = 1 411 200 bits per second.

And of course - I will post any progress made!

Update: In a test with the sender (Linux PC) sending a packet every ~7.26 ms, the receiver consistently sees an average packet gap of ~12 ms. Strange! The jitter is low and consistent:

Max gap: 13276 µs Average gap: 11848 µs Jitter: 386 µs
Max gap: 13114 µs Average gap: 11829 µs Jitter: 372 µs
Max gap: 13015 µs Average gap: 11827 µs Jitter: 359 µs
Max gap: 13423 µs Average gap: 11844 µs Jitter: 382 µs
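For anyone wanting to reproduce this kind of measurement, here is a rough sketch of how such statistics can be gathered from packet arrival timestamps. How jitter was computed in the figures above is not stated, so the definition here (mean absolute difference between consecutive gaps) is an assumption, and GapStats is a made-up name.

```cpp
#include <cstdint>

// Collects inter-packet gap statistics from arrival timestamps in microseconds.
struct GapStats {
  bool     havePrev     = false;
  uint32_t lastUs       = 0;  // timestamp of the previous packet
  uint32_t prevGapUs    = 0;  // previous gap, used for the jitter estimate
  uint32_t count        = 0;  // number of gaps seen
  uint32_t maxGapUs     = 0;
  uint64_t sumGapUs     = 0;
  uint64_t sumAbsDiffUs = 0;  // sum of |gap - previous gap|

  void onPacket(uint32_t nowUs) {
    if (!havePrev) { havePrev = true; lastUs = nowUs; return; }
    uint32_t gap = nowUs - lastUs;  // unsigned subtraction survives timer wrap
    lastUs = nowUs;
    if (gap > maxGapUs) maxGapUs = gap;
    sumGapUs += gap;
    if (count > 0) {
      sumAbsDiffUs += (gap > prevGapUs) ? (gap - prevGapUs) : (prevGapUs - gap);
    }
    prevGapUs = gap;
    ++count;
  }

  uint32_t averageGapUs() const {
    return count ? static_cast<uint32_t>(sumGapUs / count) : 0;
  }

  // Assumed definition: mean absolute change between consecutive gaps.
  uint32_t jitterUs() const {
    return count > 1 ? static_cast<uint32_t>(sumAbsDiffUs / (count - 1)) : 0;
  }
};
```

On the ESP8266 the timestamp would come from micros() right after parsePacket() reports a packet, with the results printed during the silent period.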

My current suspicion is that UDP reception is somehow being throttled.
There are similar reports: Low UDP iperf throughput - ESP32 Forum

I now have some more tangible results. I have added FIFO buffering, and mono plays like a charm. But I want stereo...

I am confident it is throttling somewhere in the lower packet layers. Packets do not arrive more often than every 12 ms. This explains why the jitter was measured as unrealistically low: packets are buffered in the Wi-Fi/UDP stack and delivered one by one.

When running mono (1280 bytes of PCM data approximately every 14.5 ms), it works like a charm.
When running stereo (1280 bytes of PCM data approximately every 7 ms), packets arrive only every 11.8 ms.
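To put those two cases side by side, the delivered byte rate can be compared with what the stream needs. This is just back-of-the-envelope arithmetic using the figures above; the function and constant names are illustrative.

```cpp
#include <cstdint>

// Bytes per second delivered when payloadBytes arrive every gapUs microseconds
// (rounded down).
constexpr uint32_t bytesPerSecond(uint32_t payloadBytes, uint32_t gapUs) {
  return static_cast<uint32_t>(1000000ULL * payloadBytes / gapUs);
}

// What 44.1 kHz 16-bit audio needs.
constexpr uint32_t kMonoNeeded   = 44100 * 2;  // 88 200 bytes/s
constexpr uint32_t kStereoNeeded = 44100 * 4;  // 176 400 bytes/s

// Mono: 1280 bytes every ~14.5 ms is just enough.
static_assert(bytesPerSecond(1280, 14500) >= kMonoNeeded, "mono keeps up");

// Stereo: 1280 bytes every ~11.8 ms falls well short.
static_assert(bytesPerSecond(1280, 11800) < kStereoNeeded, "stereo starves");
```

With one 1280-byte packet every 11.8 ms the receiver gets about 108 kB/s, roughly 61% of the 176.4 kB/s that stereo needs, which fits audio that stutters and falls steadily behind the source.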

Jitter is also more realistic:
Max gap: 20713 µs Average gap: 14470 µs Jitter: 6351 µs
Max gap: 20797 µs Average gap: 14468 µs Jitter: 6052 µs
Max gap: 20803 µs Average gap: 14471 µs Jitter: 5241 µs

Has anybody managed to receive UDP at high speed? I also tested ESPAsyncUDP with identical results, so I suspect the problem is in the lower layers.
