Speech recognization with esp32

jhgtflifbvf · October 17, 2024, 3:06pm

Subject: Help Needed with ESP32 Audio Transcription to Deepgram

Hello Arduino Community,

I'm working on a voice recognition project using an ESP32 and an INMP441 microphone, where I intend to record audio, store it on an SD card, and then send it to Deepgram for transcription.

Project Overview:

Microcontroller: ESP32
Microphone: INMP441 (I2S)
Audio Storage: SD card
Transcription Service: Deepgram
Sample Rate: 16 kHz

Issues:

I have verified the Wi-Fi connection and can connect to Deepgram successfully. However, the transcription is not happening. I don't receive any HTTP response codes (200, -1, 400, etc.) when sending the audio.

Complete Code:

cpp

#include <driver/i2s.h>
#include <SPI.h>
#include <SD.h>
#include <WiFi.h>
#include <WiFiClientSecure.h>

const char* wifi_name = "iot";
const char* wifi_password = "12345678";

const int CS_PIN = 5;

const int chunk_size = 1024;
uint8_t buffer[chunk_size];

const char* server_url = "api.deepgram.com";
const char* apikey = "my key";
const int server_port = 443;

String transcription;
String response;

WiFiClientSecure client;

i2s_config_t i2s_config;
i2s_pin_config_t i2s_pin_config;

File audio_file;

long start_time;

void setup() {
    Serial.begin(115200);
    // Initializing the WiFi 
    WiFi.begin(wifi_name, wifi_password);
    Serial.println("Connecting to WiFi...");
    while (WiFi.status() != WL_CONNECTED) {
        Serial.print(".");
        delay(500);
    }
    Serial.println("\nConnected to WiFi");
    Serial.print("IP ADDRESS IS: ");
    Serial.println(WiFi.localIP());

    // Initializing the SPI bus and SD card module
    SPI.begin(18, 19, 23, CS_PIN);
    if (!SD.begin(CS_PIN)) {
        Serial.println("Failed to initialize SD card");
        return;
    }
    Serial.println("SD card initialization completed");

    // Initializing the I2S mic (INMP441)
    i2s_config.mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX);
    i2s_config.sample_rate = 16000;
    i2s_config.bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT;
    i2s_config.channel_format = I2S_CHANNEL_FMT_ONLY_LEFT;
    i2s_config.communication_format = (i2s_comm_format_t)(I2S_COMM_FORMAT_I2S | I2S_COMM_FORMAT_I2S_MSB);
    i2s_config.fixed_mclk = 0;
    i2s_config.intr_alloc_flags = ESP_INTR_FLAG_LEVEL1;
    i2s_config.dma_buf_count = 8;
    i2s_config.dma_buf_len = 64;
    i2s_config.use_apll = false;
    i2s_config.tx_desc_auto_clear = true;

    // Initializing the pin configuration of INMP441
    i2s_pin_config.bck_io_num = 33;
    i2s_pin_config.ws_io_num = 22;
    i2s_pin_config.data_in_num = 35;
    i2s_pin_config.data_out_num = I2S_PIN_NO_CHANGE;

    // Installing the drivers and setting up I2S
    i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
    i2s_set_pin(I2S_NUM_0, &i2s_pin_config);
    i2s_zero_dma_buffer(I2S_NUM_0);
}

void loop() {
    record();
    http_request();
    deepgram_transcription();
    Serial.println("Finally success..............");
    delay(5000);
    Serial.println("Before sending next data");
}

void record() {
    Serial.println("Starting to record...");
    start_time = millis();
    size_t byte_read;
    int16_t audio_data[1024];

    File audio_file = SD.open("/test.wav", FILE_WRITE);
    if (!audio_file) {
        Serial.println("Failed to open file for writing");
        return;
    }
    
    while ((millis() - start_time) <= 2500) {
        i2s_read(I2S_NUM_0, (void*)audio_data, sizeof(audio_data), &byte_read, portMAX_DELAY);
        audio_file.write((uint8_t*)audio_data, byte_read);
    }
    audio_file.close();
    Serial.println("Audio recording COMPLETED.");
}

void http_request() {
    client.setInsecure();
    if (client.connect(server_url, server_port)) {
        Serial.println("Connected to Deepgram ");
        
        client.println("POST /v1/listen?language=en&model=nova-2 HTTP/1.1");
        client.println("Host: api.deepgram.com");
        client.print("Authorization: Token apikey ");
        client.println("Content-Type: audio/raw");
        client.println("Transfer-Encoding: chunked");
        client.println("Connection: close");
        client.println();

        audio_file = SD.open("/test.wav");
        if (!audio_file) {
            Serial.println("Failed to open the file");
            return;
        }

        while (audio_file.available()) {
            size_t bytes_read = audio_file.read(buffer, sizeof(buffer));
            client.write(buffer, bytes_read);
        }

        client.println("0\r\n\r\n"); // End of chunked encoding
        audio_file.close();
    } else {
        Serial.println("Failed to connect to Deepgram");
    }
}

void deepgram_transcription() {
    while (client.connected()) {
        response = client.readStringUntil('\n');
        if (response == "\r") {
            break;
        }
        Serial.println("Server response: ");
        Serial.print(response);
    }

    if (client.available()) {
        transcription = client.readString();
        Serial.println("TRANSCRIPTION: ");
        Serial.print(transcription);
    }
    client.stop();
}

Additional Context:

I have verified that the audio file is being recorded correctly.
I am not getting any response or error codes from Deepgram when sending the audio.
I would appreciate any guidance on how to troubleshoot this issue or any suggestions for what might be going wrong in the code.

Thank you for your help!

jremington · October 17, 2024, 3:13pm

Have you ever been able to send any file to Deepgram and receive a response?

jhgtflifbvf · October 17, 2024, 3:15pm

No not by deepgram but with wit.ai with esp8266 and also max4466 but i had a time limitation. so i moved to esp32 , inmp441 and deepgram for better result and its not working.

jremington · October 17, 2024, 3:16pm

Obviously, you need to verify that you can send a file to Deepgram.

Set aside the present code, and get the file transmission working using whatever examples Deepgram provides, and make sure that you understand the working example in complete detail.

jhgtflifbvf · October 17, 2024, 3:22pm

i have tried every thing and its a trusted site i wasnt getting the responce so if you can help me with the code or something.

jremington · October 17, 2024, 3:27pm

If you can't get the Deepgram examples working, ask Deepgram for help.

pert · October 17, 2024, 8:14pm

I have deleted your other cross-post @jhgtflifbvf.

Cross-posting is against the Arduino forum rules. The reason is that duplicate posts can waste the time of the people trying to help. Someone might spend a lot of time investigating and writing a detailed answer on one topic, without knowing that someone else already did the same in the other topic.

Repeated cross-posting can result in a suspension from the forum.

In the future, please only create one topic for each distinct subject matter. This is basic forum etiquette, as explained in the "How to get the best out of this forum" guide. It contains a lot of other useful information. Please read it.

Thanks in advance for your cooperation.

jhgtflifbvf · October 18, 2024, 9:30am

Thank you brother, it looks like now i can pull up a solution.

system · April 16, 2025, 9:31am

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
ESP voice recognition General Guidance	5	91	January 13, 2026
Speech to text in Arduino Audio	30	1953	August 5, 2025
Convert speech to text with esp32 i2s imp411 General Guidance	31	666	October 15, 2025
ESP_SR works ... sometimes Networking, Protocols, and Devices	1	113	October 13, 2025
I2S example needed General Guidance	4	325	January 8, 2026

Speech recognization with esp32

Project Overview:

Issues:

Complete Code:

Additional Context:

Related topics