Arduino Forum

Community => Exhibition / Gallery => Topic started by: msb4180 on Jan 06, 2016, 04:40 pm

Title: Speech Recognition and Synthesis with Arduino
Post by: msb4180 on Jan 06, 2016, 04:40 pm
In my previous post (https://forum.arduino.cc/index.php?topic=363530.0), I showed how to control a few LEDs using an Arduino board and BitVoicer Server (http://www.bitsophia.com/en-US/BitVoicerServer/Overview.aspx). In this post, I am going to make things a little more complicated. I am also going to synthesize speech using the Arduino DUE (https://www.arduino.cc/en/Main/ArduinoBoardDue) digital-to-analog converter (DAC) (https://en.wikipedia.org/wiki/Digital-to-analog_converter). If you do not have an Arduino DUE, you can use other Arduino boards, but you will need an external DAC and some additional code to operate the DAC (the BVSSpeaker (http://www.bitsophia.com/en-US/BitVoicerServer/v1/Documentation/Manual.aspx?section=723&page=723) library will not help you with that).

(http://ezimport.xpg.uol.com.br/BVS_Demo2/Img1.jpg)


In the video below, you can see that I also make the Arduino play a little song and blink the LEDs as if they were piano keys. Sorry about my piano skills, but that is the best I can do. The LEDs actually blink in the same sequence and timing as real C, D and E keys, so if you have a piano around you can follow the LEDs and play the same song. It is a jingle from an old retailer (Mappin) that does not even exist anymore.

(http://ezimport.xpg.uol.com.br/BVS_Demo2/YouTubeVideo.jpg) (https://www.youtube.com/watch?v=JAcWsjUPU5M)

Video: https://www.youtube.com/watch?v=JAcWsjUPU5M


The following procedures will be executed to transform voice commands into LED activity and synthesized speech:



List of Materials:

Title: Re: Speech Recognition and Synthesis with Arduino
Post by: msb4180 on Jan 06, 2016, 04:48 pm
STEP 1: Wiring

The first step is to wire the Arduino and the breadboard with the components as shown in the pictures below. I had to place a small piece of rubber underneath the speaker because it vibrates a lot, and without the rubber the audio quality suffers considerably.

(http://ezimport.xpg.uol.com.br/BVS_Demo2/FritzingProtoboard.jpg)


(http://ezimport.xpg.uol.com.br/BVS_Demo2/Img2.jpg)


(http://ezimport.xpg.uol.com.br/BVS_Demo2/Img3.jpg)


(http://ezimport.xpg.uol.com.br/BVS_Demo2/Img4.jpg)


Here we have a small but important difference from my previous post (https://forum.arduino.cc/index.php?topic=363530.0). Most Arduino boards run at 5V, but the DUE runs at 3.3V. Because I got better results running the Sparkfun Electret Breakout at 3.3V, I recommend you add a jumper between the 3.3V pin and the AREF pin if you are using a 5V Arduino board. The DUE already uses a 3.3V analog reference, so it does not need a jumper to the AREF pin. In fact, the AREF pin on the DUE is connected to the microcontroller through a resistor bridge, and resistor BR1 must be desoldered from the PCB before the pin can be used.
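
To see why the analog reference matters on a 5V board, here is a small worked calculation in plain C++ (it runs on a PC, not on the Arduino). It assumes a 10-bit ADC, as found on the Uno and other AVR-based boards, and a microphone signal that can swing up to its 3.3V supply. Both figures are assumptions for illustration, not measurements from my setup, and the function names are made up for this example.

```cpp
// Number of distinct ADC codes for a given resolution in bits.
constexpr int adcCodes(int bits) { return 1 << bits; }

// Highest ADC code a signal peaking at signalMaxVolts can reach
// when the converter is referenced to vrefVolts.
constexpr int maxCode(double signalMaxVolts, double vrefVolts, int bits)
{
  return static_cast<int>(signalMaxVolts / vrefVolts * (adcCodes(bits) - 1));
}

// Assumption: 10-bit ADC, microphone breakout powered at 3.3V.
// With the default 5V reference, only about two thirds of the
// ADC range is ever used by the microphone signal.
static_assert(maxCode(3.3, 5.0, 10) == 675, "5V reference");

// With 3.3V on AREF, the same signal spans the full range.
static_assert(maxCode(3.3, 3.3, 10) == 1023, "3.3V reference");
```

Keep in mind that on AVR boards the jumper alone is not enough: the Arduino core has to be told to use the AREF pin, which is normally done with analogReference(EXTERNAL). The sketch in this post passes DEFAULT to bvsm.setAudioInput, which suits the DUE; on a 5V board with the jumper fitted you would presumably pass EXTERNAL instead, but check the BVSMic documentation before relying on that.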
Title: Re: Speech Recognition and Synthesis with Arduino
Post by: msb4180 on Jan 06, 2016, 05:05 pm
STEP 2: Uploading the code to the Arduino

Now you have to upload the code below to your Arduino. You can also download the Arduino sketch from the link below the code. Before you upload the code, you must properly install the BitVoicer Server libraries into the Arduino IDE (Importing a .zip Library (https://www.arduino.cc/en/Guide/Libraries#toc4)).
Title: Re: Speech Recognition and Synthesis with Arduino
Post by: msb4180 on Jan 06, 2016, 05:14 pm
Code:

#include <BVSP.h>
#include <BVSMic.h>
#include <BVSSpeaker.h>
#include <DAC.h>

// Defines the Arduino pin that will be used to capture audio
#define BVSM_AUDIO_INPUT 7

// Defines the LED pins
#define RED_LED_PIN 6
#define YELLOW_LED_PIN 9
#define GREEN_LED_PIN 10

// Defines the constants that will be passed as parameters to
// the BVSP.begin function
const unsigned long STATUS_REQUEST_TIMEOUT = 3000;
const unsigned long STATUS_REQUEST_INTERVAL = 4000;

// Defines the size of the mic audio buffer
const int MIC_BUFFER_SIZE = 64;

// Defines the size of the speaker audio buffer
const int SPEAKER_BUFFER_SIZE = 128;

// Defines the size of the receive buffer
const int RECEIVE_BUFFER_SIZE = 2;

// Initializes a new global instance of the BVSP class
BVSP bvsp = BVSP();

// Initializes a new global instance of the BVSMic class
BVSMic bvsm = BVSMic();

// Initializes a new global instance of the BVSSpeaker class
BVSSpeaker bvss = BVSSpeaker();

// Creates a buffer that will be used to read recorded samples
// from the BVSMic class
byte micBuffer[MIC_BUFFER_SIZE];

// Creates a buffer that will be used to write audio samples
// into the BVSSpeaker class
byte speakerBuffer[SPEAKER_BUFFER_SIZE];

// Creates a buffer that will be used to read the commands sent
// from BitVoicer Server.
// Byte 0 = pin number
// Byte 1 = pin value
byte receiveBuffer[RECEIVE_BUFFER_SIZE];

// These variables are used to control when to play
// "LED Notes". These notes will be played along with
// the song streamed from BitVoicer Server.
bool playLEDNotes = false;
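// millis() returns unsigned long. A 16-bit unsigned int would
// truncate the timestamp on 8-bit AVR boards once millis()
// exceeds 65535.
unsigned long playStartTime = 0;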

void setup()
{
  // Sets up the pin modes
  pinMode(RED_LED_PIN, OUTPUT);
  pinMode(YELLOW_LED_PIN, OUTPUT);
  pinMode(GREEN_LED_PIN, OUTPUT);

  // Sets the initial state of all LEDs
  digitalWrite(RED_LED_PIN, LOW);
  digitalWrite(YELLOW_LED_PIN, LOW);
  digitalWrite(GREEN_LED_PIN, LOW);
 
  // Starts serial communication at 115200 bps
  Serial.begin(115200);
 
  // Sets the Arduino serial port that will be used for
  // communication, how long it will take before a status request
  // times out and how often status requests should be sent to
  // BitVoicer Server.
  bvsp.begin(Serial, STATUS_REQUEST_TIMEOUT, STATUS_REQUEST_INTERVAL);
   
  // Defines the function that will handle the frameReceived
  // event
  bvsp.frameReceived = BVSP_frameReceived;

  // Sets the function that will handle the modeChanged
  // event
  bvsp.modeChanged = BVSP_modeChanged;
 
  // Sets the function that will handle the streamReceived
  // event
  bvsp.streamReceived = BVSP_streamReceived;
 
  // Prepares the BVSMic class timer
  bvsm.begin();

  // Sets the DAC that will be used by the BVSSpeaker class
  bvss.begin(DAC);
}

void loop()
{
  // Checks if the status request interval has elapsed and if it
  // has, sends a status request to BitVoicer Server
  bvsp.keepAlive();
 
  // Checks if there is data available at the serial port buffer
  // and processes its content according to the specifications
  // of the BitVoicer Server Protocol
  bvsp.receive();

  // Checks if there is one SRE available. If there is one,
  // starts recording.
  if (bvsp.isSREAvailable())
  {
    // If the BVSMic class is not recording, sets up the audio
    // input and starts recording
    if (!bvsm.isRecording)
    {
      bvsm.setAudioInput(BVSM_AUDIO_INPUT, DEFAULT);
      bvsm.startRecording();
    }

    // Checks if the BVSMic class has available samples
    if (bvsm.available)
    {
      // Makes sure the inbound mode is STREAM_MODE before
      // transmitting the stream
      if (bvsp.inboundMode == FRAMED_MODE)
        bvsp.setInboundMode(STREAM_MODE);
       
      // Reads the audio samples from the BVSMic class
      int bytesRead = bvsm.read(micBuffer, MIC_BUFFER_SIZE);
     
      // Sends the audio stream to BitVoicer Server
      bvsp.sendStream(micBuffer, bytesRead);
    }
  }
  else
  {
    // No SRE is available. If the BVSMic class is recording,
    // stops it.
    if (bvsm.isRecording)
      bvsm.stopRecording();
  }

  // Plays all audio samples available in the BVSSpeaker class
  // internal buffer. These samples are written in the
  // BVSP_streamReceived event handler. If no samples are
  // available in the internal buffer, nothing is played.
  bvss.play();

  // If playLEDNotes has been set to true,
  // plays the "LED notes" along with the music.
  if (playLEDNotes)
    playNextLEDNote();
}

// Handles the frameReceived event
void BVSP_frameReceived(byte dataType, int payloadSize)
{
  // Checks if the received frame contains binary data
  // 0x07 = Binary data (byte array)
  if (dataType == DATA_TYPE_BINARY)
  {
    // If 2 bytes were received, process the command.
    if (bvsp.getReceivedBytes(receiveBuffer, RECEIVE_BUFFER_SIZE) ==
      RECEIVE_BUFFER_SIZE)
    {
      analogWrite(receiveBuffer[0], receiveBuffer[1]);
    }
  }
  // Checks if the received frame contains byte data type
  // 0x01 = Byte data type
  else if (dataType == DATA_TYPE_BYTE)
  {   
    // If the received byte value is 255, sets playLEDNotes
    // and marks the current time.
    if (bvsp.getReceivedByte() == 255)
    {
      playLEDNotes = true;
      playStartTime = millis();
    }
  }
}

// Handles the modeChanged event
void BVSP_modeChanged()
{
  // If the outboundMode (Server --> Device) has turned to
  // FRAMED_MODE, no audio stream is supposed to be received.
  // Tells the BVSSpeaker class to finish playing when its
  // internal buffer becomes empty.
  if (bvsp.outboundMode == FRAMED_MODE)
    bvss.finishPlaying();
}

// Handles the streamReceived event
void BVSP_streamReceived(int size)
{
  // Gets the received stream from the BVSP class
  int bytesRead = bvsp.getReceivedStream(speakerBuffer,
    SPEAKER_BUFFER_SIZE);
   
  // Enqueues the received stream to play
  bvss.enqueue(speakerBuffer, bytesRead);
}

// Lights up the appropriate LED based on the time
// the command to start playing LED notes was received.
// The timings used here are synchronized with the music.
void playNextLEDNote()
{
  // Gets the elapsed time between playStartTime and the
  // current time.
  unsigned long elapsed = millis() - playStartTime;

  // Turns off all LEDs
  allLEDsOff();

  // The last note has been played.
  // Turns off the last LED and stops playing LED notes.
  if (elapsed >= 11500)
  {
    analogWrite(RED_LED_PIN, 0);
    playLEDNotes = false;
  }
  else if (elapsed >= 9900)
    analogWrite(RED_LED_PIN, 255); // C note
  else if (elapsed >= 9370)
    analogWrite(RED_LED_PIN, 255); // C note
  else if (elapsed >= 8900)
    analogWrite(YELLOW_LED_PIN, 255); // D note
  else if (elapsed >= 8610)
    analogWrite(RED_LED_PIN, 255); // C note
  else if (elapsed >= 8230)
    analogWrite(YELLOW_LED_PIN, 255); // D note
  else if (elapsed >= 7970)
    analogWrite(YELLOW_LED_PIN, 255); // D note
  else if (elapsed >= 7470)
    analogWrite(RED_LED_PIN, 255); // C note
  else if (elapsed >= 6760)
    analogWrite(GREEN_LED_PIN, 255); // E note
  else if (elapsed >= 6350)
    analogWrite(RED_LED_PIN, 255); // C note
  else if (elapsed >= 5880)
    analogWrite(YELLOW_LED_PIN, 255); // D note
  else if (elapsed >= 5560)
    analogWrite(RED_LED_PIN, 255); // C note
  else if (elapsed >= 5180)
    analogWrite(YELLOW_LED_PIN, 255); // D note
  else if (elapsed >= 4890)
    analogWrite(YELLOW_LED_PIN, 255); // D note
  else if (elapsed >= 4420)
    analogWrite(RED_LED_PIN, 255); // C note
  else if (elapsed >= 3810)
    analogWrite(GREEN_LED_PIN, 255); // E note
  else if (elapsed >= 3420)
    analogWrite(RED_LED_PIN, 255); // C note
  else if (elapsed >= 2930)
    analogWrite(YELLOW_LED_PIN, 255); // D note
  else if (elapsed >= 2560)
    analogWrite(RED_LED_PIN, 255); // C note
  else if (elapsed >= 2200)
    analogWrite(YELLOW_LED_PIN, 255); // D note
  else if (elapsed >= 1930)
    analogWrite(YELLOW_LED_PIN, 255); // D note
  else if (elapsed >= 1470)
    analogWrite(RED_LED_PIN, 255); // C note
  else if (elapsed >= 1000)
    analogWrite(GREEN_LED_PIN, 255); // E note
}

// Turns off all LEDs.
void allLEDsOff()
{
  analogWrite(RED_LED_PIN, 0);
  analogWrite(YELLOW_LED_PIN, 0);
  analogWrite(GREEN_LED_PIN, 0);
}


BVS_Demo2.ino (http://ezimport.xpg.uol.com.br/BVS_Demo2/BVS_Demo2.ino)
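
The long if/else chain in playNextLEDNote can also be written as a lookup table, which makes the timings easier to edit if you want to try another song. The following is only a sketch of that alternative, not the code I ran in the video. The pin numbers and timings are copied from the sketch above; LedNote, NOTES, NO_LED, SONG_END_MS and ledForElapsed are hypothetical names introduced for this example. It is plain C++ with no Arduino calls, so the lookup can be checked on a PC.

```cpp
#include <cstddef>

// Pin numbers copied from the sketch above.
const int RED_LED_PIN = 6;     // C note
const int YELLOW_LED_PIN = 9;  // D note
const int GREEN_LED_PIN = 10;  // E note

const int NO_LED = -1;                    // hypothetical "nothing lit" marker
const unsigned long SONG_END_MS = 11500;  // time of the final "all off" branch

struct LedNote
{
  unsigned long startMs;  // elapsed time at which the note begins
  int pin;                // LED lit from startMs until the next note
};

// Same timings as the if/else chain, listed in ascending order.
const LedNote NOTES[] = {
  {1000, GREEN_LED_PIN},  {1470, RED_LED_PIN},    {1930, YELLOW_LED_PIN},
  {2200, YELLOW_LED_PIN}, {2560, RED_LED_PIN},    {2930, YELLOW_LED_PIN},
  {3420, RED_LED_PIN},    {3810, GREEN_LED_PIN},  {4420, RED_LED_PIN},
  {4890, YELLOW_LED_PIN}, {5180, YELLOW_LED_PIN}, {5560, RED_LED_PIN},
  {5880, YELLOW_LED_PIN}, {6350, RED_LED_PIN},    {6760, GREEN_LED_PIN},
  {7470, RED_LED_PIN},    {7970, YELLOW_LED_PIN}, {8230, YELLOW_LED_PIN},
  {8610, RED_LED_PIN},    {8900, YELLOW_LED_PIN}, {9370, RED_LED_PIN},
  {9900, RED_LED_PIN}
};
const std::size_t NOTE_COUNT = sizeof(NOTES) / sizeof(NOTES[0]);

// Returns the pin of the LED that should be lit at elapsedMs,
// or NO_LED before the first note and after the song has ended.
int ledForElapsed(unsigned long elapsedMs)
{
  if (elapsedMs >= SONG_END_MS)
    return NO_LED;

  int pin = NO_LED;
  for (std::size_t i = 0; i < NOTE_COUNT; i++)
  {
    if (elapsedMs < NOTES[i].startMs)
      break;              // later notes have not started yet
    pin = NOTES[i].pin;   // latest note that has already started
  }
  return pin;
}
```

Inside the sketch, playNextLEDNote would still call allLEDsOff() first, then call analogWrite(pin, 255) when ledForElapsed returns something other than NO_LED, and set playLEDNotes to false once the elapsed time reaches SONG_END_MS.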
Title: Re: Speech Recognition and Synthesis with Arduino
Post by: msb4180 on Jan 06, 2016, 05:19 pm
The sketch above has seven major parts:

Title: Re: Speech Recognition and Synthesis with Arduino
Post by: msb4180 on Jan 06, 2016, 05:27 pm
STEP 3: Importing BitVoicer Server Solution Objects

Now you have to set up BitVoicer Server to work with the Arduino. BitVoicer Server has four major solution objects: Locations, Devices, BinaryData and Voice Schemas.

Locations represent the physical location where a device is installed. In my case, I created a location called Home.

Devices are the BitVoicer Server clients. I created a Mixed device, named it ArduinoDUE and entered the communication settings. IMPORTANT: even the Arduino DUE does not have enough memory to store all the audio samples BitVoicer Server will stream. If you do not limit the bandwidth, you will need a much bigger buffer to hold the audio. I got some buffer overflows for this reason, so I had to limit the Data Rate in the communication settings (http://www.bitsophia.com/en-US/BitVoicerServer/v1/Documentation/Manual.aspx?section=4122&page=4122) to 8000 samples per second.
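
A rough calculation shows why the data rate has to be capped. The sketch below assumes 8-bit samples played back at 8000 samples per second, the default 8N1 serial framing (10 bits on the wire per byte) at the 115200 bps used in the Arduino code, and no protocol overhead. The function names are hypothetical, and the 2048-byte buffer used in the last check is a made-up figure for illustration, not the real size of the BVSSpeaker internal buffer.

```cpp
// Bytes per second a serial link can carry with 8N1 framing
// (1 start bit + 8 data bits + 1 stop bit = 10 bits per byte).
constexpr long serialBytesPerSecond(long baud) { return baud / 10; }

// Net rate at which a playback buffer fills when data arrives
// faster than it is played (0 if playback keeps up).
constexpr long fillRate(long inBytesPerSec, long outBytesPerSec)
{
  return inBytesPerSec > outBytesPerSec ? inBytesPerSec - outBytesPerSec : 0;
}

// Milliseconds until a buffer of bufferBytes overflows at the given
// net fill rate, or -1 if it never fills.
constexpr long msUntilOverflow(long bufferBytes, long netFillRate)
{
  return netFillRate > 0 ? bufferBytes * 1000 / netFillRate : -1;
}

// 115200 bps can deliver up to 11520 bytes per second...
static_assert(serialBytesPerSecond(115200) == 11520, "link capacity");

// ...while playback consumes only 8000, so an unthrottled stream
// could pile up as much as 3520 bytes every second.
static_assert(fillRate(11520, 8000) == 3520, "net fill rate");

// Assumption: a hypothetical 2048-byte buffer would then overflow
// in a little over half a second.
static_assert(msUntilOverflow(2048, 3520) == 581, "time to overflow");

// With the Data Rate capped at 8000, input matches playback.
static_assert(msUntilOverflow(2048, fillRate(8000, 8000)) == -1, "no overflow");
```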

BinaryData is a type of command BitVoicer Server can send to client devices. BinaryData objects are actually byte arrays you can link to commands. When BitVoicer Server recognizes speech related to a command, it sends the byte array to the target device. I created one BinaryData object for each pin value and named them ArduinoDUEGreenLedOn, ArduinoDUEGreenLedOff and so on. I ended up with 18 BinaryData objects in my solution, so I suggest you download and import the objects from the VoiceSchema.sof file below.
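
To make the byte layout concrete, here is a minimal sketch of how such a 2-byte command can be packed and unpacked. It follows the layout documented in the Arduino code (byte 0 = pin number, byte 1 = pin value). PinCommand, encodePinCommand and decodePinCommand are hypothetical names for this example and are not part of the BitVoicer Server libraries. The actual byte values stored in my BinaryData objects are not listed in this post, so the pin 10 / value 255 pair mentioned afterwards is only an assumed example of a "green LED on" command.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: one LED command as carried by a
// BinaryData object.
struct PinCommand
{
  uint8_t pin;    // byte 0: Arduino pin number
  uint8_t value;  // byte 1: value passed to analogWrite (0 to 255)
};

const std::size_t PIN_COMMAND_SIZE = 2;  // same size as RECEIVE_BUFFER_SIZE

// Packs a command into the 2-byte layout described in the sketch.
void encodePinCommand(const PinCommand &cmd, uint8_t out[PIN_COMMAND_SIZE])
{
  out[0] = cmd.pin;
  out[1] = cmd.value;
}

// Unpacks a received payload. Returns false when the payload is not
// exactly 2 bytes long, similar to the size check the sketch performs
// in BVSP_frameReceived before calling analogWrite.
bool decodePinCommand(const uint8_t *data, std::size_t size, PinCommand &cmd)
{
  if (data == nullptr || size != PIN_COMMAND_SIZE)
    return false;
  cmd.pin = data[0];
  cmd.value = data[1];
  return true;
}
```

Under these assumptions, a "green LED on" object would hold the two bytes 10 and 255, and the Arduino would end up calling analogWrite(10, 255).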

Voice Schemas are where everything comes together. They define what sentences should be recognized and what commands to run. For each sentence, you can define as many commands as you need and the order they will be executed. You can also define delays between commands. That is how I managed to perform the sequence of actions you see in the video.

One of the sentences in my Voice Schema is "play a little song." This sentence contains two commands. The first command sends a byte that indicates the following command is going to be an audio stream. The Arduino then starts "playing" the LEDs while the audio is being transmitted. The audio is a little piano jingle I recorded myself and set as the audio source of the second command. BitVoicer Server supports only 8-bit mono PCM audio (8000 samples per second), so if you need to convert an audio file to this format, I recommend the following online conversion tool: http://audio.online-convert.com/convert-to-wav.
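
The audio format also explains why the jingle is streamed instead of being stored on the board. The short calculation below estimates the size of an uncompressed clip. It assumes the clip lasts about 11.5 seconds, which is simply the time of the last LED note in the Arduino code and not a measured length of my recording; pcmBytes is a hypothetical helper name.

```cpp
// Size in bytes of an uncompressed PCM clip.
constexpr unsigned long pcmBytes(unsigned long durationMs,
                                 unsigned long samplesPerSecond,
                                 unsigned long bytesPerSample,
                                 unsigned long channels)
{
  return durationMs * samplesPerSecond / 1000 * bytesPerSample * channels;
}

// One second of 8-bit mono audio at 8000 samples per second.
static_assert(pcmBytes(1000, 8000, 1, 1) == 8000, "one second");

// Assumption: an 11.5 second clip in the same format.
static_assert(pcmBytes(11500, 8000, 1, 1) == 92000, "whole jingle");
```

At roughly 92000 bytes, the whole clip would take up almost all of the 96 KB (98304 bytes) of SRAM on the DUE, leaving very little for the sketch itself, so the audio has to be played as it arrives.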

You can import (Importing Solution Objects (http://www.bitsophia.com/en-US/BitVoicerServer/v1/Documentation/Manual.aspx?section=47&page=47)) all solution objects I used in this post from the files below. One contains the DUE Device and the other contains the Voice Schema and its Commands.


Solution Object Files:

VoiceSchema.sof (http://ezimport.xpg.uol.com.br/BVS_Demo2/VoiceSchema.sof)
Device.sof (http://ezimport.xpg.uol.com.br/BVS_Demo2/Device.sof)
Title: Re: Speech Recognition and Synthesis with Arduino
Post by: msb4180 on Jan 06, 2016, 05:38 pm
STEP 4: Conclusion

There you go! You can turn everything on and do the same things shown in the video.

(http://ezimport.xpg.uol.com.br/BVS_Demo2/YouTubeVideo.jpg) (https://www.youtube.com/watch?v=JAcWsjUPU5M)

Video: https://www.youtube.com/watch?v=JAcWsjUPU5M


As I did in my previous post, I started the speech recognition by enabling the Arduino device in the BitVoicer Server Manager (http://www.bitsophia.com/en-US/BitVoicerServer/v1/Documentation/Manual.aspx?section=4&page=4). As soon as the device is enabled, the Arduino identifies an available Speech Recognition Engine and starts streaming audio to BitVoicer Server. This time, however, you will see a lot more activity on the Arduino RX LED while audio is being streamed from BitVoicer Server to the Arduino.

In my next post, I will be a little more ambitious. I am going to add WiFi communication to one Arduino and control two other Arduinos all together by voice. I am thinking of some kind of game between them. Suggestions are very welcome!
Title: Re: Speech Recognition and Synthesis with Arduino
Post by: Fantasan on Oct 11, 2018, 02:01 pm
Good results, I like it.