How can I use a DFPlayer Mini to play speech and control a servo to move an animatronic mouth based on audio amplitude?

I want to use a DFPlayer Mini to play back speech, but I would also like to sample the audio with one of the Arduino's ADC channels to crudely read the amplitude, and then drive a servo to open and close an animatronic mouth accordingly.
I am using the Elechouse V3 voice recognition module with an Arduino Uno.

Do you need voice recognition? If you only want to capture the amplitude of the signal, a small microphone next to the speaker, for example, would do that.

If you drive the mouth based on amplitude then it won't look right. The mouth won't match with the words at all. How your mouth is shaped is a function of what syllable you are saying, not how loud you are talking.

Is this the same project as "Realistic Animatronic Robot Head That Can Talk"?

Yes, I am making this mechanism for my project. But I think the voice interaction part can be solved in this section.

So here is the 3D-printed head I am using.
Syntex Edition Inmoov Head

It has a very simple jaw mechanism, and I only want to open and close it according to the sound output: the louder the sound, the more the jaw opens, and the quieter the sound, the less it opens. In other words, it follows the amplitude.
As for your suggestion to match the movement with the words, I don't have any knowledge about that, so your guidance would help me.
I need either of the above approaches working.
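
For reference, here is a minimal sketch of the "louder means wider" idea, written as plain C++ so the logic can be tried off the board first. It assumes a 10-bit ADC (0..1023, as on the Uno), an audio signal biased to mid-scale (about 512), and a jaw travel of 0 to 40 degrees. `JawFollower`, `kAdcMid`, the jaw limits and the release rate are all illustrative assumptions, not something from this thread or from any library.

```cpp
#include <assert.h>
#include <stdlib.h>

// Assumptions: 10-bit ADC centred on 512 by a bias network, and a jaw servo
// whose useful travel is 0..40 degrees. Adjust both for the real hardware.
constexpr int kAdcMid = 512;
constexpr int kJawClosedDeg = 0;
constexpr int kJawOpenDeg = 40;

// Fast-attack, slow-release envelope follower for rectified audio samples.
struct JawFollower {
    int envelope = 0;  // current loudness estimate, 0..511

    // Feed one raw ADC sample (0..1023); returns the jaw angle in degrees.
    int update(int adcSample) {
        assert(adcSample >= 0 && adcSample <= 1023);
        int level = abs(adcSample - kAdcMid);  // rectify around the bias point
        if (level > 511) level = 511;
        if (level > envelope) {
            envelope = level;                        // attack: follow peaks at once
        } else {
            envelope -= (envelope - level) / 8 + 1;  // release: decay gradually
            if (envelope < level) envelope = level;
        }
        // Linear map 0..511 -> closed..open.
        return kJawClosedDeg + (envelope * (kJawOpenDeg - kJawClosedDeg)) / 511;
    }
};
```

On the Uno the idea would be to call something like `follower.update(analogRead(A0))` in `loop()` and pass the result to `Servo::write()`. The slow release is there so the jaw does not chatter at audio rate. As noted above, this follows loudness only, not the shape of the words.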

I want a circuit/mechanism such that if someone speaks its name, like "hey robot" or "hello robot", the robot wakes up and interacts with that person. During this interaction, the robot's jaw mechanism (mouth) must open and close according to the response or output.
I am using the Elechouse V3 voice recognition module and a DFPlayer Mini for audio output. But how can I sync the mouth movements with the audio playback?

Since you use canned audio playback (files on the SD card of the DFPlayer Mini), you could store, for each audio file, a representation of the animation to run while that sound plays. Where to store it remains to be decided.
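
One possible shape for such a representation is a small keyframe table per track, kept in the sketch itself. This is only a sketch under assumptions: `JawKey`, `kTrack1` and `jawAngleAt` are hypothetical names, and the timings and angles are made up for illustration.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// One jaw keyframe: time since the start of the track, and target angle.
struct JawKey {
    uint32_t timeMs;
    uint8_t angleDeg;
};

// Hypothetical animation for track 1 (values invented for illustration).
// Times must be strictly increasing.
constexpr JawKey kTrack1[] = {
    {0, 0}, {120, 30}, {300, 5}, {450, 35}, {700, 0},
};

// Returns the jaw angle at elapsedMs, interpolating linearly between keys.
template <size_t N>
uint8_t jawAngleAt(const JawKey (&keys)[N], uint32_t elapsedMs) {
    if (elapsedMs <= keys[0].timeMs) return keys[0].angleDeg;
    for (size_t i = 1; i < N; ++i) {
        if (elapsedMs < keys[i].timeMs) {
            const JawKey& a = keys[i - 1];
            const JawKey& b = keys[i];
            assert(b.timeMs > a.timeMs);  // guards the division below
            int32_t span = static_cast<int32_t>(b.timeMs - a.timeMs);
            int32_t into = static_cast<int32_t>(elapsedMs - a.timeMs);
            int32_t delta = static_cast<int32_t>(b.angleDeg) - a.angleDeg;
            return static_cast<uint8_t>(a.angleDeg + delta * into / span);
        }
    }
    return keys[N - 1].angleDeg;  // past the last key: hold the final pose
}
```

The elapsed time could come from `millis()` captured when the play command is sent. Any start-up delay in the player would shift the animation, so expect to tune an offset by hand.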

I'd discussed an idea in the past in another thread, which was basically to use a stereo MP3 player (not the DFPlayer Mini).

In a stereo MP3 audio file, the sound is encoded as two separate channels (left and right). You could merge the two channels of your speech into one to get mono audio, which leaves the other channel completely free to encode commands.
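
In practice this is done in an audio editor before exporting the MP3, but as a rough sketch of what the operation amounts to on raw 16-bit samples (the function names are hypothetical, and putting speech on the left and commands on the right is just an assumption):

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Averages left and right samples into one mono sample without overflow.
int16_t mixToMono(int16_t left, int16_t right) {
    return static_cast<int16_t>((static_cast<int32_t>(left) + right) / 2);
}

// Rewrites an interleaved stereo buffer (L, R, L, R, ...) in place so that the
// left channel carries the mono mix and the right channel carries the commands.
void packSpeechAndCommands(int16_t* stereo, const int16_t* commands, size_t frames) {
    assert(stereo != nullptr && commands != nullptr);
    for (size_t i = 0; i < frames; ++i) {
        stereo[2 * i] = mixToMono(stereo[2 * i], stereo[2 * i + 1]);
        stereo[2 * i + 1] = commands[i];
    }
}
```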

The analog encoding of the commands is done before generating the MP3, and all techniques are possible to make it detectable, such as amplitude modulation or frequency modulation, but they are complicated to generate and decode.

A simple idea to implement would be to store DTMF audio (what touch-tone phones use) on this audio track and use an MT8870 connected to the Arduino to decode the commands.

To generate the sound of these DTMF commands, you can use online tools: there are websites that will take a command in the form of a sequence of DTMF codes and create an 8 kHz .wav audio file (output level -1 dBFS, 100 ms per character, and 1 second for a space).
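
If you would rather see what such a tool produces, each DTMF symbol is simply the sum of one "row" tone and one "column" tone from the standard keypad grid. A minimal sketch, assuming an 8 kHz sample rate and an arbitrary 0.5 scaling per tone (`DtmfPair`, `dtmfLookup` and `dtmfSample` are hypothetical names):

```cpp
#include <assert.h>
#include <math.h>
#include <string.h>

struct DtmfPair {
    int lowHz;   // row tone
    int highHz;  // column tone
};

// Looks up the two tones for a keypad symbol (0-9, *, #, A-D).
// Returns false for anything that is not a DTMF symbol.
bool dtmfLookup(char symbol, DtmfPair* out) {
    assert(out != nullptr);
    static const char* const kRows[4] = {"123A", "456B", "789C", "*0#D"};
    static const int kLowHz[4] = {697, 770, 852, 941};
    static const int kHighHz[4] = {1209, 1336, 1477, 1633};
    if (symbol == '\0') return false;  // strchr would match the terminator
    for (int r = 0; r < 4; ++r) {
        const char* hit = strchr(kRows[r], symbol);
        if (hit != nullptr) {
            out->lowHz = kLowHz[r];
            out->highHz = kHighHz[hit - kRows[r]];
            return true;
        }
    }
    return false;
}

// Sample number n of the two-tone signal, in the range -1.0..1.0.
double dtmfSample(const DtmfPair& tones, unsigned long n, double sampleRateHz = 8000.0) {
    const double kTwoPi = 6.283185307179586;
    const double t = n / sampleRateHz;
    return 0.5 * (sin(kTwoPi * tones.lowHz * t) + sin(kTwoPi * tones.highHz * t));
}
```

At 8 kHz, 100 ms per character means 800 samples per symbol.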

Once all the audio files are generated, you import them by placing them in the correct spots on the empty track of your MP3, and you have your MP3 audio file carrying the commands.

You will then need an audio device that outputs the right and left tracks separately, one going to the Arduino for DTMF decoding (the one with the commands), and the other going to a speaker.

For decoding DTMF commands, the audio stream would arrive at a magic component that works very well: the MT8870, which can be found as a ready-to-use PCB with a jack connector.

The DTMF commands could represent orders to various actuators, you would just need to define a command language and then encode the orders on the audio track ➜ You would have some prep work for the audio files, but then everything is self contained and manually fine tuned.
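
To make the "command language" part concrete, here is a minimal sketch. The 4-bit code table is the MT8870 output coding as I understand it from the datasheet (worth checking against your module). The command format itself, `A` followed by up to three digits and terminated by `#` to set the jaw angle, is purely a made-up example, as are the names `mt8870Symbol` and `JawCommandParser`.

```cpp
#include <assert.h>
#include <stdint.h>

// Maps the MT8870's 4-bit output (Q4..Q1, Q1 = bit 0) to the keypad symbol.
char mt8870Symbol(uint8_t code) {
    static const char kTable[16] = {'D', '1', '2', '3', '4', '5', '6', '7',
                                    '8', '9', '0', '*', '#', 'A', 'B', 'C'};
    return kTable[code & 0x0F];
}

// Hypothetical command language: 'A', then 1-3 digits, then '#'
// means "move the jaw to that angle in degrees".
struct JawCommandParser {
    bool active = false;
    int value = 0;
    int digits = 0;

    // Feed one decoded symbol. Returns true and fills *angleOut when a
    // complete command has been received.
    bool feed(char symbol, int* angleOut) {
        assert(angleOut != nullptr);
        if (symbol == 'A') {  // start (or restart) a command
            active = true;
            value = 0;
            digits = 0;
            return false;
        }
        if (!active) return false;
        if (symbol >= '0' && symbol <= '9') {
            if (digits < 3) {
                value = value * 10 + (symbol - '0');
                ++digits;
            } else {
                active = false;  // too many digits: drop the command
            }
            return false;
        }
        if (symbol == '#' && digits > 0) {
            active = false;
            *angleOut = value > 180 ? 180 : value;  // clamp to servo range
            return true;
        }
        active = false;  // anything else aborts the command
        return false;
    }
};
```

On the board, a new symbol would be read from the four data pins each time the decoder's StD (delayed steering) output goes high. At roughly 100 ms per symbol, a command like `A35#` takes about 0.4 s to arrive, so commands would need to be placed slightly ahead of the sound they belong to.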

@royalsaad370 see my post in your other discussion here:

Thank you for this detailed info

Ok I will

Is your control strategy for the mouth going to be that the louder the sound, the further open the mouth should be? And if not, what is your strategy?

Because that isn’t how human speech works. You close your mouth to make sounds like “p” and “b” and you open it for sounds like “ah”. Loudness doesn’t have that much to do with it. How “realistic” does the mouth have to be?

I don't think @royalsaad370 knows. Despite several attempts to discover what the actual project brief was from their instructor, all I can gather is that the project needs to be completed in 2 months.

I put forward the rather basic (and with little knowledge of how speech is formed) suggestion that they may be able to connect the output from the DFPlayer to one of the ADC channels and potentially drive a servo based on the amplitude of the sound.

The OP has chosen an existing design for their animatronic head, which I think incorporates a single servo to control the mouth opening.

Beyond that, very few details have been forthcoming.

If you're playing an MP3 then why not encode the servo movements in sub-audio? That's what I would do. Then you could pick the MP3 up at any point and the servo would know where to go.

This might have been what you meant, but you could wire up someone’s jaw to a linear measuring device as they spoke the audio, and record the movement as sub-audio with which to control the servo on playback.

Even better.

I predict that a random number generator gets used. 😃

Big Mouth Billy Bass springs to mind...

Here are the instructions from our instructor:
Make a robot head that can mimic basic facial expressions (eye movements, mouth movements, etc.).
It must also locate the direction of a sound and rotate its head in that direction, but only when someone wants to talk with it, for example when someone uses a wake-up command.
The robot should be able to interact with people via voice recognition (only some basic commands and responses are enough).
The robot listens to the voice command and generates a response (audio output through a speaker) while also moving its mouth (open/close). The output response can be either prerecorded or AI-based. But I think a prerecorded response is easier, if I am not wrong.

So here are the requirements

Ok so that's not so bad. Don't overcomplicate it more than it needs to be.

Mouth can just move randomly whenever one of the responses is being played. No-one says it has to realistically mimic human speech patterns, which would be very tricky.
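
A minimal sketch of that random approach, assuming a jaw travel of 0 to 40 degrees and using a tiny built-in generator so it has no library dependencies (`TinyRng` and `nextJawTarget` are hypothetical names):

```cpp
#include <assert.h>
#include <stdint.h>

// Small linear congruential generator, enough for jaw wiggle.
struct TinyRng {
    uint32_t state;
    uint32_t next() {
        state = state * 1664525UL + 1013904223UL;
        return state >> 16;  // the upper bits are the better-mixed ones
    }
};

// Picks the next jaw target: a random opening while audio is playing,
// fully closed otherwise. The 0..40 degree defaults are an assumption.
int nextJawTarget(bool audioPlaying, TinyRng& rng, int closedDeg = 0, int openDeg = 40) {
    assert(openDeg >= closedDeg);
    if (!audioPlaying) return closedDeg;
    uint32_t span = static_cast<uint32_t>(openDeg - closedDeg);
    return closedDeg + static_cast<int>(rng.next() % (span + 1));
}
```

With a DFPlayer Mini, `audioPlaying` could be derived from the module's BUSY pin, which is low while a track plays, so something like `digitalRead(busyPin) == LOW`, where `busyPin` is whichever input you wire it to. Picking a new target every 100 to 150 ms or so tends to look more like talking than updating on every pass through `loop()`.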

You will have to identify the direction of the person speaking. You could then simply:

  1. move eyes first in the direction of the speaker
  2. rotate head in the direction of speaker
  3. once head has reached end of travel, straighten eyes again

That would meet the requirements.
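
The three steps above can be sketched as a small state machine that is ticked at a fixed rate. The eye limit (20 degrees), the step size (5 degrees per tick) and the names `LookController` and `stepToward` are all assumptions for illustration. Working out the bearing of the speaker is a separate problem that is not covered here.

```cpp
#include <assert.h>
#include <stdint.h>

// Moves current toward goal by at most step, without overshooting.
int stepToward(int current, int goal, int step) {
    assert(step > 0);
    if (current < goal) return (goal - current < step) ? goal : current + step;
    if (current > goal) return (current - goal < step) ? goal : current - step;
    return current;
}

enum class LookState : uint8_t { Idle, EyesLeading, HeadTurning, EyesCentering };

struct LookController {
    LookState state = LookState::Idle;
    int eyeDeg = 0;     // eye offset relative to the head, 0 = straight ahead
    int headDeg = 0;    // head pan angle
    int targetDeg = 0;  // bearing of the speaker

    void lookAt(int bearingDeg) {
        targetDeg = bearingDeg;
        state = LookState::EyesLeading;
    }

    // Advances the sequence by one step.
    void tick() {
        const int kEyeLimit = 20;  // assumed mechanical limit of the eyes
        const int kStep = 5;       // assumed degrees moved per tick
        switch (state) {
            case LookState::EyesLeading: {  // 1. eyes glance toward the speaker
                int goal = targetDeg - headDeg;
                if (goal > kEyeLimit) goal = kEyeLimit;
                if (goal < -kEyeLimit) goal = -kEyeLimit;
                eyeDeg = stepToward(eyeDeg, goal, kStep);
                if (eyeDeg == goal) state = LookState::HeadTurning;
                break;
            }
            case LookState::HeadTurning:  // 2. head follows
                headDeg = stepToward(headDeg, targetDeg, kStep);
                if (headDeg == targetDeg) state = LookState::EyesCentering;
                break;
            case LookState::EyesCentering:  // 3. eyes straighten again
                eyeDeg = stepToward(eyeDeg, 0, kStep);
                if (eyeDeg == 0) state = LookState::Idle;
                break;
            case LookState::Idle:
                break;
        }
    }
};
```

After each `tick()`, `eyeDeg` and `headDeg` would be written to the respective servos.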

It sounds like the speech recognition will be the most sophisticated part. The pre-recorded responses would not be hard and could be done in one of the Arduinos that can play audio from an SD card, as I suggested in your other thread. No need for external MP3 players or suchlike.