I am building an animatronic head that appears to talk, using solenoid-driven pneumatic valves similar to those used by the characters in the Chuck E. Cheese's restaurant show. My research says that in the early days, before computers, they used high-frequency tones embedded in the soundtracks. These tones would trigger the valves that moved the joints. Could a similar system be made using a Mega 2560 and an MP3 file?
I would like to play the audio track from an SD card through a DFPlayer into a speaker, and have the Mega detect various frequencies to control multiple valves.
You could, I guess, but that seems like a cumbersome way to control it. The reason they used those methods before is that they didn't have anything better at a reasonable cost. This is 2021, things are much different. You can have an MP3 and a corresponding position file on the same SD card and that would be much simpler to implement.
Also, MP3 is lossy compression; it discards frequency content to reduce file size, IIRC, so embedded control tones might not survive encoding intact.
Links? No, I can say how I'd approach the problem, but I don't know of anyone who's done it before.
It would basically be a matter of timing when I wanted a certain action to happen while the song was playing, and building a file with those timestamps and actions. Then the code would read the file into memory and start the action list and MP3 playback at the same time.
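A minimal sketch of that timestamp-table idea, assuming a simple {time, valve mask} record format (the struct and function names here are hypothetical, and on the Mega you would call this each pass through loop() with the elapsed millis() since playback started):

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// One scheduled action: at time_ms into the song, set the valve
// outputs to valve_mask (one bit per valve).
struct Action {
    uint32_t time_ms;
    uint8_t  valve_mask;
};

// Given the table (sorted by time) and the elapsed playback time,
// return the mask of the most recent action that is due, or 0 if
// no action has fired yet.
uint8_t currentMask(const Action* table, size_t n, uint32_t elapsed_ms) {
    uint8_t mask = 0;
    for (size_t i = 0; i < n && table[i].time_ms <= elapsed_ms; ++i) {
        mask = table[i].valve_mask;
    }
    return mask;
}
```

The main loop would write the returned mask straight to the output port driving the valve transistors, so the table file is the only thing that changes from story to story.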
If you only needed mono voice, the other channel of the stereo track would be available for control.
Using DTMF you would have 16 different commands you could encode on the audio track (the full 4×4 tone grid, including the A-D column), and you could use an Arduino to decode them and enable digital outputs or whatever.
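One common way to detect DTMF tones in software is the Goertzel algorithm: run it once per candidate frequency (rows 697/770/852/941 Hz, columns 1209/1336/1477/1633 Hz) and pick the strongest row plus the strongest column. This is just a sketch of the core magnitude computation, not tuned for the Mega's ADC or sample rate:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Goertzel algorithm: relative magnitude of one target frequency in a
// block of samples. Stronger presence of the tone gives a larger result.
double goertzelMag(const double* samples, size_t n,
                   double target_hz, double sample_rate_hz) {
    const double pi = 3.14159265358979323846;
    double k = 2.0 * cos(2.0 * pi * target_hz / sample_rate_hz);
    double s_prev = 0.0, s_prev2 = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double s = samples[i] + k * s_prev - s_prev2;
        s_prev2 = s_prev;
        s_prev  = s;
    }
    return sqrt(s_prev * s_prev + s_prev2 * s_prev2 - k * s_prev * s_prev2);
}
```

A block of around 200 samples at 8 kHz is a typical telephony choice; whether the Mega can keep 8 of these running alongside everything else is exactly the open question here.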
I can see how the early system was attractive for synchronizing actions to voice.
Back when I was in industrial robotics tech school, we did a Halloween display that used a cassette player mechanically coupled to a rotating drum with pegs and micro switches. As the tape played, the drum turned, the pegs tripped the micro switches, and the solenoids made the display move.
There is an interesting article here: Animating Dialog
that demonstrates that animators can use 8 different mouth shapes to simulate speaking. If you assigned a DTMF tone pair to each mouth shape, you could record that tone pair on audio track 2, synchronized with the speech on track 1. The length of the tone pair would enable, via a digital output, the solenoid for the appropriate mouth shape for the amount of time necessary to mimic speaking.
Since that would be an octave, you could use a piano type keyboard to play the face shapes along with the speech track.
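Assuming the 8 mouth shapes from the article are assigned to DTMF digits 1-8 (the assignment is hypothetical), the decoder-side mapping is a one-liner: one bit per solenoid, held for as long as the tone pair is present.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mapping: DTMF digits '1'-'8' select one of 8 mouth
// shapes. Returns a mask with exactly one solenoid output bit set,
// or 0 for an unrecognized digit (no shape active).
uint8_t mouthShapeMask(char dtmf_digit) {
    if (dtmf_digit < '1' || dtmf_digit > '8') return 0;
    return uint8_t(1u << (dtmf_digit - '1'));
}
```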
How many motions would you want in e.g., a 4 minute song?
If you had one potential motion per second, that would be 240 entries in a table. I don't know how many valves you need to control, but at one byte per entry, you could control 8 valves with 240 motions in a single song.
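A sketch of that one-byte-per-second layout, where bit v of entry s says whether valve v is open during second s (names are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <cstddef>

// One byte per second of song, one bit per valve (0-7). A 4-minute
// song is 240 entries, i.e. 240 bytes for all 8 valves.
bool valveOpen(const uint8_t* table, size_t n_entries,
               uint32_t second, uint8_t valve) {
    if (second >= n_entries || valve > 7) return false;
    return (table[second] >> valve) & 1u;
}
```

At that size the whole table fits comfortably in the Mega's SRAM, so it can be read off the SD card once before playback starts.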
The final head will be a storytelling robot ball. The eyes use 2 servos to move from side to side independently, and doors that simulate sideways eyelids use 2 more servos. The mouth uses 3 lines of 6 NeoPixel modules. Also, the eyeballs have 9 LEDs that blink in various sequences, and there is a 2-axis neck using 2 servos.
Due to the nature of the project, it must be easy to update with new stories. That's why I wanted a single SD card holding both the audio and the programmed movements.
It looks like you can do DTMF decoding on a regular Uno/Nano but I'm not sure if it will have time left over to do anything else. An ESP32 should be able to handle it with ease, though.
If you have to use an Uno and decoding onboard with DSP won't work well, the canonical DTMF decoder IC was the SSI202 back in the day. You can probably find some on eBay (I may even have a couple in my "telephone stuff" drawer). Just bear in mind that these analog approaches are limited: DTMF only encodes 16 tone pairs (the 12 on a standard telephone keypad plus the A-D column), as opposed to digital solutions, which have far more "bandwidth."