Dear all,
If I transmit text over voice using the Talkie library (talkie.h) and its voice.say() function,
what library and what function should I use to recover the received speech signal as a hex byte array?
You will need a PC to perform the speech-to-text function.
The Talkie library uses the LPC (Linear Predictive Coding) method to turn data bytes into speech, which is very simple compared to the reverse operation.
On top of that, the Talkie library uses speech that has already been converted to LPC data on more powerful hardware.
Then again, I've wondered what it would take to generate LPC coefficients from phonemes in real time on an Arduino, and then pass those coefficients to Talkie to do real speech synthesis. The old-school Speak & Spell did it in the late '70s, so it might just be possible on Arduino. I know speech synthesis is possible on Arduino (and even on the ATTiny85) but I think it would be interesting to use Talkie for possibly higher quality.
Not possible using any Arduino, but a fast PC running Matlab or an open source equivalent like GNU Octave can do this in somewhat less than real time. See the "freemat" source code in Peter Knight's original Talkie library. It works more or less out of the box with Matlab.
Conceivably, if that code were compiled into C using Matlab, it could execute in real time.
There is also Python Wizard.
I meant more like calculating the LPC coefficients from a table (one set of coefficients for each phoneme) and then interpolating between them and such (e.g., to make diphthongs like /ai/). Talkie's converter takes a WAV file and analyzes it and does a lot of math (IIRC, it uses cross-correlation to find the fundamental frequency from the glottal buzz) to figure out the LPC coefficients, which is much heavier work than interpolating between coefficients from a lookup table.
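For reference, the autocorrelation-style pitch detection described above can be sketched roughly as follows. This is an illustrative Python sketch of the general technique, not Talkie's actual converter code; the sample rate and pitch range are assumptions chosen for the example:

```python
import numpy as np

def estimate_f0(signal, sample_rate=8000, f0_min=50, f0_max=400):
    """Rough fundamental-frequency estimate via autocorrelation.

    Illustrative sketch only, not the code from Talkie's WAV converter.
    Searches for the autocorrelation peak among lags corresponding to
    plausible glottal pitch periods (f0_min..f0_max Hz).
    """
    signal = signal - np.mean(signal)          # remove DC offset
    lag_min = int(sample_rate / f0_max)        # shortest period to consider
    lag_max = int(sample_rate / f0_min)        # longest period to consider
    corr = np.correlate(signal, signal, mode="full")
    corr = corr[len(signal) - 1:]              # keep non-negative lags only
    best_lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / best_lag

# Usage: a pure 100 Hz tone should come out at roughly 100 Hz.
t = np.arange(0, 0.1, 1 / 8000)
tone = np.sin(2 * np.pi * 100 * t)
print(round(estimate_f0(tone)))
```

Real voiced speech is much messier than a pure tone (formants, noise, octave errors), which is part of why the full analysis is so much heavier than a table lookup.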
My own speech synthesizer has a table of formant frequencies for each phoneme; my idea is more or less to use precomputed LPC coefficients in place of formant frequencies. And like I said, this kind of thing has already been done on the Speak & Spell.
Edit: actually I was wrong about the Speak & Spell. It uses a small lookup table of pre-recorded words. How it Works | Bringing Back THE VOICE of Speak & Spell | Adafruit Learning System. The "TI99-4/A Terminal Emulator II module used a set of LPC-encoded phonemes" according to that page.
I'm not sure what your point is. The Speak & Spell uses precalculated LPC coefficients stored in ROM, to pronounce a limited set of words.
This site describes a complete S&S circuit analysis and a breakdown of the ROM contents, and shows how to make a Speak & Spell add-on cartridge with your own collection of words: Furrtek.org : La dictée magique (Speak and Spell)
BTW in my opinion, Peter Knight's Matlab code does a significantly better job of producing LPC coefficients for intelligible speech than anything else available on line.
I guess my point is that it should be possible to develop an LPC-based text-to-speech synthesizer on Arduino using Talkie. I was wrong about the Speak & Spell having a text-to-speech synthesizer, but the TI99-4/A Terminal Emulator II module (which uses the TMS5220 speech chip that is emulated by Talkie) does have a text-to-speech synthesizer that works essentially the same way that I'm suggesting. Surely this early '80s personal computer technology can't be outside the capabilities of Arduino!
Generating LPC coefficients from phonemes using lookup tables should be fairly simple and has to be done only about 100 times per second (the articulatory time slice in a phoneme-based speech synthesizer is generally around 10 ms), compared to Talkie, which has to run 8000 times per second.
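The lookup-and-interpolate step could look something like the sketch below. This is a minimal Python illustration of the idea; the phoneme names and coefficient values are invented placeholders, not real LPC analysis output:

```python
# Hypothetical per-phoneme LPC coefficient table (values are invented
# placeholders for illustration, not real LPC data).
PHONEME_LPC = {
    "a": [0.80, -0.30, 0.10, 0.05],
    "i": [0.60, -0.10, 0.20, 0.01],
}

def interpolate_frames(ph_from, ph_to, n_frames):
    """Linearly interpolate between two phonemes' coefficient sets.

    At a 10 ms articulatory time slice, a 100 ms glide such as the
    diphthong /ai/ would need about 10 frames, so n_frames=10.
    """
    a = PHONEME_LPC[ph_from]
    b = PHONEME_LPC[ph_to]
    frames = []
    for i in range(n_frames):
        t = i / (n_frames - 1)                       # 0.0 .. 1.0
        frames.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    return frames

frames = interpolate_frames("a", "i", 10)
print(frames[0])    # first frame matches the /a/ coefficients
print(frames[-1])   # last frame matches the /i/ coefficients
```

Each interpolated frame would then be handed to the synthesis filter, which is the part that runs at the 8 kHz sample rate.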
Now I'm tempted to try this, if only as a proof of concept.
It would be no problem for a PC to precalculate the LPC coefficients corresponding to the limited set of phonemes. That may be what TI did.
I imagine you're right, and that's what I would do. In my speech synthesizer, I was working directly with the formant frequencies of each phoneme, which are well documented around the web. But LPC coefficients are not as straightforward to compute.
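For anyone curious what that computation involves: LPC coefficients are conventionally derived from a frame's autocorrelation values via the Levinson-Durbin recursion. Here is a minimal Python sketch of that standard algorithm (a textbook illustration, not Talkie's converter code):

```python
def levinson_durbin(r, order):
    """Solve for LPC predictor coefficients from autocorrelation
    values r[0..order] using the Levinson-Durbin recursion.

    Returns (coefficients a[1..order], residual prediction error).
    """
    a = [0.0] * (order + 1)
    a[0] = 1.0
    error = r[0]
    for i in range(1, order + 1):
        # Compute the reflection coefficient for this order.
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / error
        # Update the coefficient set (symmetric update rule).
        a_prev = a[:]
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        error *= (1.0 - k * k)
    return a[1:], error

# Usage: for a first-order signal with autocorrelation r[k] = 0.5**k,
# the recursion recovers predictor coefficients of about [-0.5, 0.0].
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(coeffs)
```

Running this once per 10 ms frame on precomputed autocorrelations is cheap; the expensive part in a full converter is everything around it (windowing, autocorrelation, pitch and voicing decisions, quantization to the chip's coefficient tables).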
As mentioned previously, the "freemat" LPC code that comes with the original Peter Knight Talkie library works very well (runs on free GNU Octave). Takes a .wav file as input, for example.