For my daughter's roboclock project I implemented text to speech, sort of, by pre-rendering the speech for text and phrases into .wav files stored on an SD card. It works really well and is clear and simple. Also, because of the way it works, very little text is used in the main sketch.
I have a program that renders speech for all numbers 0 - 1000 along with all minutes of the day and all other phrases needed. I was thinking I could record far fewer numbers and piece together an assortment of them to make any number I want. At the moment I piece together numbers to make [minus][number][point][decimal1][decimal2] etc.
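To make the [minus][number][point][decimal1][decimal2] idea concrete, here is a rough sketch in plain C++. The function name, the clip names ("minus", "point", "12", ...) and the choice to pass the value in hundredths are my own assumptions for illustration, not the names used in the actual sketch.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: takes a value in hundredths (e.g. -1234 means -12.34)
// and returns the clip names to play, in order.
// Assumes the whole part falls in the pre-rendered 0..1000 range.
std::vector<std::string> clipsForHundredths(long hundredths) {
    std::vector<std::string> clips;
    if (hundredths < 0) {
        clips.push_back("minus");
        hundredths = -hundredths;
    }
    const long whole = hundredths / 100;
    const long frac = hundredths % 100;
    clips.push_back(std::to_string(whole));            // e.g. "12"
    if (frac != 0) {
        clips.push_back("point");
        clips.push_back(std::to_string(frac / 10));    // first decimal digit
        if (frac % 10 != 0) {
            clips.push_back(std::to_string(frac % 10)); // second decimal digit
        }
    }
    return clips;
}
```

Passing the value as an integer number of hundredths sidesteps floating point rounding when the digits are split apart.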
The main project is on a Due which IMO kicks butt with being 32 bit, nice amount of memory and not too shabby CPU speed.
However I would really like to be able to read quotes etc. off the SD and play them, e.g. "Good morning! How are you today?" I have a dictionary with ~38000 words in it and was thinking of rendering every word and creating a binary tree that's readable off the SD card to find the correct file for any word.
Wondering if anyone else has any other ideas on how to tackle this. Because of the volume of entries I doubt I could keep the entire index in memory... but may be able to keep 1/100 in memory which would allow almost instant access to narrowing it down to 100.
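The "keep 1/100 in memory" idea could look something like the sketch below: RAM holds only the first word of every block of 100 sorted words, a binary search picks the block, and only that block is then read from SD. `findBlock` and its parameters are hypothetical names, and the SD read itself is left out.

```cpp
#include <cstddef>
#include <cstring>

// `sparse` holds the first word of each block, sorted ascending (count >= 1).
// Returns the index of the only block that could contain `word`:
// the last block whose first word is <= `word`, or block 0 if none is.
std::size_t findBlock(const char* const* sparse, std::size_t count, const char* word) {
    std::size_t lo = 0;
    std::size_t hi = count;                 // candidate blocks are in [lo, hi)
    while (hi - lo > 1) {
        const std::size_t mid = lo + (hi - lo) / 2;
        if (std::strcmp(sparse[mid], word) <= 0) {
            lo = mid;                       // block `mid` starts at or before `word`
        } else {
            hi = mid;                       // block `mid` starts after `word`
        }
    }
    return lo;
}
```

With 58,000 words and blocks of 100 that is about 580 entries in RAM, so the search takes roughly 10 comparisons before touching the card.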
Maybe pre-store the most commonly used 1000-2000 words in RAM and then the rest on an SD file?
So a phrase such as "I went to the shop" would already have I, went, to and the in RAM...leaving only "shop" to look up on slow storage.
You could then read the number of characters in the word you are looking for and have the SD file read from a pre-determined line where that x number of characters begins.
Sort them in size order. Alphabetical search would also work.
I know python has some kind of sorting algorithm that is used to sort and to find things...forgotten what it is called as I have little coding experience in python :(.
Shop is in that list as well...so you can tick them all as being in RAM.
Oh and sorry but yes, have the 1000 words in RAM with 2 bytes after to refer to an address on a file on the SD card which has the 1000 pre-rendered.
E.g. byte array[6000] = {'A','P','P','L','E',0,1};
So when a search along the byte array comes across APPLE -> Return Address -> Look on SD file for that address -> Return the pre-rendered clip for that word.
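Here is a rough sketch of walking such a table. One change from the layout above, and it is my own assumption: each entry starts with a length byte, because without it the two address bytes could be mistaken for letters once the address goes above 64. `lookupAddress` is a made-up name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed entry layout: [length][letters...][address high][address low].
// A length byte of 0 marks the end of the table.
// Returns the 16-bit address stored for `word`, or -1 if it is not in the table.
long lookupAddress(const uint8_t* table, std::size_t tableSize, const char* word) {
    const std::size_t wordLen = std::strlen(word);
    std::size_t pos = 0;
    while (pos < tableSize) {
        const std::size_t len = table[pos];
        if (len == 0 || pos + 1 + len + 2 > tableSize) {
            break;                                   // end marker or truncated entry
        }
        if (len == wordLen && std::memcmp(table + pos + 1, word, len) == 0) {
            const long high = table[pos + 1 + len];
            const long low = table[pos + 2 + len];
            return (high << 8) | low;                // address into the SD file
        }
        pos += 1 + len + 2;                          // skip to the next entry
    }
    return -1;
}
```

This is a linear scan, so it only makes sense for a table of a thousand or so common words.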
You could offload the LUT to EEPROM on an external IC...lots more words...32 KB worth of space...average maybe 7 bytes per word (5 for letters, 2 for address) = 4500-ish words addressed.
Hi, thanks for the input so far. I managed to render the 58K-odd words and, oddly, it took 5 minutes to render and > 1 hour to copy to the SD card because there are so many files.
I have some very old code (i.e. mid-80s) that implements a Btree index that seems to work well. It's been a long time since I've used it but I'm able to read 1 word at a time and index it. At the moment each word just gets a +1 on the index, so essentially the 1st file is "w1.wav", the next is "w2.wav" etc. The program that creates the .wav files trims silence off the end, as this gives me extra time to find the next word before it needs to be played.
Another approach is to convert the text -> arpabet and then "mash" words together on the fly. This would require a lot less space and would hopefully make it easier to learn new words.
example arpabet :-
ABARE AA0 B AA1 R IY0
ABASCAL AE1 B AH0 S K AH0 L
ABASH AH0 B AE1 SH
ABASHED AH0 B AE1 SH T
ABASIA AH0 B EY1 ZH Y AH0
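Lines in that format split easily into a word and its phoneme list. The sketch below is meant for the PC-side rendering program rather than the Arduino (it uses `std::istringstream`); `Pronunciation` and `parseArpabetLine` are names I made up for the example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One dictionary entry: the word plus its phonemes in playback order.
struct Pronunciation {
    std::string word;
    std::vector<std::string> phonemes;
};

// Splits a whitespace-separated line such as "ABASH AH0 B AE1 SH".
// The first token is the word, every following token is one phoneme.
Pronunciation parseArpabetLine(const std::string& line) {
    Pronunciation p;
    std::istringstream in(line);
    in >> p.word;
    std::string token;
    while (in >> token) {
        p.phonemes.push_back(token);
    }
    return p;
}
```

Each phoneme token could then be mapped to a clip, for example "AE1" to a file for that sound, assuming one recording per phoneme exists.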
At the moment I need to test what I have... port the Btree functions to work with SD card library and see how it runs. Main problem I would have with the arpabet is generating the wav files for "AA0" "IY0" "EY1" etc.
I also have no idea how the SD library performs when there's 58K files in a single directory.
Been busy experimenting with the 44 English phonemes and I can put them together and get ... something that could potentially sound like speech. I think the main problem I have atm is "cico" aka crap in = crap out. My sound sources are not really suited to the job at the moment and the words don't flow even though the wav files are trimmed so no leading or trailing silence.
Regardless, I had a good idea about the binary search index that should be simple and fast.
The dictionary has 58,000+ words ranging from single letters to at least one of 22 letters. Rather than having one array with all of them, which would require either fixed-length entries or an index to each variable-length string, I would have 22 arrays (one for each word length), which means fixed-length searching.
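A minimal sketch of the fixed-length search over one of those arrays, assuming the words of a given length are packed back to back with no terminators and sorted alphabetically. `findFixed` is a hypothetical name; in the real thing the records would come from SD or flash rather than a RAM buffer.

```cpp
#include <cstddef>
#include <cstring>

// Binary search over `count` records of exactly `width` characters each.
// Returns the record index of `word`, or -1 if it is not present.
long findFixed(const char* records, std::size_t count, std::size_t width, const char* word) {
    if (std::strlen(word) != width) {
        return -1;                          // wrong array for this word length
    }
    std::size_t lo = 0;
    std::size_t hi = count;                 // search window is [lo, hi)
    while (lo < hi) {
        const std::size_t mid = lo + (hi - lo) / 2;
        const int cmp = std::memcmp(records + mid * width, word, width);
        if (cmp == 0) {
            return static_cast<long>(mid);
        }
        if (cmp < 0) {
            lo = mid + 1;                   // record sorts before the word
        } else {
            hi = mid;                       // record sorts after the word
        }
    }
    return -1;
}
```

Because every record has the same width, record `n` always starts at byte `n * width`, so no separate offset index is needed.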
I've discovered very nicely sized SPI flash memory modules with 8, 16, 32 and even 64 megabits which, from what I can see, should be more than adequate for storing and retrieving data. (They're well suited for 3.3 V and some have 100 MHz+ clock speed... the ones I looked at are all SOIC-8 packages but I ordered some SOIC-8 breakout boards to cure that.)
On a side note... I noticed 58,000+ wav files took a long time to copy to SD, so I might look at merging them into several larger files, removing all the redundant header information and making the indexes point to an offset into a particular file. I have a feeling the SD card library would have an easier time seeking to an offset than looking up a file name among 58,000+ files. I'll also have to change my PCM library to play a particular file from an offset for a fixed length.
<length 1><wave info 1><length 2><wave info 2>....
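A rough sketch of reading that layout, assuming each length is stored as a 4-byte little-endian value (the size and byte order of the length field are my assumption). It walks an in-memory copy of the merged file; `ClipSpan` and `findClip` are made-up names.

```cpp
#include <cstddef>
#include <cstdint>

// Where one clip's audio data lives inside the merged file.
struct ClipSpan {
    std::size_t offset;   // byte offset of the first audio byte
    uint32_t length;      // number of audio bytes
    bool found;
};

// Walks <length><data><length><data>... records and returns the span of
// record number `clipIndex` (0-based), or found == false if it does not exist.
ClipSpan findClip(const uint8_t* data, std::size_t size, std::size_t clipIndex) {
    std::size_t pos = 0;
    for (std::size_t i = 0; pos + 4 <= size; ++i) {
        const uint32_t len = static_cast<uint32_t>(data[pos])
                           | (static_cast<uint32_t>(data[pos + 1]) << 8)
                           | (static_cast<uint32_t>(data[pos + 2]) << 16)
                           | (static_cast<uint32_t>(data[pos + 3]) << 24);
        pos += 4;
        if (len > size - pos) {
            break;                          // record runs past the end: stop
        }
        if (i == clipIndex) {
            return {pos, len, true};
        }
        pos += len;                         // skip this clip's audio data
    }
    return {0, 0, false};
}
```

Walking the records like this is only needed once, when building the index; after that the index can store the offset and length directly so playback is a single seek.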
The best part will be... creating everything will be as simple as pushing a button to render the files and transferring a directory to an SD card. An optional function to recreate the EEPROM index from the SD card would also be included.