
Topic: Ultra small sound file?


I'm looking to fit a few seconds of intelligible speech into the PROGMEM (flash) of an ATtiny85. So far, out of my available 6 kB, I've used 1.5 kB, and I expect to have 4 to 4.5 kB free by the time I'm done.

8 kHz seems to be the minimum sample rate that keeps the voice harmonics needed to understand speech, but the bit rate can be really low; I don't care about quality. As long as I can say a few words and have them understood, I just need it to fit into 4.5 kB. So far, I've been able to fit 1.6 seconds of music into a WAV file at 8 kHz, 4 bit, 33 kbps, but it still sounds better than I need, the file is too big, and the clip isn't long enough.

Can anyone recommend a program that will get me an even smaller file? I'm using GoldWave and this is the crappiest setting I could find. I need to use WAV so that I can pull hex from it later. Thanks.
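A rough storage budget, assuming uncompressed samples and no file header: bytes = seconds × sample rate × bits per sample / 8. At 8 kHz and 4 bits that is 4000 bytes per second, so the 1.6 s clip needs about 6.4 kB, while 4.5 kB holds only about 1.1 s; halving the rate to 4 kHz doubles that to about 2.25 s. The helper names below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of flash needed for `ms` milliseconds of raw audio. */
static uint32_t bytes_for_ms(uint32_t ms, uint32_t rate_hz, uint32_t bits)
{
    assert(rate_hz > 0 && bits > 0);
    return ms * rate_hz * bits / 8000u;      /* (ms / 1000) s * rate * bits / 8 */
}

/* Milliseconds of raw audio that fit in `bytes` of flash. */
static uint32_t ms_for_bytes(uint32_t bytes, uint32_t rate_hz, uint32_t bits)
{
    assert(rate_hz > 0 && bits > 0);
    return bytes * 8000u / (rate_hz * bits);
}
```

For example, `ms_for_bytes(4500, 8000, 1)` gives 4500 ms, which shows why the one-bit schemes discussed in this thread are attractive.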


Mar 18, 2018, 06:53 am Last Edit: Mar 18, 2018, 06:54 am by Grumpy_Mike
You could use four-bit samples in place of eight-bit ones. Each sample would represent the difference between the current output value and the last one; when you read the file, the four-bit sample is added to the running total. Note that this four-bit number is signed, so it can represent an up or down change. This is a form of differential PCM (DPCM). It will allow you to double the time or halve the file size.
For even better compression, use just a one-bit sample representing an increment or decrement of the running total; that one-bit scheme is what is usually called delta modulation.
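A minimal C sketch of the four-bit difference scheme, assuming unsigned 8-bit source samples centred on 0x80. Two details are additions of mine, not from the post: the encoder tracks the decoder's running total so rounding errors cannot accumulate as drift, and a hypothetical `DPCM_STEP` scale lets each code move the output by more than 1, because raw differences in loud speech often exceed the -8..+7 range a signed 4-bit code can hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scale: each code changes the output by code * DPCM_STEP.
   DPCM_STEP = 1 is the plain scheme; larger values follow steep waveforms
   better (less slope overload) at the cost of coarser small steps. */
#define DPCM_STEP 4

static int clamp(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Encode unsigned 8-bit samples as signed 4-bit codes (-8..7). */
static void dpcm_encode(const uint8_t *in, int8_t *codes, size_t n)
{
    assert(in != NULL && codes != NULL);
    int acc = 128;                          /* decoder also starts at mid-scale */
    for (size_t i = 0; i < n; i++) {
        int diff = in[i] - acc;
        /* round to the nearest multiple of DPCM_STEP, then fit in 4 bits */
        int code = (diff >= 0 ? diff + DPCM_STEP / 2 : diff - DPCM_STEP / 2) / DPCM_STEP;
        code = clamp(code, -8, 7);
        acc = clamp(acc + code * DPCM_STEP, 0, 255);   /* mirror the decoder */
        codes[i] = (int8_t)code;
    }
}

/* Rebuild 8-bit samples by adding each code to the running total. */
static void dpcm_decode(const int8_t *codes, uint8_t *out, size_t n)
{
    int acc = 128;
    for (size_t i = 0; i < n; i++) {
        acc = clamp(acc + codes[i] * DPCM_STEP, 0, 255);
        out[i] = (uint8_t)acc;
    }
}
```

Two codes fit in one byte (one per nibble), which is where the 2× saving comes from. Restricting every code to a single step up or down turns this into the one-bit delta modulation variant.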


Quote: So far, I've been able to fit 1.6 seconds of music into a wav file at 8kHz, 4 bit, 33 kbps, but it still sounds too high quality and the size is too big, length not long enough.
A 4 kHz sample rate should be enough for intelligible speech. If GoldWave does not let you go that low, you could output 8 kHz 4-bit and process it elsewhere.

You could try recording 1.6 seconds of your speech, low-pass filtering it to 1.5 kHz, playing it as your 8 kHz 4-bit file, then modifying the code to play only every second sample. If it sounds good enough, you could get the processor to write out the 'compressed' samples to the serial port in a suitable comma-separated hex format, then paste that into your code.
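A C sketch of the "keep every second sample" step, assuming unsigned 8-bit samples (the function name is hypothetical). Averaging each pair before dropping one is only a very weak low-pass filter: at 8 kHz a two-sample average is still about 3 dB down at 2 kHz, the Nyquist limit of the new 4 kHz rate, so it is far gentler than the 1.5 kHz filter suggested above and is a starting point rather than a substitute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Halve the sample rate of unsigned 8-bit audio (e.g. 8 kHz -> 4 kHz).
   Each output is the rounded mean of an input pair. Returns the output count;
   an odd trailing sample is dropped. */
static size_t decimate2_avg(const uint8_t *in, size_t n, uint8_t *out)
{
    assert(in != NULL && out != NULL);
    size_t m = n / 2;
    for (size_t i = 0; i < m; i++)
        out[i] = (uint8_t)((in[2 * i] + in[2 * i + 1] + 1) / 2);
    return m;
}
```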

Depending on the speech you actually want to record, it may have separate words, so you could close-clip each word and encode them separately, then add the inter-word delays back in when you play the words.
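One way to store close-clipped words, assuming a hypothetical table that records where each word starts in the sample array, how long it is, and how many samples of silence to play after it. The silence is generated at playback time (0x80 is mid-scale for unsigned 8-bit PCM), so the pauses cost no flash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SILENCE 0x80   /* mid-scale for unsigned 8-bit PCM */

/* Hypothetical layout: one entry per close-clipped word. */
typedef struct {
    uint16_t start;    /* index of the word's first sample */
    uint16_t length;   /* number of samples in the word */
    uint16_t gap;      /* silent samples to play after the word */
} Word;

/* Sample to output at playback position `pos`, or -1 once the phrase is over.
   On the ATtiny the `samples` reads would go through pgm_read_byte. */
static int phrase_sample(const uint8_t *samples, const Word *words,
                         size_t nwords, uint32_t pos)
{
    assert(samples != NULL && words != NULL);
    for (size_t w = 0; w < nwords; w++) {
        if (pos < words[w].length)
            return samples[words[w].start + pos];
        pos -= words[w].length;
        if (pos < words[w].gap)
            return SILENCE;
        pos -= words[w].gap;
    }
    return -1;
}
```

An interrupt routine would call this with an incrementing position. With more than a handful of words it would be cheaper to remember the current word than to rescan the table for every sample.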



Or you could add an EEPROM and store the sample data in it! There are a few projects around that show you how to do this.  :)  Example here
Mrs Drew


Mar 18, 2018, 06:44 pm Last Edit: Mar 18, 2018, 07:56 pm by MrMark
Quote: 4 kHz should be enough for intelligible speech, if your 'goldwave' does not let you do this you could output 8 kHz 4 bit and process it elsewhere.
For what it's worth, I played with this a bit using GNU Octave's signals toolkit and this WAV file: http://www.wavsource.com/snds_2018-01-14_3453803176249356/movies/2001/sorry_dave.wav

The process was simply to read in the file (signed floats in the range -1 to 1, 11025 samples per second), multiply by 8 and drop the fractional part (this simulates 4-bit resolution), and decimate by taking every nth sample (skipping the low-pass filter Tony suggested above) to get a lower sample rate. The voice of "Dave" was intelligible even down to n=8, i.e. 1378 samples/second; the "HAL" voice less so by that point, presumably because it is a somewhat higher-pitched voice. The attached file is 4-bit resolution at 2756 samples/second.

Octave code for attached sample:
Code:
>> [hal,fs] = audioread("sorry_dave.wav") ;  % Read WAV file
>> fs    % Sample rate of file read
fs =  11025
>> L = length(hal) ;   % Number of samples in file
>> hal2756by4 = fix(hal(1:4:L)*8)/8 ;  % Take every fourth sample and limit to 4 bit signed resolution
>> soundsc(hal2756by4,fs/4)  ;  % Play to soundcard
>> audiowrite("/home/mark/Downloads/sorry_dave_4bit_2756sps.wav",hal2756by4,fs/4);  % Write to WAV file
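The Octave line above only simulates 4-bit resolution; to actually save flash, two 4-bit samples have to be packed into each byte. A C sketch of that packing, assuming unsigned 8-bit input, where keeping the top four bits plays roughly the role of `fix(hal*8)/8`. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep the top 4 bits of each unsigned 8-bit sample and pack two per byte,
   first sample in the high nibble. Returns the number of bytes written. */
static size_t pack4(const uint8_t *in, size_t n, uint8_t *out)
{
    assert(in != NULL && out != NULL);
    size_t bytes = (n + 1) / 2;
    for (size_t i = 0; i < bytes; i++) {
        uint8_t hi = in[2 * i] >> 4;
        uint8_t lo = (2 * i + 1 < n) ? (uint8_t)(in[2 * i + 1] >> 4) : 0x8;  /* pad with mid-scale */
        out[i] = (uint8_t)(hi << 4 | lo);
    }
    return bytes;
}

/* Sample i as an 8-bit value for the PWM register; the low nibble is set to 0x8
   so each level sits in the middle of its 16-value range. */
static uint8_t unpack4(const uint8_t *packed, size_t i)
{
    uint8_t b = packed[i / 2];
    uint8_t nib = (i & 1) ? (uint8_t)(b & 0x0F) : (uint8_t)(b >> 4);
    return (uint8_t)(nib << 4 | 0x8);
}
```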


Perhaps 4 kHz would work for me then. I haven't been able to find a program yet that will downsample that far, and I'm too dumb to understand the technical jargon provided by everyone above... I mean, I kind of understand what you're all saying, but probably not well enough to proceed on my own.

I had also considered the Talkie library. That's PWM, so it's super small, but it's also right on the edge of being unintelligible, and the Talkie library doesn't work on the Tiny, so I'd have to dig around in the library to find out why. I think WAV audio is the way to go, as I already found someone who implemented this on the Tiny and I wouldn't have to worry about synthesizing new words.

Adding components unfortunately is not feasible due to packaging space and available I/O, so I can't add an EEPROM (although I could use the existing 512-byte EEPROM to extend the memory a bit further). I even need to saw off the USB part of the PCB lol.


Quote: I had also considered the talkie library.  That's PWM
No, it is not. It is LPC (linear predictive coding). The problem with it is that I don't know of anything that will encode it; the library only plays it back.


Mar 18, 2018, 11:06 pm Last Edit: Mar 18, 2018, 11:13 pm by Gahhhrrrlic
I found a couple of references to software that will do this, but after doing some reading, it seems the Talkie library uses a 16-bit timer that the Tiny does not have (both ATtiny85 timers are 8-bit), so I think it would be a big tear-up to fix the library for Tiny usage.

I've been trying to get the example seen here to work:

Unfortunately, when I hook it up to test, it only produces loud noise over my headphones.  I don't know if this is because of something that's incorrect with my code or whether it has to do with the audio sample itself being messed up.

Code:

// ATtiny85 sample player: Timer1 PWM (clocked from the 64 MHz PLL) acts as the
// DAC, and Timer0 interrupts once per sample. Expects raw unsigned 8-bit PCM.
#include <avr/pgmspace.h>
#include <avr/interrupt.h>

// NOTE: as posted, these bytes are a whole WAV file (header included) in a
// 4-bit ADPCM format, not raw unsigned 8-bit PCM, so they play back as noise.
const unsigned char quack_wav[] PROGMEM = {
0x52, 0x49, 0x46, 0x46, 0x52, 0x07, 0x00, 0x00, 0x57, 0x41, 0x56, 0x45, 0x66, 0x6D, 0x74, 0x20,
0x32, 0x00, 0x00, 0x00, 0x02, 0x00, 0x01, 0x00, 0x40, 0x1F, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00,
0x00, 0x01, 0x04, 0x00, 0x20, 0x00, 0xF4, 0x01, 0x07, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02,
0x00, 0xFF, 0x00, 0x00, 0x00, 0x00, 0xC0, 0x00, 0x40, 0x00, 0xF0, 0x00, 0x00, 0x00, 0xCC, 0x01,
0x30, 0xFF, 0x88, 0x01, 0x18, 0xFF, 0x64, 0x61, 0x74, 0x61, 0x00, 0x07, 0x00, 0x00, 0x02, 0x10,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xEB, 0xE0, 0xE0, 0x36, 0x44, 0x45, 0x44, 0x43,
0x11, 0x32, 0xF0, 0x0D, 0xFD, 0x8F, 0x0F, 0xEF, 0xF0, 0x10, 0xE1, 0x01, 0xEA, 0xDF, 0xD9, 0xDB,
0xED, 0xCB, 0xEF, 0xF0, 0x01, 0x41, 0x44, 0x54, 0x3F, 0xB0, 0x0C, 0xEE, 0x00, 0x0C, 0x13, 0x0E,
0xE0, 0x10, 0x80, 0x11, 0xE0, 0xF0, 0xE9, 0xE0, 0xFE, 0x00, 0x20, 0x02, 0x41, 0x16, 0x44, 0x44,
0x3E, 0x80, 0x0E, 0x00, 0x11, 0x0F, 0x42, 0xEF, 0x00, 0x00, 0xD7, 0x10, 0x01, 0x0E, 0xDC, 0x0E,
0xD0, 0x41, 0x00, 0x15, 0x0F, 0x26, 0x23, 0x64, 0x3D, 0x80, 0x0D, 0x00, 0x11, 0x0F, 0x32, 0xBF,
0x00, 0xF0, 0xF3, 0x3D, 0xE0, 0x08, 0xEE, 0x0F, 0xEF, 0x20, 0xD0, 0x01, 0xFE, 0x47, 0x22, 0x73,
0x08, 0xF1, 0xEF, 0x01, 0x10, 0xF0, 0x3E, 0xB0, 0x10, 0xF0, 0x15, 0x0E, 0xF1, 0xEC, 0xE0, 0x1F,
0xF1, 0x30, 0xE2, 0x52, 0x02, 0x74, 0x35, 0x40, 0x80, 0x1F, 0xF0, 0x11, 0x00, 0x03, 0xFC, 0x02,
0x00, 0x01, 0x50, 0xEF, 0x1F, 0xCE, 0x01, 0x0E, 0x04, 0x0D, 0x04, 0x21, 0x27, 0x43, 0x53, 0xE8,
0x01, 0xEF, 0x12, 0x10, 0xF0, 0x3D, 0xA0, 0x10, 0xF0, 0x13, 0x0B, 0xE0, 0x0C, 0xE0, 0x10, 0xD0,
0x20, 0xB0, 0x33, 0x01, 0x74, 0x34, 0x18, 0xD1, 0x0D, 0x02, 0x20, 0x0E, 0x11, 0xAE, 0x12, 0xF0,
0x12, 0x2D, 0xB0, 0x1E, 0xCF, 0x12, 0xFE, 0x34, 0xFE, 0x36, 0x21, 0x37, 0x34, 0x2B, 0xB1, 0x0E,
0x02, 0x20, 0x0F, 0x02, 0xDC, 0x04, 0x00, 0x12, 0x2E, 0xDF, 0x1C, 0xA0, 0x01, 0xFE, 0x12, 0x0D,
0x26, 0x22, 0x26, 0x44, 0x08, 0xE1, 0x0E, 0x02, 0x20, 0x0F, 0x10, 0xCD, 0x03, 0xFF, 0x02, 0x1C,
0xEF, 0x0C, 0xBF, 0x00, 0xFE, 0x02, 0x0D, 0x36, 0x22, 0x46, 0x42, 0xC9, 0x01, 0xEF, 0x02, 0x3F,
0x00, 0x2E, 0x07, 0x87, 0x05, 0x31, 0x80, 0x5D, 0x9F, 0x20, 0xF0, 0x22, 0x0E, 0xD0, 0x0B, 0xDF,
0x10, 0xF0, 0x34, 0x01, 0x73, 0x33, 0x65, 0x3F, 0x80, 0x1F, 0xF1, 0x30, 0x00, 0x01, 0x0C, 0xE1,
0x0E, 0x03, 0x20, 0xEE, 0x00, 0xBC, 0x00, 0x0F, 0x02, 0x21, 0x07, 0x31, 0x37, 0x43, 0xF8, 0x01,
0xED, 0x03, 0x00, 0x00, 0x1F, 0xBE, 0x00, 0xC0, 0x21, 0xFF, 0x0F, 0xFB, 0xD0, 0xFE, 0xE0, 0x11,
0x21, 0x72, 0x13, 0x73, 0x1C, 0xC0, 0x0B, 0xF2, 0x20, 0x01, 0x10, 0xED, 0xE0, 0xCE, 0x23, 0x20,
0x00, 0x0E, 0xAF, 0xFF, 0xF1, 0x33, 0x73, 0x44, 0x32, 0x46, 0x22, 0x0D, 0xF0, 0xCC, 0x03, 0x00,
0x34, 0x0F, 0xEF, 0x0E, 0xB0, 0x21, 0xE0, 0x20, 0xEC, 0xFE, 0xBE, 0x02, 0x23, 0x44, 0x42, 0x02,
0x51, 0xF2, 0x50, 0xCF, 0x0C, 0xCE, 0xEE, 0xEE, 0xDF, 0xEB, 0xDF, 0xEB, 0xEE, 0xEE, 0x0E, 0xDF,
0xD8, 0xEF, 0xF0, 0x12, 0x00, 0x10, 0x11, 0xFC, 0x12, 0xCD, 0x04, 0x10, 0xF0, 0x43, 0xF0, 0x72,
0x12, 0x42, 0x22, 0x14, 0x62, 0x23, 0x53, 0x22, 0x44, 0x23, 0x43, 0x33, 0x34, 0x62, 0x02, 0x42,
0x02, 0x33, 0x33, 0x36, 0x32, 0x24, 0x42, 0x13, 0x42, 0x00, 0x11, 0x0F, 0x14, 0x0F, 0xF1, 0x0F,
0xF0, 0x0F, 0xDF, 0xFE, 0xCC, 0xDE, 0xBB, 0xFF, 0xDC, 0xEE, 0xCD, 0xDC, 0xEE, 0xCD, 0xED, 0xBD,
0xCD, 0xDC, 0xCE, 0xFD, 0xCE, 0xFD, 0xCE, 0xEE, 0xDE, 0xF0, 0xFF, 0xF0, 0x0E, 0xEF, 0x00, 0x01,
0x31, 0x12, 0x33, 0x44, 0x24, 0x42, 0x23, 0x62, 0x23, 0x44, 0x33, 0x44, 0x33, 0x43, 0x43, 0x35,
0x32, 0x23, 0x34, 0x33, 0x43, 0x34, 0x44, 0x32, 0x33, 0x33, 0x42, 0x23, 0x21, 0x11, 0x10, 0x01,
0x10, 0x0F, 0x00, 0xFF, 0xFF, 0xED, 0xDD, 0xDC, 0xDB, 0xEE, 0xDC, 0xCD, 0xCD, 0xDC, 0x01, 0x10,
  // ... rest of the data truncated (forum character limit)
};

const uint16_t len = sizeof(quack_wav);   // 1882 for the complete file; sizeof matches what is stored
volatile uint16_t p = 0;                  // index of the next sample, advanced in the ISR

void setup() {
  // Enable 64 MHz PLL and use it as the clock source for Timer1
  PLLCSR = 1<<PCKE | 1<<PLLE;

  // Set up Timer/Counter1 for PWM output on OC1A (PB1) and OC1B (PB4)
  TIMSK = 0;                              // Timer interrupts OFF
  TCCR1 = 1<<PWM1A | 2<<COM1A0 | 1<<CS10; // PWM A, clear on match, 1:1 prescale
  GTCCR = 1<<PWM1B | 2<<COM1B0;           // PWM B, clear on match
  OCR1A = 128; OCR1B = 128;               // 50% duty (silence) at start

  // Set up Timer/Counter0 for an 8 kHz interrupt to output samples
  TCCR0A = 3<<WGM00;                      // Fast PWM...
  TCCR0B = 1<<WGM02 | 2<<CS00;            // ...with TOP = OCR0A, 1/8 prescale
  OCR0A = 124;                            // Divide by 1000: 8 MHz / 8 / 125 = 8 kHz (assumes an 8 MHz clock)
  TIMSK = 1<<OCIE0A;                      // Enable compare match A interrupt

  pinMode(1, OUTPUT);                     // OC1A
  pinMode(4, OUTPUT);                     // OC1B
}

void loop() {
  // Nothing to do here: samples are output by the Timer0 interrupt
}

// Runs once per sample period
ISR(TIMER0_COMPA_vect) {
  uint8_t sample = pgm_read_byte(&quack_wav[p++]);
  OCR1A = sample; OCR1B = sample ^ 255;   // drive the two pins in antiphase
  // End of data? Start again (to play only once, set TIMSK = 0 and sleep instead)
  if (p == len) {
    p = 0;
  }
}

Another recurring problem I run into is that programs are written for a different CPU speed than the one I have. I have the Digispark 16.5 MHz board, but that is often not the assumed default speed. For all I know, the audio sounds like noise because it's playing 100x faster than it's supposed to.
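The speed error can be worked out from the timer setup. In the fast PWM mode used (TOP = OCR0A), Timer0 fires once every prescale × (OCR0A + 1) clock cycles, so the original 1/8 prescale with OCR0A = 124 gives exactly 8000 Hz only at 8 MHz. At 16.5 MHz the same values give 16.5 kHz: roughly double speed and pitch, not 100×. A small helper (hypothetical name) for checking other combinations:

```c
#include <assert.h>
#include <stdint.h>

/* Interrupt (sample) rate of Timer0 in fast PWM mode with TOP = OCR0A:
   one compare match every prescale * (OCR0A + 1) CPU cycles. */
static uint32_t timer0_sample_rate(uint32_t f_cpu, uint16_t prescale, uint8_t ocr0a)
{
    assert(prescale > 0);
    return f_cpu / ((uint32_t)prescale * (ocr0a + 1u));
}
```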

I had to truncate the sound file because of the 9000-character limit, but basically I got it from an online file-to-hex converter that extracted the hex content.


Mar 19, 2018, 07:53 pm Last Edit: Mar 19, 2018, 08:29 pm by Gahhhrrrlic
OK, I'm 100% certain the hex I'm pulling out of my WAV files is garbage. I tried the hex that was included on the example website (which I hadn't noticed until now) and it sounds just fine with my code, so the code works. The problem is EITHER a) my WAV file isn't in the correct format or b) the online hex extractor widget I'm using is doing something to the contents.
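The header can settle which of the two it is. In the array posted earlier in this thread, bytes 20-21 are 0x02 0x00 (format tag 2, Microsoft ADPCM) and bytes 34-35 are 0x04 0x00 (4 bits per sample), and the RIFF header itself is part of the array, so the player was fed compressed data plus header bytes as if they were 8-bit PCM. A C sketch of a check, assuming the whole file is in memory and the standard RIFF/WAVE chunk layout; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }   /* little-endian */
static uint32_t rd32(const uint8_t *p) { return rd16(p) | (uint32_t)rd16(p + 2) << 16; }

/* Walk the chunks of an in-memory WAV file. If it is plain PCM (format tag 1),
   mono, 8 bits per sample (stored unsigned in WAV), return the byte offset of the
   sample data and store its size in *data_len; otherwise return -1. */
static long wav_pcm8_data(const uint8_t *wav, size_t size, uint32_t *data_len)
{
    assert(wav != NULL && data_len != NULL);
    if (size < 12 || memcmp(wav, "RIFF", 4) != 0 || memcmp(wav + 8, "WAVE", 4) != 0)
        return -1;
    int fmt_ok = 0;
    size_t pos = 12;
    while (pos + 8 <= size) {
        uint32_t len = rd32(wav + pos + 4);
        if (memcmp(wav + pos, "fmt ", 4) == 0 && pos + 8 + 16 <= size) {
            const uint8_t *f = wav + pos + 8;
            fmt_ok = rd16(f) == 1          /* WAVE_FORMAT_PCM */
                  && rd16(f + 2) == 1      /* one channel */
                  && rd16(f + 14) == 8;    /* bits per sample */
        } else if (memcmp(wav + pos, "data", 4) == 0) {
            if (!fmt_ok)
                return -1;
            *data_len = len;
            return (long)(pos + 8);        /* samples start after the chunk header */
        }
        pos += 8 + (size_t)len + (len & 1); /* chunks are padded to an even size */
    }
    return -1;
}
```

Only the bytes from the returned offset onward (for `*data_len` bytes) belong in the PROGMEM array.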

I've been saving the WAV files as unsigned 8-bit PCM at 8 kHz. Do I need to do anything to normalize the volume? With only 256 quantization levels, maybe the sound is getting clipped... although it sounds fine on my computer.

Since I'm convinced this is an audio issue, can anyone here help me create a working WAV file, just for test purposes? If it works, then simply knowing how you made it should be enough for me to carry on.


It was the volume... too low, and the hex numbers never get big enough to actually use the 8-bit range. This effectively kills the sound quality. In my sound editor I made the volume so loud it almost clipped, and then I was getting D's and E's in the first digit of my hex numbers and the sound came through properly.
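What was done by hand in the editor is peak normalization: scale every sample about the 0x80 centre so the loudest one lands near full scale, which spreads the speech over more of the 256 levels. A C sketch for unsigned 8-bit PCM, with a hypothetical `target` parameter that leaves some headroom below clipping:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scale unsigned 8-bit PCM about its 128 centre so the largest excursion
   becomes `target` (at most 127). Silence stays at 0x80. */
static void normalize_u8(uint8_t *s, size_t n, int target)
{
    assert(s != NULL && target > 0 && target <= 127);
    int peak = 0;
    for (size_t i = 0; i < n; i++) {
        int d = s[i] - 128;
        if (d < 0) d = -d;
        if (d > peak) peak = d;
    }
    if (peak == 0)
        return;                                   /* all silence: nothing to scale */
    for (size_t i = 0; i < n; i++) {
        int d = s[i] - 128;
        int scaled = (d * target + (d >= 0 ? peak / 2 : -peak / 2)) / peak;   /* rounded */
        int v = 128 + scaled;
        s[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}
```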


Have a look at the attached program. It is nearly the code you posted, although I have put my own sample in it, one of me saying "yes". The first two bytes of the sample are the sample length, and the code has been rearranged in a more sensible way, with the setup in the setup function and sample control in the loop function. Note that variables used in both the ISR and the main code have been declared as volatile.
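The attached program isn't reproduced here, so the exact layout is unknown. As an illustration only, this is how a two-byte length prefix could be read, assuming the low byte comes first (the AVR's native order, which would also let `pgm_read_word` fetch it in one call). On the chip the reads would go through `pgm_read_byte`; plain array access keeps the sketch testable on a PC, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bytes 0-1 = number of samples (low byte first),
   samples start at byte 2. */
static uint16_t sample_count(const uint8_t *blob)
{
    return (uint16_t)(blob[0] | blob[1] << 8);
}

static uint8_t sample_at(const uint8_t *blob, uint16_t i)
{
    assert(i < sample_count(blob));
    return blob[2 + i];
}
```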

This is untested, but it compiles. I can't get to an ATtiny85 system at the moment. If it works, then we can discuss how to generate the sample. For comparison, this is how it sounds on an Arduino Uno.


Mar 19, 2018, 09:07 pm Last Edit: Mar 19, 2018, 09:12 pm by MrMark
Quote: Unfortunately, when I hook it up to test, it only produces loud noise over my headphones.  I don't know if this is because of something that's incorrect with my code or whether it has to do with the audio sample itself being messed up.
It looks like your sound data is two's complement and, without looking, I expect the DAC wants offset binary. In two's complement [ ..., -1, 0, 1, ...] is [..., 0xFF, 0x00, 0x01, ...] while offset binary is [..., 0x7F, 0x80, 0x81, ...].

If you can manipulate the data (in the Arduino code, even), add 0x80 and save only the lower 8 bits to convert from one to the other.
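The conversion is one operation per sample: adding 0x80 and keeping the low 8 bits is the same as flipping the top bit. A C sketch, assuming the samples are in a RAM buffer (the function name is hypothetical; in the interrupt routine the equivalent would be converting `sample` once with `sample ^ 0x80` before both OCR writes). Note that 8-bit samples in a standard PCM WAV file are already stored unsigned, with silence at 0x80, so this should only be needed when a tool has produced signed data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two's complement -> offset binary, in place. */
static void twos_to_offset(uint8_t *s, size_t n)
{
    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s[i] = (uint8_t)(s[i] + 0x80);   /* same result as s[i] ^ 0x80 */
}
```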


Inspired by Bit from Tron?

Your code functions but sounds like Donald Duck, presumably because of the clock speed being different.  That's easy to fix by changing the OCR value.  The 'yes' is clearly there and sounds fine.
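Choosing the new value can be done generically: for a given prescaler, OCR0A = F_CPU / (prescale × rate) - 1, rounded, provided it fits in 8 bits. At 16.5 MHz the exact value for a prescale of 8 (about 256.8) is just over the 8-bit limit, so this sketch moves to 64, giving OCR0A = 31 and about 8057 Hz; the matching clock-select field for 1/64 is 3<<CS00. The helper is hypothetical and assumes the same fast PWM mode with TOP = OCR0A.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest Timer0 prescaler whose rounded OCR0A fits in 8 bits for the
   wanted sample rate. Returns the prescaler (0 if none works) and writes
   OCR0A to *ocr. */
static uint16_t timer0_setting(uint32_t f_cpu, uint32_t rate_hz, uint8_t *ocr)
{
    static const uint16_t prescales[] = {1, 8, 64, 256, 1024};
    assert(rate_hz > 0 && ocr != 0);
    for (int k = 0; k < 5; k++) {
        uint32_t ticks = f_cpu / prescales[k];
        uint32_t top = (ticks + rate_hz / 2) / rate_hz;   /* counts per sample, rounded */
        if (top >= 1 && top <= 256) {
            *ocr = (uint8_t)(top - 1);
            return prescales[k];
        }
    }
    return 0;
}
```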

As mentioned, I was eventually able to get my voice to work too by maxing out the volume, but I feel like I should be able to record more than half a second. There must be some other tricks to get this to 2-3 seconds, enough to say a few words.

I tried speeding up my recording, allowing the pitch to change, then using the Arduino code to slow it back down again. In theory this allows a smaller audio file, but the problem is that it introduced quite a bit of noise.


Mr. Mark, I think you are right, because in my latest attempt I had half a second of silence at the end of the track and those values are all 0x80, which according to your post is correct. Maybe that's partly why I could hear my voice properly... that and the volume being much higher. It makes sense to me: audio waves are usually centered about 0 V, but for an unsigned 8-bit number the center has to be half the range, 128, which equates to the neutral diaphragm position of the speaker.


Quote: Your code functions but sounds like Donald Duck,
So disregard every other sample: just increment the variable p by two instead of one. If that works for you, then you can simply halve the size of the file.


That's another good point. Perhaps there's also a way to pick which half to keep, as one half may miss certain peaks while the other catches more of them. Also, even though we don't normally speak quickly, talking at 2x speed should still be understood by most people.
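Picking the better half can be automated with a crude measure. This C sketch compares the energy (squared distance from 0x80) retained by the even- and odd-indexed samples and keeps the larger set; the function names are hypothetical, and energy is only a stand-in for "catches more of the peaks". Without a low-pass filter first, either half still aliases high frequencies, so filtering before dropping samples generally matters more than which half is kept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 0 if the even-indexed samples carry more energy, 1 if the odd ones do. */
static int best_phase(const uint8_t *s, size_t n)
{
    uint64_t energy[2] = {0, 0};
    for (size_t i = 0; i < n; i++) {
        int32_t d = (int32_t)s[i] - 128;
        energy[i & 1] += (uint64_t)(d * d);
    }
    return energy[1] > energy[0];
}

/* Copy every second sample starting at `phase`; returns the output count. */
static size_t keep_phase(const uint8_t *s, size_t n, int phase, uint8_t *out)
{
    assert(phase == 0 || phase == 1);
    size_t m = 0;
    for (size_t i = (size_t)phase; i < n; i += 2)
        out[m++] = s[i];
    return m;
}
```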
