DDS with Wave data compression - Performance

Hi, I am trying to build a DDS (direct digital synthesis) synthesizer on an Arduino that uses wavetable compression.
The compression works by storing only the first 90° of the sine wave (see here for details).
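The symmetry that makes a quarter table sufficient can be written out. This sketch assumes a quarter table of $N$ entries sampled at half-step offsets, $t[i] = A\sin\big((i+\tfrac12)\tfrac{\pi}{2N}\big)$ for $i = 0,\dots,N-1$, and an output centered on a midpoint $M$. $A$, $M$ and the half-step offset are illustrative choices, not taken from the attached sketch. Writing the full-period index as $k = qN + i$ with quadrant $q \in \{0,1,2,3\}$, the identities $\sin(\pi - x) = \sin x$ and $\sin(\pi + x) = -\sin x$ give:

$$
s[k] = M + \begin{cases} t[i] & q = 0 \\ t[N-1-i] & q = 1 \\ -t[i] & q = 2 \\ -t[N-1-i] & q = 3 \end{cases}
$$

If the table is instead sampled at whole steps, $t[i] = A\sin\big(i\tfrac{\pi}{2N}\big)$, the mirrored index becomes $N - i$. At $i = 0$ this reaches $t[N]$ (the peak), so the table then needs $N+1$ entries.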

Anyway, I have some issues with the performance of the timer interrupt code. If I play the uncompressed wave, the sound is smooth, but the compressed wave sounds as if the sampling rate were much lower.
I guess this is due to the more expensive operations I use to "unpack" the wave. This is the code that obtains the wave sample:

compressionSwitch = phaccu >> 30; // upper 2 bits are the compression switch (which quarter of the sine wave we are in)
compIcnt = phaccu >> 22;
cbi(compIcnt, 9);
cbi(compIcnt, 10);
icnt = compIcnt;
switch (compressionSwitch) {
    case 0: // 0° to 90°
        waveSample = pgm_read_byte_near(sine + icnt);
        break;
    case 1: // 90° to 180°
        waveSample = pgm_read_byte_near(sine + LENGTH - icnt);
        break;
    case 2: // 180° to 270°
        waveSample = pgm_read_byte_near(sine + icnt) - OFFSET;
        break;
    case 3: // 270° to 360°
        waveSample = pgm_read_byte_near(sine + LENGTH - icnt) - OFFSET;
}

I am relatively new to C/C++, but I don't think that additions, bit shifts and clearing bits are expensive operations. Or am I wrong?
Do you see anything wrong with this code?

I attached the whole sketch too.
Thanks

dds.cpp (5.98 KB)

It looks like you have a sine table with 256 samples (or 1024 samples for a full sine wave).
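The 256/1024 figures follow from the bit layout in the question: phaccu >> 22 keeps the top 10 bits of the accumulator, the top 2 of those select the quadrant, and the remaining 8 index the quarter table. Here is a small C sketch of that split, assuming phaccu is a 32-bit unsigned accumulator. quadrant(), quarter_index() and full_index() are hypothetical helper names, not taken from the attached sketch.

```c
#include <stdint.h>

/* Hypothetical helpers: how a 32-bit phase accumulator maps onto a
   256-entry quarter table (4 * 256 = 1024 samples per full period). */
#define INDEX_BITS 8u /* log2(256) */

/* Top 2 bits: which quarter of the period (0..3). */
uint8_t quadrant(uint32_t phaccu) { return phaccu >> 30; }

/* Top 10 bits: position in the full 1024-sample period (same as phaccu >> 22). */
uint16_t full_index(uint32_t phaccu) { return phaccu >> (32 - 2 - INDEX_BITS); }

/* The 8 bits just below the quadrant bits: position inside the quarter table. */
uint8_t quarter_index(uint32_t phaccu)
{
    return (phaccu >> (32 - 2 - INDEX_BITS)) & 0xFFu;
}
```

The quadrant is simply the top two bits of full_index(), which is why keeping only the low 8 bits of phaccu >> 22 is all the table lookup needs.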

The first thing I noticed was that you don't negate the sample for 180 through 360 degrees. The waveform is positive between 0 and 180 and is negative between 180 and 360. You need to subtract the sample from the offset, rather than subtracting the offset from the sample:

case 2: //180° to 270°
	waveSample = OFFSET - pgm_read_byte_near(sine + icnt);
	break;
case 3: //270° to 360°
	waveSample = OFFSET - pgm_read_byte_near(sine + LENGTH - icnt);
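To check the sign fix end to end, here is a host-runnable C sketch of the whole unpack step. It assumes an unsigned 8-bit output centered on a hypothetical midpoint MID = 128, a quarter table that holds amplitudes above that midpoint, and a tiny 8-entry table (the thread's table has 256) sampled at half-step offsets. A plain array stands in for the PROGMEM table that the Arduino code reads with pgm_read_byte_near(). The mirrored index is LENGTH - 1 - icnt, so the read never goes past the last table entry.

```c
#include <stdint.h>

/* Hypothetical demo size: the thread's table has LENGTH = 256 (LENGTH_BITS = 8);
   8 entries keep this example short. */
#define LENGTH_BITS 3
#define LENGTH (1u << LENGTH_BITS)
#define MID 128 /* assumed midpoint of an unsigned 8-bit output */

/* Quarter-wave table: round(127 * sin((i + 0.5) * 90deg / LENGTH)).
   On AVR this would live in PROGMEM and be read with pgm_read_byte_near(). */
static const uint8_t sine[LENGTH] = {12, 37, 60, 81, 98, 112, 122, 126};

uint8_t unpack_sample(uint32_t phaccu)
{
    uint8_t quadrant = phaccu >> 30;                                /* top 2 bits */
    uint8_t icnt = (phaccu >> (30 - LENGTH_BITS)) & (LENGTH - 1);   /* next LENGTH_BITS bits */
    uint8_t mirrored = LENGTH - 1 - icnt;                           /* stays inside 0..LENGTH-1 */

    switch (quadrant) {
    case 0:  return MID + sine[icnt];     /* 0..90 deg: rising, positive */
    case 1:  return MID + sine[mirrored]; /* 90..180 deg: falling, still positive */
    case 2:  return MID - sine[icnt];     /* 180..270 deg: subtract from the midpoint */
    default: return MID - sine[mirrored]; /* 270..360 deg: subtracted and mirrored */
    }
}
```

With LENGTH_BITS set to 8 and the table regenerated for 256 entries, the index shift becomes 30 - 8 = 22, matching the phaccu >> 22 in the question.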

The second thing I noticed is that cbi(compIcnt, 10) has no effect on compIcnt. compIcnt is phaccu shifted 22 bits to the right, so (with a 32-bit unsigned phaccu) bits 10 and up are always clear. I think you meant to clear bits 8 and 9, not 9 and 10. But there's an easier and probably faster way to clear those bits:

compIcnt &= 0xff;
// OR
compIcnt %= LENGTH;

Either line clears all bits except bits 0 through 7 (assuming LENGTH is 256).
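Here is a quick host-side check of those claims, assuming compIcnt holds phaccu >> 22 for a 32-bit unsigned phaccu. It uses plain masks rather than the AVR cbi() macro (CBI is the AVR instruction for clearing a bit in an I/O register), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers comparing three ways to keep only the 8-bit table index
   from the 10-bit value phaccu >> 22 (2 quadrant bits + 8 index bits). */
#define LENGTH 256u

/* Clear bits 8 and 9 by hand (what the two cbi() calls were meant to do). */
uint16_t clear_bits_8_9(uint16_t x) { return x & ~((1u << 8) | (1u << 9)); }

/* Keep bits 0..7 with a mask. */
uint16_t mask_low_byte(uint16_t x) { return x & 0xffu; }

/* Keep bits 0..7 with a modulo by the table length. */
uint16_t modulo_length(uint16_t x) { return x % LENGTH; }

/* Bit 10 of phaccu >> 22; always 0 for a 32-bit unsigned phaccu. */
int bit10_set(uint32_t phaccu) { return ((phaccu >> 22) >> 10) & 1u; }
```

With an unsigned type and a power-of-two LENGTH, compilers typically reduce % LENGTH to the same AND, so the two one-liners should cost the same; & 0xff simply does not depend on that optimization.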