Algorithm for fast waveform generation w/ amplitude control

This isn't really a question, but a code example that folks might find useful. It might be old hat / obvious to some of you, but I had to implement a fast(ish) arbitrary waveform generator with amplitude scaling for a recent project, so I thought I'd share what I came up with :)

Generating an "analog" waveform from a microcontroller is usually done by storing one period of the waveform as samples in an array. You set up a timer (or variable delay loop, etc.) and have a loop / interrupt that fetches & outputs the next point from the array each time around, starting over when it reaches the end. That sample can then be written out to e.g. a DAC or PWM register.

The speed (frequency) the waveform is output at can be varied by changing the timer speed or delay loop value. But adding arbitrary amplitude control is less straightforward. You could store a copy of the waveform at every amplitude (wastes memory), or divide each sample at run time to downscale the waveform (slooow!). Neither is ideal, so here is what I ended up doing instead to get linearly variable amplitude from a single wave table:

It's pretty well known that you can quickly multiply or divide by powers of 2 in binary by just shifting the bits left or right (similar to multiplying or dividing by 10 by moving the decimal point). For example, if I have a binary value and want to scale it to 1/2, 1/4 or 1/8, that's pretty easy:

value >> 1;  /* 1/2 */
value >> 2;  /* 1/4 */
value >> 3;  /* 1/8 */

You can extend this idea to fractions that aren't powers of 2, as long as they can be written as a sum of power-of-2 fractions. So intermediate values like 3/8 are also easy: add the 1/4 result and the 1/8 result.

Think of each power-of-2 division as a tap that can be turned off and on at will to contribute (or not) to the final result.

To control the mixing of all these divisions without a hairy mess of if(...) statements, I created a small array of values, one for each 'tap' needed and the same size (e.g. byte, int, etc.) as the samples from the table. In this example (16 amplitude levels), that takes 4 divisions/taps.

byte ampl[4];

To set an amplitude, a set_amplitude(a) function takes the amplitude value a and, for each bit in that value, sets the corresponding ampl[] tap to either all ones or all zeros (bit k controls ampl[k]). E.g., if I pass in amplitude 5 (that's 0000 0101 in binary), bits 0 and 2 are set, so ampl[0] through ampl[3] become {0xFF, 0x00, 0xFF, 0x00}. Since there are only 4 taps, the unused upper bits of a are ignored.

In the waveform generator loop, instead of simply fetching the next table value and spitting it out, the fetched value is shifted right one more bit for each successive tap (halving it again), ANDed with that tap's ampl[] value, and the result added to the final output value. ANDing any bit with a 1 leaves it unchanged, while ANDing any bit with 0 gives 0. So anything ANDed with an all-ones tap is unchanged, and anything ANDed with an all-zeros tap gives 0.

Instead of e.g.

byte sample_out = table[i];
i++;

the loop does:

sample_out = ((table[i] & ampl[3]) >> 1) + ((table[i] & ampl[2]) >> 2) + ((table[i] & ampl[1]) >> 3) + ((table[i] & ampl[0]) >> 4);

This provides 16 linear amplitude steps at the cost of only 1 bitwise AND, 1 bitshift and 1 addition per tap. I don't know how GCC would optimize the above statement; the 'by hand' equivalent might look like:

temp = table[i] >> 1;
sample_out = temp & ampl[3];
temp = temp >> 1;
sample_out += temp & ampl[2];
temp = temp >> 1;
sample_out += temp & ampl[1];
temp = temp >> 1;
sample_out += temp & ampl[0];

giving a waveform generator loop with 16 linear amplitude steps for only a handful of extra clock cycles. In addition, the number of cycles used is the same regardless of the sample data or the amplitude / taps in use, which avoids jitter in the waveform output.

> The speed (frequency) the waveform is output at can be varied by changing the timer speed or delay loop value.

Or you can use a fixed frequency and a phase accumulator.

True. But this would make for a pretty big wavetable; I needed to keep it small in order to fit all of the waveforms and the rest of the code (generating waveforms was just a small part of the project).

For anyone wondering, the "phase accumulator" approach as I understand it is to use a much larger table (more points), keep the speed of the timer constant, and set the output frequency by how many points you step through the table each tick (the step size determines the frequency). The step is normally kept in a fixed-point accumulator, so it can be a fraction of a point.