CPU time bottleneck butchering PWM waveform

So I am trying to do a sine wave in PWM... and I've succeeded in some sense (in large part thanks to the help of forum members). If you run the code below, you'll see that I have pins 3 and 11 producing complementary PWM waveforms approximating a sine wave. However... the problem I seem to be running into is that there's not enough time to execute the fine adjustments I am making to the modulation of the carrier wave.

One of my arrays contains the time intervals between duty changes, the idea being that by making duty my linear, independent variable (1,2,3...254,255) I only need to adjust the time at which the duty changes in order to get the sinusoidal shape. Again, it kinda works, but the waveform has obvious glitches in it, particularly at the ends of the array, where the time intervals between interrupts are very short.
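
For reference, the timing idea can be written out. This is a sketch under an assumption not stated in the post: that the target duty follows d(t) = 255 sin(2πft) over the rising quarter of the wave, with f the desired output frequency. The time at which duty step d is reached, and the interval until the next step, are then:

```latex
t_d = \frac{1}{2\pi f}\,\arcsin\!\left(\frac{d}{255}\right),
\qquad
\Delta t_d = t_{d+1} - t_d,
\qquad d = 0, 1, \dots, 254
```

The sine is steepest near d = 0, so the interval is shortest there, roughly 1/(2πf · 255). At f = 50 Hz that is about 12.5 µs, or about 200 timer ticks if the timer runs at 16 MHz (also an assumption).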

Furthermore those array times were calculated based on the clock timing I am using. Despite this, the resulting wave is wayyyy slower than it should be according to the math. I have no clue why and merely added a fudge factor to compensate for now. When I reduce the fudge factor to about 4% of the values I started with, the wave refuses to get any faster... seems to cap out at around 50Hz and contains lots of glitches at the ends.

My hypotheses are:

  1. That the sum-total of the code I need to handle modulating the wave is taking more CPU time than the time between interrupts and so some events are simply getting missed or butchered.

  2. That this time is being tacked onto my calculated intervals, slowing the wave down from what I calculated.

To address these problems, I am wondering if there are any optimizations that can be made to either the loop code or the ISR code that would reduce the number of CPU cycles they take while still performing the same amount of work. Preliminary optimizations suggest that I can reduce CPU load and speed up the wave a bit. When I talk about optimizations I'm not just referring to the algorithm. In fact I'm mostly talking about the syntax. I understand there's a big difference between stuff like digitalWrite and bit-banging in terms of cycles, for example.

The other thing that concerns me is how the Timer is running. According to the data sheet, mode 15 on timer1 uses OCR as TOP and should automatically restart itself at BOTTOM when a compare match is performed. However with the glitches I'm getting I'm wondering if it's still counting past the compare match. Do I need to clear TCNT1? Is there a faster way to clip the COM1A0/1 bits than the way I'm doing it?

volatile unsigned int index = 0;   // volatile: written in the ISR, read in loop()
boolean flag = true;

unsigned int duties[] = {
0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38,
39,
40,
41,
42,
43,
44,
45,
46,
47,
48,
49,
50,
51,
52,
53,
54,
55,
56,
57,
58,
59,
60,
61,
62,
63,
64,
65,
66,
67,
68,
69,
70,
71,
72,
73,
74,
75,
76,
77,
78,
79,
80,
81,
82,
83,
84,
85,
86,
87,
88,
89,
90,
91,
92,
93,
94,
95,
96,
97,
98,
99,
100,
101,
102,
103,
104,
105,
106,
107,
108,
109,
110,
111,
112,
113,
114,
115,
116,
117,
118,
119,
120,
121,
122,
123,
124,
125,
126,
127,
128,
129,
130,
131,
132,
133,
134,
135,
136,
137,
138,
139,
140,
141,
142,
143,
144,
145,
146,
147,
148,
149,
150,
151,
152,
153,
154,
155,
156,
157,
158,
159,
160,
161,
162,
163,
164,
165,
166,
167,
168,
169,
170,
171,
172,
173,
174,
175,
176,
177,
178,
179,
180,
181,
182,
183,
184,
185,
186,
187,
188,
189,
190,
191,
192,
193,
194,
195,
196,
197,
198,
199,
200,
201,
202,
203,
204,
205,
206,
207,
208,
209,
210,
211,
212,
213,
214,
215,
216,
217,
218,
219,
220,
221,
222,
223,
224,
225,
226,
227,
228,
229,
230,
231,
232,
233,
234,
235,
236,
237,
238,
239,
240,
241,
242,
243,
244,
245,
246,
247,
248,
249,
250,
251,
252,
253,
254,
255,
254,
253,
252,
251,
250,
249,
248,
247,
246,
245,
244,
243,
242,
241,
240,
239,
238,
237,
236,
235,
234,
233,
232,
231,
230,
229,
228,
227,
226,
225,
224,
223,
222,
221,
220,
219,
218,
217,
216,
215,
214,
213,
212,
211,
210,
209,
208,
207,
206,
205,
204,
203,
202,
201,
200,
199,
198,
197,
196,
195,
194,
193,
192,
191,
190,
189,
188,
187,
186,
185,
184,
183,
182,
181,
180,
179,
178,
177,
176,
175,
174,
173,
172,
171,
170,
169,
168,
167,
166,
165,
164,
163,
162,
161,
160,
159,
158,
157,
156,
155,
154,
153,
152,
151,
150,
149,
148,
147,
146,
145,
144,
143,
142,
141,
140,
139,
138,
137,
136,
135,
134,
133,
132,
131,
130,
129,
128,
127,
126,
125,
124,
123,
122,
121,
120,
119,
118,
117,
116,
115,
114,
113,
112,
111,
110,
109,
108,
107,
106,
105,
104,
103,
102,
101,
100,
99,
98,
97,
96,
95,
94,
93,
92,
91,
90,
89,
88,
87,
86,
85,
84,
83,
82,
81,
80,
79,
78,
77,
76,
75,
74,
73,
72,
71,
70,
69,
68,
67,
66,
65,
64,
63,
62,
61,
60,
59,
58,
57,
56,
55,
54,
53,
52,
51,
50,
49,
48,
47,
46,
45,
44,
43,
42,
41,
40,
39,
38,
37,
36,
35,
34,
33,
32,
31,
30,
29,
28,
27,
26,
25,
24,
23,
22,
21,
20,
19,
18,
17,
16,
15,
14,
13,
12,
11,
10,
9,
8,
7,
6,
5,
4,
3,
2,
1,
0  
};
unsigned int timestamps[] = {
12,
21,
42,
62,
83,
104,
125,
146,
166,
187,
208,
229,
250,
271,
291,
312,
333,
354,
375,
396,
417,
437,
458,
479,
500,
521,
542,
563,
584,
605,
626,
647,
668,
688,
709,
730,
751,
772,
794,
815,
836,
857,
878,
899,
920,
941,
962,
983,
1005,
1026,
1047,
1068,
1089,
1111,
1132,
1153,
1175,
1196,
1217,
1239,
1260,
1282,
1303,
1324,
1346,
1367,
1389,
1410,
1432,
1454,
1475,
1497,
1519,
1540,
1562,
1584,
1606,
1627,
1649,
1671,
1693,
1715,
1737,
1759,
1781,
1803,
1825,
1847,
1869,
1891,
1914,
1936,
1958,
1981,
2003,
2025,
2048,
2070,
2093,
2115,
2138,
2160,
2183,
2206,
2229,
2251,
2274,
2297,
2320,
2343,
2366,
2389,
2412,
2436,
2459,
2482,
2505,
2529,
2552,
2576,
2599,
2623,
2647,
2670,
2694,
2718,
2742,
2766,
2790,
2814,
2838,
2862,
2887,
2911,
2935,
2960,
2984,
3009,
3034,
3058,
3083,
3108,
3133,
3158,
3184,
3209,
3234,
3260,
3285,
3311,
3336,
3362,
3388,
3414,
3440,
3466,
3492,
3519,
3545,
3572,
3598,
3625,
3652,
3679,
3706,
3733,
3761,
3788,
3816,
3843,
3871,
3899,
3927,
3956,
3984,
4013,
4041,
4070,
4099,
4128,
4157,
4187,
4217,
4246,
4276,
4306,
4337,
4367,
4398,
4429,
4460,
4491,
4523,
4554,
4586,
4619,
4651,
4684,
4717,
4750,
4783,
4817,
4851,
4885,
4919,
4954,
4989,
5025,
5061,
5097,
5133,
5170,
5207,
5245,
5283,
5322,
5360,
5400,
5440,
5480,
5521,
5562,
5604,
5647,
5690,
5734,
5779,
5824,
5870,
5917,
5965,
6013,
6063,
6113,
6165,
6218,
6272,
6328,
6385,
6444,
6505,
6567,
6632,
6699,
6769,
6843,
6920,
7001,
7087,
7180,
7281,
7392,
7519,
7668,
7863,
8333,
7863,
7668,
7519,
7392,
7281,
7180,
7087,
7001,
6920,
6843,
6769,
6699,
6632,
6567,
6505,
6444,
6385,
6328,
6272,
6218,
6165,
6113,
6063,
6013,
5965,
5917,
5870,
5824,
5779,
5734,
5690,
5647,
5604,
5562,
5521,
5480,
5440,
5400,
5360,
5322,
5283,
5245,
5207,
5170,
5133,
5097,
5061,
5025,
4989,
4954,
4919,
4885,
4851,
4817,
4783,
4750,
4717,
4684,
4651,
4619,
4586,
4554,
4523,
4491,
4460,
4429,
4398,
4367,
4337,
4306,
4276,
4246,
4217,
4187,
4157,
4128,
4099,
4070,
4041,
4013,
3984,
3956,
3927,
3899,
3871,
3843,
3816,
3788,
3761,
3733,
3706,
3679,
3652,
3625,
3598,
3572,
3545,
3519,
3492,
3466,
3440,
3414,
3388,
3362,
3336,
3311,
3285,
3260,
3234,
3209,
3184,
3158,
3133,
3108,
3083,
3058,
3034,
3009,
2984,
2960,
2935,
2911,
2887,
2862,
2838,
2814,
2790,
2766,
2742,
2718,
2694,
2670,
2647,
2623,
2599,
2576,
2552,
2529,
2505,
2482,
2459,
2436,
2412,
2389,
2366,
2343,
2320,
2297,
2274,
2251,
2229,
2206,
2183,
2160,
2138,
2115,
2093,
2070,
2048,
2025,
2003,
1981,
1958,
1936,
1914,
1891,
1869,
1847,
1825,
1803,
1781,
1759,
1737,
1715,
1693,
1671,
1649,
1627,
1606,
1584,
1562,
1540,
1519,
1497,
1475,
1454,
1432,
1410,
1389,
1367,
1346,
1324,
1303,
1282,
1260,
1239,
1217,
1196,
1175,
1153,
1132,
1111,
1089,
1068,
1047,
1026,
1005,
983,
962,
941,
920,
899,
878,
857,
836,
815,
794,
772,
751,
730,
709,
688,
668,
647,
626,
605,
584,
563,
542,
521,
500,
479,
458,
437,
417,
396,
375,
354,
333,
312,
291,
271,
250,
229,
208,
187,
166,
146,
125,
104,
83,
62,
42,
21,
12                          // I manually changed this and its mirror to 12 (arbitrary) because 0 leaves no time to be caught.
};


void setup() {
  
  DDRB |= (1 << PB3);       // Pin 11
  DDRD |= (1 << PD3);       // Pin 3

  DDRB |= (1 << PB1);       // Pin 9 output
  DDRB |= (1 << PB2);       // Pin 10 output

  cli();

  TCCR2A = 0;
  TCCR2B = 0;
  TCNT2 = 0;
  
  OCR2A = 255;                    // Channel A duty
  OCR2B = 255;                    // Channel B duty

  TCCR2A |= (1 << WGM20);       // Mode 1 (phase-correct PWM, TOP = 0xFF); mode 7 would also need WGM21 and WGM22
//  TCCR2A |= (1 << WGM21);   
//  TCCR2B |= (1 << WGM22);

//  TCCR2A |= (1 << COM2A0);    // Non-Inverting Mode on Pin A
  TCCR2A |= (1 << COM2A1);
//  TCCR2A |= (1 << COM2B1);


  
  TCCR2B |= (1 << CS20);        // Prescaler 1



  TCCR1A = 0;                   // Clear register
  TCCR1B = 0;                   // Clear register
  TCNT1 = 0;

  OCR1A = 65000;                // Define OCRA for timekeeping.  This will be modulated by Timer2.

  TCCR1A |= (1 << WGM10);
  TCCR1A |= (1 << WGM11);       // Mode 15, Fast PWM with OCR TOP
  TCCR1B |= (1 << WGM12);
  TCCR1B |= (1 << WGM13);

  TCCR1A |= (1 << COM1A1);      // Non-inverting on A pin

  TCCR1B |= (1 << CS10);        // Prescaler 1
  

  TIMSK1 |= (1 << OCIE1A);      // Enable Timer 1 Channel A compare match interrupt
//  TIMSK1 |= (1 << OCIE1B);

  sei();
  
}

void loop() {

  if(index > 510) {               // Check if the end of the array has been reached
    index = 0;                    // If so, immediately go back to the first entry
    TCCR2A ^= (1 << COM2A1);      // A should start generating a waveform while B shuts off and vice versa
    TCCR2A ^= (1 << COM2B1);      // When Sine is +, channel A will produce only positive pulses, when Sine is - channel B will produce only negative pulses
  }
}

ISR (TIMER1_COMPA_vect) {


  OCR1A = timestamps[index]/23;            // Update the time interval till the next duty change
  
  if(index < 256) {
    OCR2A = index;                      // Update the duty of Channel A
    OCR2B = index;                      // Update the duty of Channel B
  }
  else {                                // This entire if/else crap is just because there's not enough RAM to store the entire array
    OCR2A = (index - 2*(index - 255));
    OCR2B = OCR2A;
  }    
  index++;                              // Increment the index
  
}

I haven't quite understood the theory yet, but there are some things you need to consider when changing the timer registers. In some modes the compare registers are buffered, meaning a change won't take effect until the update point (for example, the overflow). I see you are changing OCR1A inside the TIMER1_COMPA ISR. The change you make won't take effect until the next overflow, meaning the counter will count up to the previous OCR1A, and your new OCR1A will only take effect on the next up cycle. Also, division and multiplication are cycle heavy, so take that into consideration.
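
As a rough illustration of the point about division: the per-interrupt divide can be moved out of the ISR by scaling the table once, ahead of time, so the ISR only has to read an array element. The sketch below is plain C++ that compiles on a PC, not the posted sketch. `kDivisor`, `scaleInterval`, and `mirroredDuty` are hypothetical names, and clamping to a minimum of 1 tick is an assumption made here so the compare value is never 0.

```cpp
#include <cstdint>

// Fudge factor taken from the posted ISR (timestamps[index] / 23).
constexpr std::uint16_t kDivisor = 23;

// Scale one raw table entry ahead of time, so the ISR never divides.
// The result is clamped to at least 1 tick (an assumption, see above).
constexpr std::uint16_t scaleInterval(std::uint16_t raw) {
    return (raw / kDivisor) > 0 ? static_cast<std::uint16_t>(raw / kDivisor)
                                : std::uint16_t{1};
}

// Duty for a table index 0..510: ramps 0..255, then mirrors back down to 0.
// Same result as the posted index - 2 * (index - 255), written as a single
// subtraction.
constexpr std::uint8_t mirroredDuty(std::uint16_t index) {
    return static_cast<std::uint8_t>(index < 256 ? index : 510 - index);
}
```

With a table prepared this way, the ISR body could shrink to an array read plus the duty update, for example `OCR1A = scaled[index];` where `scaled` is a hypothetical table filled with `scaleInterval` results.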

Then it sounds like I misunderstood the double buffering feature. My intent was to update the OCR just as the counter is passing by it so that the new value gets hit on the next cycle, not that it misses a complete cycle and hits it 1 and a half cycles later. Should I change modes to something like CTC? For Timer1 I don't need PWM functionality, just the ability to count, interrupt, reset and count again, over and over.

CTC mode does not implement double buffering, but the clear-on-compare only applies to the TOP value (either OCR1A or ICR).

Looks like you want to hit multiple compare interrupts, so what you can do is set CTC with ICR as TOP and enable the OCR1A compare interrupt. In the OCR1A ISR, load a new, higher OCR1A so it gets triggered again and again until your counter value reaches ICR.

For example, set CTC with ICR as TOP

ICR = 65000
OCR1A = 1000

CNT counts up from 0. When CNT == OCR1A, the OCR1A ISR is triggered. Inside the OCR1A ISR, update OCR1A with a new, higher value, for example 2000.

CNT continues until CNT == 2000. Again the OCR1A ISR is triggered, at which point you can update OCR1A again to something higher. You can do this until CNT == ICR, where it resets to zero.

The only requirement for this to work is that the next value of OCR1A must be higher than the current value of CNT. Since CNT continues to count inside the ISR, you don't want to update OCR1A with something like 1010, since by the time that is written CNT could possibly already be higher than that, so you will miss it until the next cycle.
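
The requirement above can be sketched as a small check. This is a host-side model in plain C++, not AVR register code: `cnt` stands in for the counter value read inside the ISR, and `kMinLead` is an assumed safety margin (in timer ticks) covering the rest of the ISR. The value 60 is a placeholder; the real figure would have to be measured.

```cpp
#include <cstdint>

// Assumed margin, in timer ticks, between "now" and the earliest compare
// value that can still be caught. Hypothetical number; measure the real ISR.
constexpr std::uint16_t kMinLead = 60;

// Return the compare value to load next: the requested one if it is far
// enough ahead of the running counter, otherwise the earliest safe value.
// Assumes the counter does not wrap between cnt and the result
// (both stay below TOP).
constexpr std::uint16_t nextCompare(std::uint16_t cnt, std::uint16_t wanted) {
    const std::uint32_t earliest = static_cast<std::uint32_t>(cnt) + kMinLead;
    return wanted >= earliest ? wanted : static_cast<std::uint16_t>(earliest);
}
```

For instance, asking for 1010 when the counter already reads 1005 gets pushed out to 1065, which delays that step slightly instead of missing it for a whole cycle.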

When I add up all your time intervals I get 1,544,571 clock ticks. That's 10.3588 Hz. And that's just a half cycle, right? So full cycles would be going at 5.179 Hz. I thought you were going for 50 Hz. I think you need to divide all of your intervals by roughly 10 (50 / 5.179 is about 9.65). That is going to be even worse for the short intervals.
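
The arithmetic behind those figures can be checked with a few lines that run on a PC. The sketch assumes a 16 MHz timer clock with prescaler 1; the tick total is the one quoted above, and the function names are made up for this example.

```cpp
// Assumed timer clock: 16 MHz, prescaler 1.
constexpr double kClockHz = 16000000.0;

// Frequency of one pass through the table (a half cycle of the sine).
constexpr double halfCycleHz(double ticksPerPass) {
    return kClockHz / ticksPerPass;
}

// Two passes (positive and negative half) make one full output cycle.
constexpr double fullCycleHz(double ticksPerPass) {
    return halfCycleHz(ticksPerPass) / 2.0;
}

// Factor by which every interval must shrink to reach targetHz.
constexpr double requiredDivisor(double ticksPerPass, double targetHz) {
    return targetHz / fullCycleHz(ticksPerPass);
}
```

With 1,544,571 ticks per pass this gives about 10.36 Hz per half cycle, 5.18 Hz per full cycle, and a divisor of roughly 9.65 to reach 50 Hz, before any ISR overhead is counted.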

Ummm.... I must have made a math error. I did a bunch of formulae in Excel to autogenerate that array, and when graphed it was a sine shape, but maybe my normalization was wrong. I'll check it. Still, the issue is as you said: with even smaller intervals I'm pretty much guaranteed not to hit them. Maybe I should reduce the resolution of my array from 1024 elements per full wave to something like 512 or 256.
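
One way to picture the lower-resolution idea: merge each group of neighbouring intervals into one longer interval and step the duty by the group size instead of by 1. The helper below is a hypothetical, host-side C++ sketch (`mergedInterval` is not from the posted code). It assumes the table really holds intervals between steps, as described in the original post.

```cpp
#include <cstddef>
#include <cstdint>

// Sum `stride` consecutive intervals starting at `start`, stopping at the
// end of the table. The merged step then changes duty by `stride` counts.
// The sum is 32-bit because a merged interval may exceed 16 bits.
constexpr std::uint32_t mergedInterval(const std::uint16_t* intervals,
                                       std::size_t count,
                                       std::size_t start,
                                       std::size_t stride) {
    std::uint32_t sum = 0;
    for (std::size_t i = start; i < start + stride && i < count; ++i) {
        sum += intervals[i];
    }
    return sum;
}
```

A stride of 2 halves the number of interrupts and roughly doubles the shortest interval, at the cost of coarser duty steps. Any merged value above 65535 would not fit a 16-bit compare register without a prescaler.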

Are there any other optimizations that can be made to the syntax itself that will execute the instructions faster? Given the frequency of the interrupts, I feel that every clock cycle that can be saved there will be extremely important. I saw an article benchmarking different syntax to do a bit flip and there was quite a stark difference. Unfortunately the code example was for an actual digital I/O pin and not a register bit so I didn't know how to transfer that example over since I don't know assembly language.

hzrnbgy: That's a neat trick. It didn't occur to me to have multiple interrupts on a single count. I'm not sure they will all fit within a 16 bit number (maybe they will) but it seems doable. The problem however, I think, is still that when the numbers get small at the ends of the array, there are only 21 ticks in which to update OCR. That may be doable if it was just 1 single line of code but I have to keep track of a few things as you can tell from my code so I don't think it can be done in 21 or even 50 clocks. I will inevitably miss a few.

Quite frankly, I haven't looked at your code in detail. So, there could be opportunities for optimization. But let me take a different tack: try throwing better hardware at the problem. Unless you're making thousands of units per month, hardware is cheap and developer's time is expensive.

Consider something in the Teensy Family from T3.2 on up.

While you're at it, check out the synthesis objects that are available in the powerful Teensy Audio Library.
