Difficult problem - help?

My full source code is here:
http://tc4-shield.googlecode.com/svn/applications/Artisan/aArtisan/tags/REL-aArtisanQ-beta1/aArtisanQ

It is way too much to post here, and also too much to ask anyone to dig through it all. So what I'm looking for mostly are ideas on how to narrow down the things that might be causing a partial lock-up problem.

Basically the application responds to requests on the serial line to alter the power to a small AC blower. Ultimately the application is intended to control the blower on a small coffee roaster.

Phase angle control is implemented by sensing AC zero crosses on D3, which are set up to trigger int1. The int1 handler sets OCR1A to the appropriate delay count based on the user-requested output level.

Timer1 is set up to trigger a TIMER1_COMPA interrupt when TCNT1 reaches the value in OCR1A. The handler for this interrupt turns on the output to the TRIAC, then resets OCR1A to an appropriate pulse width for the TRIAC. At the end of the pulse, the TIMER1_COMPA handler gets called again to shut off the output to the TRIAC.
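For readers who do not want to open the source, the two-handler scheme described above can be modeled roughly as follows. This is a host-side sketch, not the code in phase_ctrl.cpp: the names (PhaseModel, onZeroCross, onCompareMatch, delayForLevel), the 60 Hz mains, the 0.5 us timer tick, the linear level-to-delay mapping, and the idea that the timer restarts from zero on each compare match are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// All constants are assumptions for illustration, not values from phase_ctrl.cpp.
constexpr uint16_t kHalfCycleTicks = 16667; // one 60 Hz half cycle at 0.5 us per tick
constexpr uint16_t kPulseTicks     = 200;   // 100 us TRIAC gate pulse
constexpr uint16_t kMinDelayTicks  = 200;   // earliest firing point after a zero cross
constexpr uint16_t kMaxDelayTicks  = kHalfCycleTicks - 2 * kPulseTicks; // latest firing point

// Map a 0..100 output level to a firing delay; a higher level means a shorter delay.
uint16_t delayForLevel(uint8_t level) {
    if (level > 100) level = 100;  // guard against out-of-range requests
    const uint32_t span = kMaxDelayTicks - kMinDelayTicks;
    return static_cast<uint16_t>(kMaxDelayTicks - span * level / 100u);
}

struct PhaseModel {
    uint8_t  level  = 0;      // requested output, 0..100
    uint16_t ocr1a  = 0;      // stands in for OCR1A
    bool     gateOn = false;  // stands in for the TRIAC gate pin
    bool     armed  = false;  // true while a firing sequence is pending

    // Models the int1 (zero cross) handler: load the phase delay.
    void onZeroCross() {
        gateOn = false;
        armed = (level > 0);
        if (!armed) return;
        ocr1a = delayForLevel(level);
        assert(ocr1a >= kMinDelayTicks && ocr1a <= kMaxDelayTicks);
    }

    // Models the TIMER1_COMPA handler: the first match starts the gate pulse,
    // the second match ends it.
    void onCompareMatch() {
        if (!armed) return;
        if (!gateOn) {
            gateOn = true;
            ocr1a = kPulseTicks;  // assumes the timer restarts at 0 on each match
        } else {
            gateOn = false;
            armed = false;
        }
    }
};
```

Stepping a model like this on a PC makes it easy to check that every zero cross produces exactly one on/off pair, whatever level is requested.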

The interrupt handlers are in this file:
http://tc4-shield.googlecode.com/svn/applications/Artisan/aArtisan/tags/REL-aArtisanQ-beta1/aArtisanQ/phase_ctrl.cpp

The problem is that the ATmega seems to lock up infrequently, at random-ish times. I say random-ish because it seems to happen more often when a high output level has been requested. When "locked up", the int1 handler continues to be called at each zero cross and the TIMER1_COMPA interrupt continues to turn the TRIAC on and off. I can see this with a scope placed on the TRIAC gate signal.

But none of the other code in my application executes. It is acting as if a RETI is sending it off into space. Neither the main loop code that monitors the presence of an AC signal on the zero cross detector nor the code that watches the serial port executes after a lockup. But the phase angle output continues to operate at whatever output level was in effect when it locked up.

I've checked to see if I am running out of RAM, but there are over 700 bytes available during execution, so I don't think that's it.

I should say that everything works perfectly right up until the time it locks up, and it will run for long periods without locking up if the output level is never changed.

Suggestions (or pointing out of obvious mistakes) on how to track down this error would really be appreciated.

Jim

First, do you have suitable decoupling capacitors installed? Some sort of spike on higher output levels might be causing it.

JimG:
When "locked up", the int1 handler continues to be called at each zero cross and the TIMER1_COMPA interrupt continues to turn the TRIAC on and off. I can see this with a scope placed on the TRIAC gate signal.

This suggests to me, as I think you realize, that the "main" code has somehow gone somewhere it shouldn't. However the interrupt continuing to fire indicates that wherever the main code is, the interrupt is re-entered, does its stuff, and then returns to the "stuck" point.

I can't see anything obviously wrong in phase_ctrl.cpp. Maybe make your table of values const, but I doubt that will fix it.

If I understand your description correctly, the problem occurs roughly here:

if( newN ) {
...

Maybe make a test sketch to check that the computed values are correct for all possible inputs? Maybe some wrap-around or sign issue is at work here?
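A check like that can also be run on a PC before it goes anywhere near the board. The sketch below is only an illustration of the idea: unguardedDelay is a hypothetical stand-in for whatever computation feeds OCR1A, and the tick limits are assumed values, not ones taken from phase_ctrl.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Assumed limits for a valid firing delay, in timer ticks.
constexpr uint16_t kMinTicks = 200;
constexpr uint16_t kMaxTicks = 16267;

// Hypothetical mapping with no bounds check on the input level.
uint16_t unguardedDelay(uint8_t level) {
    const uint32_t span = kMaxTicks - kMinTicks;
    // For level > 101 the unsigned subtraction wraps around before truncation.
    return static_cast<uint16_t>(kMaxTicks - span * level / 100u);
}

using DelayFn = uint16_t (*)(uint8_t);

// Try every input in [lo, hi] and return the first one whose result falls
// outside [kMinTicks, kMaxTicks], or -1 if all results are in range.
// The loop counter is an int so that hi = 255 does not overflow a uint8_t.
int firstBadInput(DelayFn fn, int lo, int hi) {
    for (int level = lo; level <= hi; ++level) {
        const uint16_t ticks = fn(static_cast<uint8_t>(level));
        if (ticks < kMinTicks || ticks > kMaxTicks) return level;
    }
    return -1;
}
```

With these assumed numbers, firstBadInput(unguardedDelay, 0, 100) finds nothing, but widening the range to 0..255 flags 101, and from 102 upward the result wraps to a count longer than a half cycle. That is the kind of wrap-around an exhaustive sweep is meant to expose.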

Hmmm. I think so. My board is based on the Duemilanove design, to which I added a temperature sensor, ADC, EEPROM, and a couple of open collector outputs. Here's the schematic:
http://tc4-shield.googlecode.com/svn/hardware/TC4C/V1.10/tc4c110-sch.pdf

On the lower left of the second page of the schematic are the two open collector outputs I am using to drive SSRs. I didn't think I would need decoupling capacitors here?

Thanks for your other ideas, too. I will follow up on them tonight, hopefully.

Jim

JimG:
On the lower left of the second page of the schematic are the two open collector outputs I am using to drive SSRs. I didn't think I would need decoupling capacitors here?

What you have sounds OK.

Since the last lockup event, I added belt-and-suspenders bounds checking on the requested output values (forced into the range 0 to 100). I also added the watchdog on the AC signal. Now when controlling resistive loads, I can't get it to lock up (which is good).
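For anyone following along, the two safeguards might look something like the sketch below. It is not the code that was added to aArtisanQ: clampLevel and acPresent are hypothetical names, and treating the AC watchdog as a simple "time since last zero cross" check is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Force a requested output level into 0..100, whatever the serial parser produced.
uint8_t clampLevel(long requested) {
    if (requested < 0) return 0;
    if (requested > 100) return 100;
    return static_cast<uint8_t>(requested);
}

// AC watchdog: report whether a zero cross was seen within timeoutMs.
// Unsigned subtraction keeps the comparison correct when a millisecond
// counter such as millis() rolls over.
bool acPresent(uint32_t nowMs, uint32_t lastZeroCrossMs, uint32_t timeoutMs) {
    return static_cast<uint32_t>(nowMs - lastZeroCrossMs) <= timeoutMs;
}
```

In a sketch, the main loop would call clampLevel on every new request before it reaches the interrupt handlers, and would force the output off whenever acPresent returns false.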

Since I can't reproduce the problem with resistive loads, I am beginning to think the problem might be glitches due to switching inductive loads using phase angle control. More testing on a small blower motor is next, probably followed closely by researching snubber design a little more.

Would rapid increases in output level (i.e. rapidly decreasing phase delay) cause "special" problems as far as glitches are concerned when controlling a small universal motor? That seems to be when things were most prone to locking up.

Jim