Thermostat thermal runaway protection logic

I'm trying to make a thermostat and would like to include thermal runaway protection in my code in case a thermistor fails or detaches. I also intend to use bimetallic switches as a safety net, but I would like to have redundancy. I tried to find how others have solved the problem without using another thermistor and came up empty-handed (there's Marlin firmware for 3D printers but I don't know where to look for the relevant code). I am thinking I should maybe:

  1. store the last, say, four temperatures recorded (once every five seconds) in an array
  2. store the last four MOSFET control states (i.e. heater on or off) in an array
  3. if the MOSFET has been on for the last four cycles and the recorded temperature is not sufficiently higher than the temperature three cycles ago, release the dogs/set the MOSFET control pins to 0.
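
In C, the three steps above might look roughly like the sketch below. The names (`runaway_guard_t`, `guard_update`) and the 0.5 degree C minimum rise are assumptions picked for illustration, not values from any real firmware. The trip is latched so the heater stays off until something deliberately resets it.

```c
#include <stdbool.h>
#include <stddef.h>

#define HISTORY_LEN 4      /* samples kept, one every five seconds */
#define MIN_RISE_C  0.5f   /* assumed threshold, tune for your heater */

typedef struct {
    float  temp_c[HISTORY_LEN];     /* oldest sample first */
    bool   heater_on[HISTORY_LEN];  /* MOSFET control state per sample */
    size_t count;                   /* valid samples, at most HISTORY_LEN */
    bool   tripped;                 /* latched once a fault is seen */
} runaway_guard_t;

void guard_init(runaway_guard_t *g)
{
    g->count = 0;
    g->tripped = false;
}

/* Call once per sample. Returns true while the heater may stay enabled. */
bool guard_update(runaway_guard_t *g, float temp_c, bool heater_on)
{
    if (g->count == HISTORY_LEN) {
        /* Drop the oldest sample to make room for the new one. */
        for (size_t i = 1; i < HISTORY_LEN; i++) {
            g->temp_c[i - 1] = g->temp_c[i];
            g->heater_on[i - 1] = g->heater_on[i];
        }
        g->count--;
    }
    g->temp_c[g->count] = temp_c;
    g->heater_on[g->count] = heater_on;
    g->count++;

    if (g->count == HISTORY_LEN) {
        bool always_on = true;
        for (size_t i = 0; i < HISTORY_LEN; i++) {
            if (!g->heater_on[i]) always_on = false;
        }
        /* Newest reading minus the reading three cycles ago. */
        float rise = g->temp_c[HISTORY_LEN - 1] - g->temp_c[0];
        if (always_on && rise < MIN_RISE_C) {
            g->tripped = true;   /* caller should drive the MOSFET pins low */
        }
    }
    return !g->tripped;
}
```

With four samples at five-second intervals the comparison spans only 15 seconds, so for a heater that responds slowly both `HISTORY_LEN` and `MIN_RISE_C` would probably need adjusting.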

Is that what you would do?

The expected currents (2 A) will never get anywhere near the limit of the MOSFETs (30 A) and I will use resettable fuses.

I don't know how likely a MOSFET is to fail closed, but I think I could detect it by measuring the voltage across the drain and source (with a potential divider): if it's low when the MOSFET is meant to be off, then I know it's failed closed. Then I can display a message asking the user to manually intervene at their earliest possible convenience. Beyond that, the only thing I can think of is to have a second MOSFET in series with the first that's held permanently closed unless the first one fails closed, in which case the second one opens?
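
A minimal sketch of the drain-source check, assuming a low-side N-channel MOSFET whose drain is read through the divider by a 10-bit ADC. The thresholds and all names here are hypothetical. A low reading while the heater is commanded off could also mean the supply is missing or the load is open circuit, so the fault name reflects that ambiguity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 10-bit ADC counts; pick real values from your divider ratio. */
#define ADC_LOW_THRESHOLD  100u   /* below this the drain is near ground */
#define ADC_HIGH_THRESHOLD 600u   /* above this the drain is near supply */

typedef enum {
    SWITCH_OK,
    SWITCH_STUCK_ON_OR_NO_SUPPLY,  /* drain low although commanded off */
    SWITCH_STUCK_OFF               /* drain high although commanded on */
} switch_fault_t;

/* Compare the commanded state with the measured drain voltage. */
switch_fault_t check_switch(bool commanded_on, uint16_t drain_adc)
{
    if (!commanded_on && drain_adc < ADC_LOW_THRESHOLD) {
        return SWITCH_STUCK_ON_OR_NO_SUPPLY;
    }
    if (commanded_on && drain_adc > ADC_HIGH_THRESHOLD) {
        return SWITCH_STUCK_OFF;
    }
    return SWITCH_OK;
}
```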

Not a complete or entirely relevant answer, but it may give you something to think about:
My central heating controller uses temperature sensors connected by I2C over about 30m of cable. In order to avoid the kind of problems you are concerned about I do the following:

  • I only accept 3 consecutive identical readings as being correct (they are taken every second or so)
  • If the readings never change, that is, if some number well above 3 in a row are exactly the same, I treat them as wrong and don't use them.
  • I check the temperature readings fall within the expected range and that the data is valid. For example -30 degrees C would suggest something is wrong, as would getting 0xff and nothing else from the sensors.
  • There are 4 temperature sensors; if they all read exactly the same, I take that to be an error.
  • I do checks on the I2C interface to be sure it is not locked up or otherwise not working correctly, and restart it if something is wrong.
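
Three of those checks (range, three matching readings in a row, and a reading that never changes) could be sketched in C as below. This is not the code from my controller, just an illustration: readings are assumed to be integers in tenths of a degree C, and the limits and the stuck-reading count of 600 are made-up placeholders. The comparison between the four sensors and the I2C health checks are not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* All limits below are assumptions for illustration. */
#define TEMP_MIN_TENTHS (-100)  /* -10.0 C, anything colder is implausible */
#define TEMP_MAX_TENTHS 600     /*  60.0 C, anything hotter is implausible */
#define AGREE_RUN       3u      /* identical readings needed to accept */
#define STUCK_RUN       600u    /* identical readings that look frozen */

typedef enum { READING_PENDING, READING_VALID, READING_FAULT } reading_status_t;

typedef struct {
    int16_t  last;  /* previous reading, tenths of a degree C */
    uint16_t run;   /* how many times in a row `last` has been seen */
} reading_filter_t;

void filter_init(reading_filter_t *f)
{
    f->last = 0;
    f->run = 0;
}

/* Feed one reading; the result says whether it can be trusted yet. */
reading_status_t filter_update(reading_filter_t *f, int16_t reading)
{
    if (reading < TEMP_MIN_TENTHS || reading > TEMP_MAX_TENTHS) {
        f->run = 0;                      /* out of range: start over */
        return READING_FAULT;
    }
    if (f->run > 0 && reading == f->last) {
        if (f->run < UINT16_MAX) f->run++;
    } else {
        f->last = reading;
        f->run = 1;
    }
    if (f->run > STUCK_RUN) return READING_FAULT;   /* never changes */
    if (f->run >= AGREE_RUN) return READING_VALID;  /* enough agreement */
    return READING_PENDING;
}
```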

I hope something in there inspires you.

You are making this a lot harder than it needs to be. Placing two MOSFETs in series is on the difficult side. You can simply crowbar the power supply when thermal runaway is detected. If you use a PTC it will probably reset when the power is removed for a short time. The potential divider is OK but what will you do if your load fails open?

Detecting the detached thermistor is going to be difficult if it is still measuring within the allowable range. Maybe that is why you could not find an example.

Saving the last temperatures is OK. Heaters have thermal resistance and inertia, so you need to take that into consideration when checking for a change.

You should draw the system and make a genuine FMEA (Failure Mode and Effects Analysis). That study will show the weak points in the design.
As @PerryBebbington says, certain measures can be taken against bad sensor values. What about the controller itself? What if it fails or hangs?
Whatever you build, making sure that every solder joint, connection and cable run is done according to good practice is very important.
Adding redundancy also increases the risk of failure a bit, since every extra part is one more thing that can fail.

It is simple to include current detection.
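
As a rough sketch, assuming some current sensor already gives you the heater current in amps, the check is just a comparison against the commanded state. The 2 A figure comes from the original post; the tolerance, the off limit and the names are assumptions.

```c
#include <stdbool.h>

#define EXPECTED_CURRENT_A 2.0f  /* heater current from the original post */
#define TOLERANCE_A        0.5f  /* assumed acceptable deviation */
#define OFF_LIMIT_A        0.1f  /* assumed noise floor when off */

typedef enum {
    CURRENT_OK,
    CURRENT_FLOWING_WHEN_OFF,   /* e.g. shorted MOSFET */
    CURRENT_MISSING_WHEN_ON,    /* e.g. open load, blown fuse */
    CURRENT_OUT_OF_RANGE        /* flowing, but not the expected amount */
} current_fault_t;

/* Compare the measured heater current with the commanded state. */
current_fault_t check_current(bool commanded_on, float amps)
{
    if (!commanded_on) {
        return (amps > OFF_LIMIT_A) ? CURRENT_FLOWING_WHEN_OFF : CURRENT_OK;
    }
    if (amps < OFF_LIMIT_A) {
        return CURRENT_MISSING_WHEN_ON;
    }
    if (amps < EXPECTED_CURRENT_A - TOLERANCE_A ||
        amps > EXPECTED_CURRENT_A + TOLERANCE_A) {
        return CURRENT_OUT_OF_RANGE;
    }
    return CURRENT_OK;
}
```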

Just for information, the dominant failure mode of a MOSFET is that all its leads become shorted to each other.

You didn't say what thermistor you have, so I'll assume a typical 10K NTC.
My first tests on the analog reading are:

  1. Is it within the expected resistance range (for the expected temperature)?
  2. Has it changed within the last xxx readings?
  3. I like your test that 4 readings should show some effect of heat on or heat off.
  4. You should enable the watchdog timer (WDT) in your processor.
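
For test 1, here is a sketch that turns an ADC count into a resistance and rejects anything implausible. It assumes a 10-bit ADC, a 10 kΩ fixed resistor from the reference voltage to the ADC pin, and the thermistor from the pin to ground. The resistance limits are placeholders; take real ones from the thermistor datasheet for your temperature range.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADC_MAX      1023u      /* full scale of an assumed 10-bit ADC */
#define R_SERIES_OHM 10000.0f   /* assumed fixed divider resistor */
#define R_MIN_OHM    500.0f     /* placeholder: hottest plausible reading */
#define R_MAX_OHM    40000.0f   /* placeholder: coldest plausible reading */

/*
 * Convert an ADC count to thermistor resistance.
 * Returns false if the reading suggests a short, an open circuit,
 * or a resistance outside the expected range; *r_ohm is then untouched.
 */
bool thermistor_resistance(uint16_t adc, float *r_ohm)
{
    if (adc == 0 || adc >= ADC_MAX) {
        return false;   /* 0: shorted to ground, full scale: detached */
    }
    /* Divider: adc / ADC_MAX = Rt / (Rt + Rs), solved for Rt. */
    float r = R_SERIES_OHM * (float)adc / (float)(ADC_MAX - adc);
    if (r < R_MIN_OHM || r > R_MAX_OHM) {
        return false;
    }
    *r_ohm = r;
    return true;
}
```

A detached thermistor in this wiring pulls the pin to full scale, which is the easy case. It will not catch a sensor that has fallen off the heater but is still connected and reading room temperature.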

Just some thoughts.
