How to detect if a sensor is failing

I'm making a rather big controller (the code currently compiles to 166k) with a lot of sensors:

  • 20 NTCs (may increase)
  • An LDR
  • Several PIR sensors
  • Micro switches
  • Magnetic switches (on or off state)

The controller (a Mega2560) will read the sensors at a given interval (depending on what each sensor is used for).

What I want is to write code that can detect when a sensor may be producing a faulty result (or a value out of range). The code shall be robust enough to ignore a single error reading; instead it should check whether error readings happen too often over a longer period (let's say a day or so).
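Roughly the mechanism I have in mind (a minimal sketch; all names and thresholds are placeholders):

```cpp
#include <string.h>  // memset

// Count out-of-range readings per sensor and only flag a fault when too
// many occur within a rolling window (e.g. one day). Placeholder values.
const uint8_t  NUM_SENSORS        = 32;
const uint16_t MAX_ERRORS_PER_DAY = 10;   // tolerated before flagging

uint16_t errorCount[NUM_SENSORS];

void recordReading(uint8_t sensor, bool outOfRange) {
  if (outOfRange && errorCount[sensor] < 0xFFFF) {
    errorCount[sensor]++;                 // saturating counter
  }
}

bool sensorSuspect(uint8_t sensor) {
  return errorCount[sensor] > MAX_ERRORS_PER_DAY;
}

void resetDailyCounts() {                 // call once every 24 h
  memset(errorCount, 0, sizeof(errorCount));
}
```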

9 of the NTCs are used for monitoring the temperature of water pipes. Some of the pipes carry cold water and some hot water, so the temperature range they normally see is not the same for all of them. Based on the temperature of the water pipes, the controller will turn heating on or off (to prevent the water in them from freezing).
Other NTCs monitor outside and inside temperatures as well as temperatures in power supplies.

Some feedback on this would be nice.

BTW, the controller has about 2.5k of free memory, and I also have a 32 kbyte FRAM module where I can store values for later use (some of the space is already in use, but there is still room for more).

Sounds like fun…
Post your code and block diagram.

And, define an error.
It's easy to write out-of-range detection into code; the hard part is deciding what counts as an error.


Then I would suggest creating arrays of upper and lower bounds for each individual sensor that you can check against.
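Something along these lines (a minimal sketch; the bound values are made up, and PROGMEM keeps the tables out of the Mega's scarce RAM):

```cpp
#include <avr/pgmspace.h>

const uint8_t NUM_NTC = 20;

// One lower/upper ADC bound per NTC, since hot- and cold-water pipes see
// different normal ranges. Example values only.
const int ntcLow[NUM_NTC]  PROGMEM = { 80, 80, 120, 120 /* , ... */ };
const int ntcHigh[NUM_NTC] PROGMEM = { 900, 900, 850, 850 /* , ... */ };

bool inRange(uint8_t i, int adc) {
  int lo = pgm_read_word(&ntcLow[i]);
  int hi = pgm_read_word(&ntcHigh[i]);
  return adc >= lo && adc <= hi;
}
```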

As @SteveMann pointed out, it'll be tricky to determine whether you're looking at an erroneous reading or an accurate reading that happens to be odd because of a real-world anomaly. One way you could approach this is to include redundancy, but you'd have to add even more sensors to your system, and I assume that is undesirable. Another approach would be a form of statistical process control (SPC), but it's fairly involved/complex. I think checking whether readings are within sensible bounds is the most realistic/feasible approach.
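To give a flavour of the statistics-based idea (not full SPC): keep a per-sensor running mean/variance with Welford's algorithm and treat readings more than 3 sigma from the mean as suspect. A minimal sketch, thresholds illustrative only:

```cpp
#include <math.h>

struct RunningStats {
  uint32_t n = 0;
  float mean = 0, m2 = 0;

  void add(float x) {            // Welford's online mean/variance update
    n++;
    float d = x - mean;
    mean += d / n;
    m2   += d * (x - mean);
  }

  bool isOutlier(float x) const {
    if (n < 30) return false;    // need some history first
    float sd = sqrt(m2 / (n - 1));
    return fabs(x - mean) > 3.0f * sd;
  }
};
```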

Not your question, however:

With such a large number of inputs, you will need to carefully protect the MEGA's inputs. Voltage spikes and ESD discharges will lock your processor up regularly if they are not protected against. Also, you should plan on implementing the WDT so that if there is a lockup, the MEGA will recover.
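For reference, arming the Mega's internal watchdog via avr-libc looks roughly like this:

```cpp
#include <avr/wdt.h>

void setup() {
  wdt_enable(WDTO_4S);   // reset the MCU if not kicked within ~4 s
}

void loop() {
  // ... read sensors, run control logic ...
  wdt_reset();           // kick the dog once per healthy pass
}
```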

There are a number of LC filters, opto-couplers and diodes protecting the inputs. That is not my worry. I already have a test model where I develop the code (currently 1500 hours of work, including hardware, software and debugging).

The code? That is not relevant. It is more about the approach: how to detect that a sensor may have a fault, what to look for, and how to avoid having a short peak above the expected range (NTCs and the LDR) trigger an error.
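For the short peaks, I was thinking along the lines of a median-of-3 filter per channel, so a lone spike never reaches the range check. Rough sketch:

```cpp
// Median of three samples: a single spike can never be the middle value,
// so only a reading that persists across samples reaches the range check.
int median3(int a, int b, int c) {
  if (a > b) { int t = a; a = b; b = t; }
  if (b > c) { int t = b; b = c; c = t; }   // largest now in c
  if (a > b) { int t = a; a = b; b = t; }   // middle now in b
  return b;
}

int readFiltered(uint8_t pin) {
  return median3(analogRead(pin), analogRead(pin), analogRead(pin));
}
```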

Got an external watchdog, so that is not a problem (even got a watchdog monitor that counts the number of times it has kicked the MCU in its a*s).

Besides, since power loss can be a problem where this hardware is installed, I built a UPS that is controlled by the controller. Also working fine.

Back on topic; have you read up on what's available on failure modes and likelihoods of the sensors you use? Information that's probably hard to come by, but some detective work generally gives some clues. I suspect that your NTCs are very unlikely to fail catastrophically. They may very well drift, but that's going to be hard to detect unless you have several sensors monitoring the same spot. The only thing you could do against that is schedule periodic calibrations to bring readings back in line, but I'm not sure if your application allows this.

I have not done much searching yet, but I will. I was hoping that other smart boys and girls here had looked at this "problem"/issue before.

Some of the things that I can see going wrong are bad connections/broken wires, bad readings, etc. Drift (NTCs) is not an issue, since the accuracy of the sensors does not need to be better than 1 deg C.

Calibration can be difficult for some of the NTCs, since they are mounted directly on the water pipes, under an insulation layer and then a "mouse trap" layer (aluminum based).

That would be fairly easy to spot as it will give a predictable and quite consistent ADC reading that virtually cannot be confused with a real reading.
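As an illustration (the thresholds are made up, and whether an open circuit reads high or low depends on how the divider is wired):

```cpp
// With an NTC in a voltage divider, a broken wire or a short tends to pin
// the ADC near one of the rails, far outside any plausible temperature.
const int ADC_OPEN  = 1015;   // near 1023 on the 10-bit ADC
const int ADC_SHORT = 8;      // near 0

bool wiringFault(int adc) {
  return adc >= ADC_OPEN || adc <= ADC_SHORT;
}
```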

Depends on the degree of drift, but I know what you mean.

Yeah, I see the problem. The only solution would be a calibration mode where an operator or maintenance personnel actually performs measurements close to the NTCs and verifies them against the readings reported by your device. But like I said, it's kind of labor intensive and perhaps not feasible.
