DS18B20 Fail-Safe Temperature Reading – Safe Values When Sensor Data Is Invalid

This is article 1 of 8 in a series about robust DS18B20 / 1-Wire system design.

Please see the introductory topic:

Logically Safe Usage / Fail Safe

I use these sensors a lot and think this might be interesting for others.
This is not about topology, electromagnetic compatibility etc., but about my way of dealing, in a logically safe way, with a special kind of “unusable” temperature values.
If you don't want to start, e.g., a pump based on such values (especially the error code “-127”), or keep the pump running forever, read on, because the following typical if statement isn’t safe at all!

if (TempMeasured < TempDesired)
{
    digitalWrite(RELAY_PIN, HIGH);   // relay ON
}
else 
{
    digitalWrite(RELAY_PIN, LOW);    // relay OFF
}

The trick is to normalize all of those values to NAN (not a number) by definition, no matter what the cause is.
This lets you make decisions in your code in a logically safe way, because every ordered comparison (<, <=, >, >=) with a NAN operand evaluates to false, so the if branch is never taken and the else branch runs.
Here is the code above, made logically safe:

if (sanitizeTemp(TempMeasured) < TempDesired)
{
    digitalWrite(RELAY_PIN, HIGH);   // relay ON
}
else // TempMeasured >= TempDesired or NAN
{
    digitalWrite(RELAY_PIN, LOW);    // relay OFF
}
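Whether this pattern is fail-safe depends on which branch switches the relay ON. The following minimal sketch contrasts it with the same decision written the other way round. It assumes only <math.h> for NAN; the function names are hypothetical, and the relay is reduced to a bool return value so the logic can be checked on a PC:

```cpp
#include <math.h>

// Hypothetical helper mirroring the if/else above:
// returns true when the relay should be switched ON.
bool relayOnSafe(float measured, float desired)
{
    if (measured < desired) return true;   // ON only if the comparison is really true
    return false;                          // measured >= desired, or measured is NAN
}

// Hypothetical counter-example: the same decision with the condition inverted.
// Identical for real numbers, but NAN makes the comparison false and lands in ON.
bool relayOnInverted(float measured, float desired)
{
    if (measured >= desired) return false; // OFF
    return true;                           // ON, and also reached for NAN!
}
```

So the "ON" action belongs in the branch that requires a true comparison. Note also that the raw error code still passes the first check (-127 < 30 is true), which is exactly why the value has to be normalized to NAN before the comparison.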

IMPORTANT NOTE:
The suggested sanitizeTemp() function deals with special return values of the DallasTemperature library (which might not be familiar to everyone) and adds a very basic plausibility check while leaving valid values unchanged. It is the lowest safety level which, IMHO, is absolutely necessary, but it can't help in any way with wrong values inside a plausible range!

Enough said, here is the very small sanitizeTemp() function. Use at your own risk:

sanitizeTemp() function

// DS18B20 value normalization
// Explanation and background:
// https://forum.arduino.cc/t/ds18b20-engineering-resilience-practical-methods-for-robust-1-wire-systems/1433302
float sanitizeTemp(float t)  // use this function to normalize errors by definition
{
    // sensor error code (exactly -127.0, DEVICE_DISCONNECTED_C in DallasTemperature)
    if (t == -127.0) return NAN;

    // power-on reset value of the DS18B20 (exactly 85.0, almost never a real temperature)
    if (t == 85.0) return NAN;

    // physically invalid values
    // DS18B20 nominally: -55 to +125 °C (use at least this as default!)
    // depending on your application, choose your limits for a plausibility check
    if (t < 10.0 || t > 90.0) return NAN;   // as an example!

    // just a valid value
    return t;
}
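The plausibility limits differ from application to application (this thread uses 10..90, 10..45, -30..45 and 20..24 °C). One possible variant, sketched under the assumption that the limits are passed in as parameters, combines the checks of both versions of sanitizeTemp() in this thread; the name sanitizeTempRange() is hypothetical:

```cpp
#include <math.h>

// Hypothetical variant of sanitizeTemp() with the application limits as parameters.
float sanitizeTempRange(float t, float minPlausible, float maxPlausible)
{
    if (t == -127.0f) return NAN;                          // DallasTemperature error code
    if (t == 85.0f) return NAN;                            // power-on reset value
    if (t < -55.0f || t > 125.0f) return NAN;              // outside the DS18B20 range
    if (t < minPlausible || t > maxPlausible) return NAN;  // application limits
    return t;                                              // valid value, unchanged
}
```

The sensor range check stays in place even if someone passes overly wide application limits, so -55 to +125 °C remains the minimum default.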

Isn’t that curing the symptom and not the cause?

Definitely, and at the same time it can make the logic more robust.

And if capturing invalid values is the goal, one should sanitize to the max, IMHO, and not forget to log the error conditions that occurred.

float sanitizeTemp(float t)  // use this function to normalize errors by definition
{
  // sensor start value (exactly 85.0, almost never a real temperature)
  if (t == 85.0) 
  {
    //  log this occurrence, so the error doesn't get lost
    return NAN;
  }
  // physically invalid values: outside -55 to +125 °C (this also catches -127)
  if (t < -55.0 || t > 125.0)
  {
    //  log this occurrence ...
    return NAN;
  }

  //  application specific limits
  if (t < 10.0 || t > 90.0) 
  {
    //  log this occurrence ...
    return NAN;
  }
  // just a valid value
  return t;
}

Furthermore, one should add detection of a more permanent error, e.g. no valid value for more than 10 seconds. If the sensor can no longer be detected (wiring problem), one might want to shut down the system, gracefully or not.

Logging could be a count and a timestamp of the last occurrence per error type.
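A per-type log of count and last timestamp, combined with the "no valid value for more than 10 seconds" check, might look like the following sketch. The enum, struct and function names are hypothetical, and the timestamps are passed in instead of calling millis() so the logic can be tried on a PC:

```cpp
#include <math.h>
#include <stdint.h>

// Hypothetical error categories, one per check in sanitizeTemp().
enum ErrorType : uint8_t {
    ERR_DISCONNECTED_127 = 0,  // DallasTemperature error code -127
    ERR_POWER_ON_85,           // power-on reset value 85.0
    ERR_SENSOR_RANGE,          // outside -55..+125 °C
    ERR_APP_RANGE,             // outside the application limits
    ERR_TYPE_COUNT
};

// Per error type: how often it happened and when it happened last.
struct ErrorLog {
    uint32_t count = 0;
    uint32_t lastMillis = 0;
};

ErrorLog errorLog[ERR_TYPE_COUNT];
uint32_t lastValidMillis = 0;  // timestamp of the last value that passed all checks

static void logError(ErrorType type, uint32_t nowMillis)
{
    errorLog[type].count++;
    errorLog[type].lastMillis = nowMillis;
}

// Same checks as sanitizeTemp() above, plus logging.
// On an Arduino, nowMillis would simply be millis().
float sanitizeTempLogged(float t, uint32_t nowMillis)
{
    if (t == -127.0f)             { logError(ERR_DISCONNECTED_127, nowMillis); return NAN; }
    if (t == 85.0f)               { logError(ERR_POWER_ON_85, nowMillis);      return NAN; }
    if (t < -55.0f || t > 125.0f) { logError(ERR_SENSOR_RANGE, nowMillis);     return NAN; }
    if (t < 10.0f || t > 90.0f)   { logError(ERR_APP_RANGE, nowMillis);        return NAN; }
    lastValidMillis = nowMillis;
    return t;
}

// "More permanent" error: no valid value for longer than timeoutMs (e.g. 10000).
// Unsigned subtraction keeps the check correct across a millis() rollover.
bool noValidValueFor(uint32_t nowMillis, uint32_t timeoutMs)
{
    return (uint32_t)(nowMillis - lastValidMillis) > timeoutMs;
}
```

If noValidValueFor(millis(), 10000) becomes true, that would be the point to switch everything off and, if desired, check the bus for missing sensors.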

This is simply about dealing safely with the fact that invalid values will occur in a normal setting.
Just take the unsafe example given above:

if (TempMeasured < TempDesired)
{
    digitalWrite(RELAY_PIN, HIGH);   // relay ON
}
else 
{
    digitalWrite(RELAY_PIN, LOW);    // relay OFF
}

and let’s assume TempDesired = 30 and TempMeasured = -127 because of one simple read error => the pump (or whatever) starts, since -127 < 30 is true!
Why the error occurred, and what the strategy is to get enough valid values in a certain time span... all of that is beyond this safety issue.

It depends on what you mean by safe. If I receive one outlier in maybe 10,000, I might consider it a fluke and ignore it, but if I'm constantly receiving outliers, then I would think that something is seriously wrong.

Maybe you should also count the number of outliers in a given time frame and set an alarm.
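Counting outliers per time frame could look like the following fixed-window counter. This is a minimal sketch in which the class name, window length and threshold are hypothetical; time is passed in so the logic can be tested without hardware:

```cpp
#include <stdint.h>

// Hypothetical fixed-window outlier counter: counts invalid readings per time
// window and latches an alarm when one window contains more than 'threshold'.
class OutlierWindow {
public:
    OutlierWindow(uint32_t windowMs, uint16_t threshold)
        : windowMs_(windowMs), threshold_(threshold) {}

    // Call once per reading; 'valid' would be !isnan(sanitizeTemp(raw)),
    // nowMillis would be millis() on an Arduino.
    void addReading(bool valid, uint32_t nowMillis)
    {
        if ((uint32_t)(nowMillis - windowStart_) >= windowMs_) {
            windowStart_ = nowMillis;   // start a new window
            invalidInWindow_ = 0;
        }
        if (!valid && ++invalidInWindow_ > threshold_) {
            alarm_ = true;              // stays set until clearAlarm()
        }
    }

    bool alarm() const { return alarm_; }
    void clearAlarm() { alarm_ = false; }

private:
    uint32_t windowMs_;
    uint16_t threshold_;
    uint32_t windowStart_ = 0;
    uint16_t invalidInWindow_ = 0;
    bool alarm_ = false;
};
```

For example, OutlierWindow w(60000, 5); plus w.addReading(!isnan(sanitizeTemp(raw)), millis()); on every reading would raise the alarm after more than 5 invalid readings within one minute. A sliding window would be more precise but needs a buffer of timestamps.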

You are absolutely right. There is much more to robust code than what I posted; that’s just the bare minimum, but it is very often ignored. In fact, in critical applications I track the sensor behaviour over time, especially the gradient of the values (physically possible?) and the time between (assumed) true values.
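One way to check whether a gradient is "physically possible" is to compare the rate of change since the last accepted value with a maximum rate. A sketch under assumptions: the function name and the limit maxRatePerSec are hypothetical and would have to come from the thermal behaviour of the actual system:

```cpp
#include <math.h>
#include <stdint.h>

// Hypothetical rate-of-change check: is the step from the last accepted value
// physically possible? maxRatePerSec is an application assumption
// (e.g. 0.5 °C/s for room air).
bool gradientPlausible(float lastT, uint32_t lastMillis,
                       float newT, uint32_t nowMillis, float maxRatePerSec)
{
    if (isnan(newT)) return false;   // already invalid
    if (isnan(lastT)) return true;   // no reference yet: only the range checks apply
    uint32_t dtMs = nowMillis - lastMillis;                    // rollover-safe
    if (dtMs == 0) return newT == lastT;
    float ratePerSec = fabs(newT - lastT) * 1000.0f / (float)dtMs;
    return ratePerSec <= maxRatePerSec;
}
```

A value that fails this check could be treated like any other invalid value, i.e. replaced by NAN before it reaches the decision.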

CLARIFICATION

This might help to explain why using NAN here takes advantage of a built-in feature very easily while introducing basic safety.
In a real-world application, not all results are good; ignoring that is just negligence.
So, whatever the actual reason is for getting no valid readings, or too few good ones, from a DS18B20, that is a completely different story, and so is tracking down these issues.
But at the very least, you have to deal with the safety issue independently!
Using or not using the offered sanitizeTemp() function (or whatever code you find appropriate) might be the difference between burning something down and doing the calculations, e.g. for a heating device, in a "fail-safe" way.

By built-in I mean the following (looked it up: some extracts plus my own knowledge):
In Arduino / C++ floating-point handling, NAN is not implemented in a sketch or in a specific library like DallasTemperature. It is part of the IEEE-754 floating-point standard, which is implemented by the compiler’s floating-point support (on an Arduino Uno: AVR-GCC + avr-libc).

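  • The C/C++ standard library (math.h) provides the macro NAN and the function isnan()
  • NaN is encoded by a reserved family of IEEE-754 bit patterns (all exponent bits set, non-zero mantissa)
  • NAN is a valid float value
  • NAN is not equal to anything, including itself
  • Every ordered comparison with NAN (<, <=, >, >=) is false; only != is true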
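The properties listed above can be checked with a few lines. A minimal sketch, assuming 32-bit IEEE-754 floats (as on the Uno and on common desktop compilers); the helper names are hypothetical:

```cpp
#include <math.h>
#include <stdint.h>
#include <string.h>

// Copy the raw 32-bit pattern of a float (memcpy avoids aliasing problems).
uint32_t floatBits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

// IEEE-754 single precision: NaN <=> all 8 exponent bits set and mantissa != 0.
bool isNanByBits(float f)
{
    uint32_t u = floatBits(f);
    return (u & 0x7F800000u) == 0x7F800000u && (u & 0x007FFFFFu) != 0;
}

// NaN propagates through arithmetic: averaging two sensors where one
// reading is NaN gives NaN, not a half-wrong temperature.
float average2(float a, float b) { return (a + b) / 2.0f; }

// Because NaN is not equal to itself, "t == NAN" is false even when t is NaN.
bool equalsNanMacro(float t) { return t == NAN; }
```

The averaging case is the practical point: a single NaN input poisons the result, and every later ordered comparison with it is false, so the fail-safe behaviour carries through intermediate calculations as well. This holds with default compiler settings; options such as -ffast-math allow the compiler to assume NaN never occurs.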

Please note: you will often see "NaN" for "Not a Number", because that is the spelling used in IEEE-754. BUT in your code you have to use "NAN" (uppercase only)! And since NAN is not equal to itself, test for it with isnan(), never with == NAN.

CLARIFICATION II

The following is, as outlined before, off topic, but it helps to understand the difference between being "fail-safe" through "sanitizing" and monitoring problems in the usage of DS18B20s.
In fact, this is based on a real project in an electrically very noisy environment that has been in production for over 10 years now, and I had to learn many things the hard way...

Anyway, here is just an example of how I handle read errors in general and display important info on a (tiny) 16x2 display, which is shown above.
The first line shows temperature values and is not of interest here.
The second line (alternating between the following error code and the corresponding duration) provides 16 characters: Fs0059v0006r0011

How to read it?

Let's start with the easy parts
s0059
v0006
r0011
s, v and r are the names of the 3 temperature sensors, based on their locations.
The number after each name is the count of unusable readings since start-up. Pure numbers, no validation!

Now comes the tricky part:
F = risk class "F". By my own definition, the classes run from "A" (= 0 = no risk), "B" (= 1) ... up to "K" (= 10 = maximum risk, absolutely unbearable).

So in this case "F" is exactly in the middle between no risk and maximum risk. So far so good; I don't want to bother you with too much boring stuff, so I'll keep it short and simple. "F", like any other risk class, says something about the integrity of the system. It is calculated, in my own way, from
"risk = severity of damage * probability of occurrence", based on evaluations like these 2 examples:

Let's assume you ask for a new temperature value like this (all simplified to give you an idea):

T_s_previous = T_s;
sensors.requestTemperatures();
T_s = sensors.getTempC(SENSOR_S);

and the result is T_s = -127.

Not satisfying, of course. Now you can call "T_s = sensors.getTempC(SENSOR_S);" again to read the value from the scratchpad once more. And if this value is not -127, and at the same time T_s != T_s_previous and close to it (plausible)... bingo, this seems to be the new desired value => that happens really often, and its contribution to the risk classification is really minimal, almost nothing.
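The re-read described here could be sketched as follows. A function pointer stands in for sensors.getTempC(SENSOR_S) so the logic can be tested without a sensor; the function name, the retry limit and maxDelta are assumptions, and the acceptance rule is taken literally from the description above:

```cpp
#include <math.h>
#include <stdint.h>

// Hypothetical reader type; on an Arduino it could point to a small wrapper
// around sensors.getTempC(SENSOR_S), which reads the scratchpad without
// starting a new conversion.
typedef float (*ReadFn)();

// After a -127 result, read the scratchpad again (up to maxRetries times) and
// accept a value only if it is not -127, differs from the previous value and
// is close to it. Returns NAN if no such value was found.
float retryScratchpadRead(ReadFn read, float previous, uint8_t maxRetries, float maxDelta)
{
    for (uint8_t i = 0; i < maxRetries; i++) {
        float t = read();
        if (t == -127.0f) continue;                            // still the error code
        if (t != previous && fabs(t - previous) <= maxDelta)   // "near by (plausible)"
            return t;
    }
    return NAN;
}
```

With previous == NAN (no earlier value), nothing is ever accepted, which errs on the safe side; the caller then simply continues with NAN. On an Arduino, a hypothetical wrapper such as float readS() { return sensors.getTempC(SENSOR_S); } could be passed as the ReadFn.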

But if you don't get any usable value for a long time, and even "if (SensorsActive != SensorsInstalled)" is true after "oneWire.reset()", then this is the worst case, with the maximum contribution.

As you can imagine, there are several other contribution levels between these best- and worst-case examples. From my point of view, there is no way to say how to do it perfectly... I'm still optimizing the approach. So please don't ask me for any code :)

Well, as I said at the beginning, all of the above is necessary in many applications to ensure long-term stability, but it has nothing to do with "sanitizing" the temperature values before using them in calculations that can decide to turn something (dangerous) on!

What often seems to be overlooked is that shutting something off can be much more important than turning something on! Why? Think of a powerful heating device with on-off control.
In such cases I strongly recommend an external watchdog as additional safety measure!

I don’t know of any other sensors where a value in the middle of the published usable range is actually an error value and not a valid measurement. Why would anyone design a sensor like that?

More importantly, why would a designer of a system that needed a temperature sensor pick that particular sensor if they were expecting to see temperatures that spanned that no-go zone? Draw the line at 84 deg and flag anything above as an error.

Also merging multiple distinct error values into a single NAN value means you are throwing away information that could be used to help diagnose a problem.

I wrap all my sensor readings in a class with a bool isValid() function that does multiple checks to try to ascertain whether I’m seeing a number that is plausible or not. If the number is suspect, I mark the sensor as suspect and exclude it from the decision tree.
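A minimal sketch of such a wrapper, assuming the same limits as elsewhere in this thread; the class name TempReading, its Status values and its methods are hypothetical. It keeps the raw value and the reason for rejecting it, which addresses the concern about discarding diagnostic information, while value() still returns NAN so that comparisons stay fail-safe:

```cpp
#include <math.h>
#include <stdint.h>

class TempReading {
public:
    enum Status : uint8_t { NO_DATA, OK, DISCONNECTED, POWER_ON_85,
                            OUT_OF_SENSOR_RANGE, OUT_OF_APP_RANGE };

    TempReading(float minApp, float maxApp) : minApp_(minApp), maxApp_(maxApp) {}

    // Classify a new raw reading; the raw value itself is stored unchanged.
    void update(float raw)
    {
        raw_ = raw;
        if (raw == -127.0f)                         status_ = DISCONNECTED;        // DallasTemperature error code
        else if (raw == 85.0f)                      status_ = POWER_ON_85;         // power-on reset value
        else if (!(raw >= -55.0f && raw <= 125.0f)) status_ = OUT_OF_SENSOR_RANGE; // also catches NAN input
        else if (raw < minApp_ || raw > maxApp_)    status_ = OUT_OF_APP_RANGE;
        else                                        status_ = OK;
    }

    bool isValid() const { return status_ == OK; }
    Status status() const { return status_; }
    float raw() const { return raw_; }
    float value() const { return isValid() ? raw_ : NAN; }  // NAN keeps "value() < setpoint" false

private:
    float minApp_, maxApp_;
    float raw_ = NAN;
    Status status_ = NO_DATA;
};
```

status() can feed error counters or a display, and isValid() decides whether the sensor takes part in the decision.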

USAGE EXAMPLE

/*
This is an example of how you could use the sanitizeTemp() function to
introduce a basic safety layer with almost no effort.

Please note that the variable T_room is not altered in any way, and there is
also no need for an additional variable like T_room_sanitized to keep
T_room untouched!

Fictional task:
It is assumed that a room temperature (T_room) should be held at 21.5 °C by using,
e.g., an Arduino Uno, one DS18B20 temperature sensor and a relay (SSR)
to on-off control a (safe-to-use) fan heater.

IMPORTANT NOTE:
The sanitizeTemp() function deals with special return values of the
DallasTemperature library (which might not be familiar to everyone) and adds a
very basic plausibility check while leaving valid values unchanged. It is the
lowest safety level which, IMHO, is absolutely necessary, but it can't help in
any way with wrong values inside a plausible range!

Besides the very small sanitizeTemp() function, the ONLY difference
between logically safe and unsafe, IMHO, is this:

    if (sanitizeTemp(T_room) < ROOM_SETPOINT_C)
instead of:
    if (T_room < ROOM_SETPOINT_C)

=> Bad values don't switch the heater on and, maybe more importantly,
they reliably switch the heater off!
*/

#include <OneWire.h>
#include <DallasTemperature.h>

// -------------------- Pin configuration --------------------
#define ONE_WIRE_BUS_PIN          2
#define RELAY_PIN                 8

// -------------------- Control parameters --------------------
#define ROOM_SETPOINT_C           21.5
#define SENSOR_READ_INTERVAL_MS   2000

// -------------------- OneWire / Dallas setup --------------------
OneWire oneWire(ONE_WIRE_BUS_PIN);
DallasTemperature sensors(&oneWire);

// -------------------- Timing --------------------
unsigned long lastReadMillis = 0;

// -------------------- Forward declaration --------------------
float sanitizeTemp(float t);

// =============================================================
// Setup
// =============================================================
void setup()
{
    pinMode(RELAY_PIN, OUTPUT);
    digitalWrite(RELAY_PIN, LOW);   // relay OFF initially

    sensors.begin();
}

// =============================================================
// Main loop
// =============================================================
void loop()
{
    // unsigned subtraction keeps this check correct across a millis() rollover
    if (millis() - lastReadMillis >= SENSOR_READ_INTERVAL_MS)
    {
        lastReadMillis = millis();

        // request/get current temperature (blocks until the conversion is done by default)
        sensors.requestTemperatures();
        float T_room = sensors.getTempCByIndex(0);  // raw reading

        /*
        optional: put all measures here to get a valid value,
        replacement value, analytic tools, statistics, warnings ...
        whatever you find appropriate
        */

        // safe decision based on the "original" T_room is possible
        if (sanitizeTemp(T_room) < ROOM_SETPOINT_C) // "the magic"
        {
            digitalWrite(RELAY_PIN, HIGH);   // relay ON
        }
        else // T_room >= ROOM_SETPOINT_C or NAN <= that's the advantage!
        {
            digitalWrite(RELAY_PIN, LOW);    // relay OFF
        }
    }
}

// =============================================================
// Sanitize function
// =============================================================
float sanitizeTemp(float t)  // use this function to normalize errors by definition
{
    // sensor error code (exactly -127.0, DEVICE_DISCONNECTED_C in DallasTemperature)
    if (t == -127.0) return NAN;

    // power-on reset value (exactly 85.0, almost never a real temperature)
    if (t == 85.0) return NAN;

    // physically invalid values
    // DS18B20 nominally: -55 to +125 °C (use at least this as default!)
    // limits for a simple plausibility check in this application
    if (t < 10.0 || t > 45.0) return NAN;   // assumed plausible range!

    // just a valid value
    return t;
}

USAGE EXAMPLE - some additional thoughts

………………………………………………………………………………………………………………………………..

A) Indicating that there is an error

If you want to be notified of an error in the simplest way, e.g. on an Arduino Uno, put this in setup():

pinMode(LED_BUILTIN, OUTPUT); // enable the onboard LED

and this in loop() after "float T_room = sensors.getTempCByIndex(0); // raw reading":

digitalWrite(LED_BUILTIN, isnan(sanitizeTemp(T_room))); // turns LED on (error) or off (valid value)

……………………………………………………………………………………………………………………………….

B) Getting rid of the 750 ms delay of the DallasTemperature library at 12-bit resolution

The implicit delay is fine in the example, because there is no need to shorten the loop time, but blocking delays are not nice in general.
As long as SENSOR_READ_INTERVAL_MS is longer than the required conversion time (at most 750 ms at 12-bit resolution), you can do this:

In setup(), after "sensors.begin();":

sensors.setWaitForConversion(false); // enable non-blocking

sensors.requestTemperatures(); // prepare for the very first raw reading in the loop()

and this in loop() (the order is simply reversed):

float T_room = sensors.getTempCByIndex(0);  // raw reading; in the example, the request happened 2 seconds ago


sensors.requestTemperatures();              // request for the next raw reading!

USAGE EXAMPLE - Thermometer

/*
Simple thermometer using an Arduino Uno, one DS18B20 temperature sensor
(e.g. outdoor) and a 16x2 I2C LCD (e.g. indoor with the Uno)
first line:   temperature or "no data", centered
second line:  countdown in seconds, centered; refresh cycle is 30 s

Note:
This sketch uses a second temperature variable to hold the
sanitized value. And to show you explicitly that NAN is a valid
value, you will find: float currentTemp = NAN;
*/


#include <OneWire.h>
#include <DallasTemperature.h>
#include <Wire.h>
//NewLiquidCrystal library
#include <LiquidCrystal_I2C.h>

// -------------------- Hardware configuration --------------------
#define ONE_WIRE_BUS_PIN   2
#define LCD_I2C_ADDRESS    0x27   // Change if necessary
#define LCD_COLUMNS        16
#define LCD_ROWS           2

// -------------------- Timing configuration --------------------
#define REFRESH_INTERVAL_MS 30000UL   // 30 seconds

// -------------------- Objects --------------------
OneWire oneWire(ONE_WIRE_BUS_PIN);
DallasTemperature sensors(&oneWire);
LiquidCrystal_I2C lcd(LCD_I2C_ADDRESS, LCD_COLUMNS, LCD_ROWS);

// -------------------- Global state --------------------
unsigned long lastRefreshMillis = 0;
float currentTemp = NAN;

// =============================================================
// function: sanitize temperature value
// =============================================================
float sanitizeTemp(float t)  // use this function to normalize errors by definition
{
    // sensor error code (exactly -127.0)
    if (t == -127.0) return NAN;

    // sensor start value (exactly 85.0, almost never a real temperature)
    if (t == 85.0) return NAN;

    // physically no valid values 
    // DS18B20 nominally: -55 to +125 °C (use at least this as default!)
    // depending on this application limits for a plausibility check
    if (t < -30.0 || t > 45.0) return NAN;	// to be defined!

    // just a valid value
    return t;
}

// =============================================================
// function: print centered text on given LCD row
// =============================================================
void printCentered(uint8_t row, const String &text)
{
    int len = text.length();
    int col = (LCD_COLUMNS - len) / 2;
    if (col < 0) col = 0;

    lcd.setCursor(0, row);
    lcd.print("                ");  // clear full row (16 spaces)

    lcd.setCursor(col, row);
    lcd.print(text);
}

// =============================================================
// Setup
// =============================================================
void setup()
{
    sensors.begin();
    sensors.setWaitForConversion(true);  // use default blocking mode

    lcd.begin(LCD_COLUMNS, LCD_ROWS);
    lcd.backlight();

    lastRefreshMillis = millis();
}

// =============================================================
// Main loop
// =============================================================
void loop()
{
    unsigned long now = millis();

    // ---------- Temperature refresh every 30 seconds ----------
    if (now - lastRefreshMillis >= REFRESH_INTERVAL_MS)
    {
        lastRefreshMillis = now;

        sensors.requestTemperatures();  // blocking until conversion done
        float rawTemp = sensors.getTempCByIndex(0);
        currentTemp = sanitizeTemp(rawTemp);
    }

    // ---------- Line 1: Temperature or "no data" ----------
    if (isnan(currentTemp))
    {
        printCentered(0, "no data");
    }
    else
    {
        char buffer[16];
        dtostrf(currentTemp, 0, 1, buffer);  // one decimal place
        String line = String(buffer) + " C";
        printCentered(0, line);
    }

    // ---------- Line 2: Remaining time until next refresh ----------
    unsigned long elapsed = now - lastRefreshMillis;
    unsigned long remainingMs = (elapsed < REFRESH_INTERVAL_MS)
                                 ? (REFRESH_INTERVAL_MS - elapsed)
                                 : 0;

    unsigned int remainingSec = remainingMs / 1000;

    String countdown = String(remainingSec) + " s";
    printCentered(1, countdown);

    delay(200);  // small display refresh pacing (not timing-critical)
}

USAGE EXAMPLE - Sanitizing temperature values and counting specific errors

In addition to "sanitizing" the temperature value in applications that need this basic safety measure, this sketch lets you see and test how a very simple error count works.
The underlying code for sanitizing and counting is small and simple, yet it helps a lot in diagnosing what’s going on.

So, if you want to play around a bit …

/*
   ================================================================
   Multi-DS18B20 Diagnostic Sketch with Separated Error Counters
   ================================================================

   If you want to play around a bit ...


   Hardware:
   - Arduino Uno
   - 3x DS18B20 (three-wire operation) on a shared 1-Wire bus
   - Individual VCC lines accessible for testing

   Test Procedures:

   1) Hardware Error Test (-127.0 °C):
      Disconnect ONLY the VCC line of a single sensor.
      Do NOT disconnect GND or DATA.

      Result:
        - Raw temperature = -127.0
        - Sanitized value = NAN
        - error127CountX increments

   2) Power-On Default Test (85.0 °C):
      Reconnect VCC.
      First cycle may return 85.0 °C.

      Result:
        - Sanitized value = NAN
        - NO error counter increment

   3) Plausibility Test:
      Plausibility range is intentionally narrow:
          20.0 °C to 24.0 °C

      Warm a sensor with your fingers so it exceeds 24 °C.

      Result:
        - Sanitized value = NAN
        - errorPlausiCountX increments

   Serial Output Format:

   temp1 = ...; sanitized Temp1 = ...; error127Count1 = ...; errorPlausiCount1 = ...
   temp2 = ...; sanitized Temp2 = ...; error127Count2 = ...; errorPlausiCount2 = ...
   temp3 = ...; sanitized Temp3 = ...; error127Count3 = ...; errorPlausiCount3 = ...

   Repeated every 5 seconds.
*/

#include <OneWire.h>
#include <DallasTemperature.h>

#define ONE_WIRE_BUS_PIN 2
#define MEASURE_INTERVAL_MS 5000UL

OneWire oneWire(ONE_WIRE_BUS_PIN);
DallasTemperature sensors(&oneWire);

// Fake addresses (replace with real ones)
DeviceAddress sensor1 = { 0x28, 0xAA, 0x11, 0x11, 0x11, 0x11, 0x11, 0x01 };
DeviceAddress sensor2 = { 0x28, 0xBB, 0x22, 0x22, 0x22, 0x22, 0x22, 0x02 };
DeviceAddress sensor3 = { 0x28, 0xCC, 0x33, 0x33, 0x33, 0x33, 0x33, 0x03 };

// Separate error counters
unsigned long error127Count1 = 0;
unsigned long error127Count2 = 0;
unsigned long error127Count3 = 0;

unsigned long errorPlausiCount1 = 0;
unsigned long errorPlausiCount2 = 0;
unsigned long errorPlausiCount3 = 0;

unsigned long lastMeasureMillis = 0;

// =============================================================
// sanitizeTemp()
// enhanced by error counting
// =============================================================
float sanitizeTemp(float t, uint8_t sensorId)
{
    // Sensor error code (exactly -127.0)
    if (t == -127.0)
    {
        if (sensorId == 1) error127Count1++;
        if (sensorId == 2) error127Count2++;
        if (sensorId == 3) error127Count3++;
        return NAN;
    }

    // Sensor start value (exactly 85.0, almost never a real temperature)
    if (t == 85.0)
        return NAN;

    // Application-specific plausibility limits
    if (t < 20.0 || t > 24.0)
    {
        if (sensorId == 1) errorPlausiCount1++;
        if (sensorId == 2) errorPlausiCount2++;
        if (sensorId == 3) errorPlausiCount3++;
        return NAN;
    }

    // just a valid value
    return t;
}

// =============================================================

void setup()
{
    Serial.begin(9600);
    sensors.begin();
}

void loop()
{
    unsigned long now = millis();

    if (now - lastMeasureMillis >= MEASURE_INTERVAL_MS)
    {
        lastMeasureMillis = now;

        sensors.requestTemperatures();  // blocking

        float temp1 = sensors.getTempC(sensor1);
        float temp2 = sensors.getTempC(sensor2);
        float temp3 = sensors.getTempC(sensor3);

        float sanitizedTemp1 = sanitizeTemp(temp1, 1);
        float sanitizedTemp2 = sanitizeTemp(temp2, 2);
        float sanitizedTemp3 = sanitizeTemp(temp3, 3);

        Serial.print("temp1 = ");
        Serial.print(temp1);
        Serial.print("; sanitized Temp1 = ");
        Serial.print(sanitizedTemp1);
        Serial.print("; error127Count1 = ");
        Serial.print(error127Count1);
        Serial.print("; errorPlausiCount1 = ");
        Serial.println(errorPlausiCount1);

        Serial.print("temp2 = ");
        Serial.print(temp2);
        Serial.print("; sanitized Temp2 = ");
        Serial.print(sanitizedTemp2);
        Serial.print("; error127Count2 = ");
        Serial.print(error127Count2);
        Serial.print("; errorPlausiCount2 = ");
        Serial.println(errorPlausiCount2);

        Serial.print("temp3 = ");
        Serial.print(temp3);
        Serial.print("; sanitized Temp3 = ");
        Serial.print(sanitizedTemp3);
        Serial.print("; error127Count3 = ");
        Serial.print(error127Count3);
        Serial.print("; errorPlausiCount3 = ");
        Serial.println(errorPlausiCount3);

        Serial.println();
    }
}
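The separately named counters and the repeated Serial.print() blocks grow with every additional sensor. A possible alternative keeps the counters in arrays. This is a sketch under the assumption that sensors are indexed from 0 and the plausibility limits are passed in; the names SENSOR_COUNT and sanitizeTempCounted() are hypothetical:

```cpp
#include <math.h>
#include <stdint.h>

const uint8_t SENSOR_COUNT = 3;
unsigned long error127Count[SENSOR_COUNT] = {0};
unsigned long errorPlausiCount[SENSOR_COUNT] = {0};

// Same behaviour as the counting sanitizeTemp() above, but sensorIndex is
// 0-based (0..SENSOR_COUNT-1) instead of the 1-based sensorId.
float sanitizeTempCounted(float t, uint8_t sensorIndex, float minT, float maxT)
{
    bool indexOk = sensorIndex < SENSOR_COUNT;
    if (t == -127.0f) {                     // sensor error code
        if (indexOk) error127Count[sensorIndex]++;
        return NAN;
    }
    if (t == 85.0f) return NAN;             // power-on value: not counted, as above
    if (t < minT || t > maxT) {             // application plausibility limits
        if (indexOk) errorPlausiCount[sensorIndex]++;
        return NAN;
    }
    return t;                               // just a valid value
}
```

In loop(), the three DeviceAddress variables could then go into an array as well (DeviceAddress addr[SENSOR_COUNT]) so that reading, sanitizing and printing happen in a single for loop.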


This post serves as the official solution for "DS18B20 Logically Safe Usage / Fail Safe" (former title). For the full content, see my original post at the very beginning of this thread and my additional replies.