3-point calibration of pH meter

I'm building a combined temperature and pH meter and adjuster for a small lab using inexpensive materials. The pH meter is of eBay quality, and the measurements are read from an analogue pin.

This is all pretty straightforward, but I really want to do three-point calibrations using buffer solutions with pH 4.01, 6.86 and 9.18 respectively.

My math knowledge doesn’t extend to making linear regression models myself, but I found some code online to get me started, from here: https://www.instructables.com/PH-RegulaterMeter-Arduino/.

It starts by taking measurements from the buffer solutions and saving them in three different variables, like this:
(The original code is littered with LCD commands, which I removed. It also uses buffer solutions with pH 4, 7 and 10, which I changed to my values.)

 for (double i=100; i>0; i--)          //read values for 10 seconds (the pH measuring function contains a 10 ms delay)
    {
      readPH();                         //read current pH value
      pH4.01val = pHvalue;                //set equal to variable for this pH
    }

I don’t know why the original code reads the electrode 100 times when it only saves the last measurement to the variable. I guess it’s giving the electrode time to adjust.

Anyway, the output is three variables: pH4.01val, pH6.86val and pH9.18val.
These are then used to calculate slope and offset:

slope = 5.17 /(pH9.18val - pH4.01val);                  //System of equations to make each reading equal pH 4.01 & 9.18 respectively shown below:
                                                  //       9.18=(pH9.18val*slope)+offset    -    4.01=(pH4.01val*slope)+offset
                                                  //This system of equations creates a straight line trend for all pH readings
  
  offset = (abs(10.87 - ((pH4.01val + pH6.86val)*slope)))/2; //S.O.E. using point at pH 6.86 and pH4.01/9.18 slope to ensure a best fit, below:
                                                    //         4.01=(pH4.01val*slope)+offset    +    6.86=(pH6.86val*slope)+offset
                                                    //slope and offset solved to create best fit line approximation

And I understand what’s happening so far, somewhat. It’s the next part of the code that vexes me:

offset2 = slope*2.97;                             //multiply by old offset value, new slope times old offset
  slope = 0.59*slope;                               //new slope * old slope... "offset2" and new "slope" are used for the following:
                                                    //calibrated pH = (old slope*3.5*pHvalue + old offset)*new slope + new offset
                                                    //              = (old pH reading)*slope + offset
                                                    //see 'readPH()' for application of this equation
  
  if ((pH4.01val + pH6.86val) > 10.87)                       //if total of pH4 and pH7 reading is greater than 11
    {     
      negative = 1;                                 //set negative to hold value of 1 to change algorithm of pHvalue..offset is < 0
    }

THIS IS MY ACTUAL QUESTION:
What are those numbers 2.97 and 0.59?! Where did they come from? The comments say something about “old” offset and “new” slope, but those are just numbers and not saved variables?

All this is then used to correct the measurements in this way:

 if (negative == 0)                        //if the offset is positive... see calibration subroutine.. negative is initialized as 0
    {
      pHvalue = (slope*3.5*pHvalue + offset + offset2); //convert the millivolt into pH value, with positive offset and slope from calibration
    }
  else
    {
      pHvalue = (slope*3.5*pHvalue - offset + offset2); //convert the millivolt into pH value, with negative offset and slope from calibration
    }

Again, I don’t really get the use of the “offset2” variable. I might just be too dumb, but I would love to use this code if only I understood how it works. Does anyone? Can you help me understand those numbers in the calibration part and the use of the offset2 variable?
The full original code is available for download through the Instructables link above.

Interesting question, but without very much information. I cannot give an intelligent answer, or even a SWAG, with the information given. We know what the pH values are, but have no clue as to how you will sense them. Probes, but which ones? Are you using an Arduino? Have you connected it as your schematic (not a frizzy picture) shows? "eBay quality" tells us nothing, as they have items ranging from the very best to junk. eBay lists several thousand, from a few dollars to hundreds; which one? What accuracy do you want or expect? How are you going to power it? That is just a start of what is needed.

But really, the question has nothing to do with the hardware. Did you even bother to actually read what I asked?
It's a math and programming question about the mathematical correction of sensor input based on three-point calibration, not a question about measurements and hardware. I need to know what specific numbers in the code relate to, and all the code is in the post. I'm not asking for help with the entire project, just this bit.

So no, to help me with what I'm actually asking for, you don't need any hardware information. At all.

It's ok if you don't know the answer, but it's really annoying when people who don't know automatically have to write posts where they demand more, unrelated information. It happens all the time. Usually someone pops up a few posts further down who actually understands the question and has the answer. I think I'll wait for that guy, thank you very much. You can be THIS guy if you want.

Yes, I read it, and it is apparent you do not understand the relationship between software and hardware. This is not a math forum but a hardware forum based on the Arduino microprocessor system. You post a blob of incomplete Cpp?? software you got from somewhere or somebody and then you expect us to fill in the blanks. You would be much better off going to your software source; they would be much better able to help you. In your case we are given numbers with no reference point or source. My suggestion is to wait for "THIS guy". I will spend my time helping others who give enough information to answer their question or are willing to find it.

I suppose if I remembered more about calculating slope and offset, I'd be more help.

This is at the end of the Instructable, I imagine this is what you are to use to come up with those values.

In addition, you will need to figure out what offset and slope your pH meter has...

If my slope and offset does not work well with your pH meter, you will need to take the following steps:

(1)-- set slope = 1 and offset = 0

(2)-- take and record pH readings in solutions of exactly pH 4, pH 7, and pH 10

(3)-- Create a system of equations like so:

(actual pH 4 reading)*slope + offset = 4

(actual pH 7 reading)*slope + offset = 7

(actual pH 10 reading)*slope + offset = 10


Use these three equations to find a best fit line to solve for slope and offset and change these constants to your new slope and offset values
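With three equations and only two unknowns there is usually no exact solution, so "best fit" here would normally mean a least-squares line. A minimal sketch of that fit in plain C++ (it should also compile in an Arduino sketch); the names `LineFit` and `fitLine` are made up for illustration and are not from the Instructable:

```cpp
#include <stddef.h>

// Result of a straight-line fit: corrected = slope * reading + offset.
struct LineFit {
  float slope;
  float offset;
  bool ok;  // false if the fit could not be computed
};

// Ordinary least-squares fit of y = slope * x + offset through n points.
// x: uncalibrated readings, y: nominal buffer pH values.
LineFit fitLine(const float *x, const float *y, size_t n) {
  LineFit fit = {1.0f, 0.0f, false};  // fall back to "no correction"
  if (n < 2) return fit;

  float sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
  for (size_t i = 0; i < n; i++) {
    sumX += x[i];
    sumY += y[i];
    sumXY += x[i] * y[i];
    sumXX += x[i] * x[i];
  }

  float denom = n * sumXX - sumX * sumX;
  if (denom == 0.0f) return fit;  // all readings identical: slope undefined

  fit.slope = (n * sumXY - sumX * sumY) / denom;
  fit.offset = (sumY - fit.slope * sumX) / n;
  fit.ok = true;
  return fit;
}
```

For the buffers in the steps above, `x` would hold the three recorded readings and `y` would be `{4, 7, 10}`.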

I see the Instructable is 4 years old... but have you tried asking in the comments there, just in case the author is still monitoring it?

My question has nothing to do with the relationship between hardware and software. It's apparent you're not actually able to understand what you read.
It's ok if you "help" other people instead. I've been on this forum for ages and so far no-one who has started with snotty and condescending calls for "more information" has ever ended up contributing anything of value in the thread. It's a sure sign of windbaginess. So bye, you won't be missed.

Yes, I messaged the author, but haven't gotten any replies.
And sorry, no, I read that text but it shed no light on the mysterious numbers or the "offset2" variable.
I'd love to write my own calibration code if I can just find someone to hold my hand mathematically.

Y = mX + b
Where m is the slope or rise/run, and b is the offset, the value of Y when X = 0.

If you do not share your full code it is difficult to help you.
For example your snippet contains a variable ph4val. It is not clear where that variable comes from, what the type is and what it represents. I am not willing to answer a question on solving multiple equations and guessing what they are.
Also, you could be a bit more friendly to my fellow helpers here, especially if you want help from other helpers (incl. me).
The small bit of code you showed might work, but has at least one flaw. The counter should not be of type double but of type int.
Usually the electrode is given some time to settle, and then an average of 16 (or so) measurements is taken.
I do not like the redefinition slope = 0.59*slope; it obscures how the code works.

Well, like I said, the original code is long and absolutely littered with LCD-commands, so it's very hard to follow. I did an excerpt of the parts that were relevant to the question to make it easier to focus on the actual question. Also, like I wrote in the post, the original code is easily available through the link I provided.

Yes, I noticed the double variable and can't figure out why it's being used instead of an int, but it's not really a fatal flaw and doesn't affect function in any relevant way.

But that redefinition you mention is on target! That's part of my problem with the code! I can't understand why it's being done or what it wants to achieve, and I can't understand where the number 0.59 comes from. And like I wrote, I can't understand the use of the "offset2" variable. The rest of the correction is just basic y = kx + m. But then there's this "offset2" variable stuck in there, and it's calculated in very strange and unclear ways.

And the pH4val variable is just a typo, it's really pH4.01val, one of the variables storing the measurements from the buffer solutions. I've edited that now so it won't confuse anyone else, thanks for pointing out!

As for being friendly, I'm always friendly to actual helpers AND newbies AND people asking for help - but I'm honestly fed up with those who enter threads, don't understand the question, don't bother to even try, don't know the answer but still want to act superior by criticising how the post is formatted and asking for more, irrelevant information, only to then disappear from the thread without contributing anything else. These people are not helpers; they are using this site as a means to feel superior by thrashing others. This is an absolute PLAGUE on this site, and for several users this has become a very distinct, toxic modus.
I've started calling them out because it needs to be done.

BTW, this is the end product, or the UI design for it anyway. I'm using a 7" TFT display with a Due, taking measurements from two K-type thermocouples, one DS18B20 temperature sensor and then, of course, a pH sensor. Limits are set for the different readings (the red numbers on screen), and when a reading reaches its limit, or goes below it, customisable alarms are triggered, by text message to my phone and/or by sound/light. Automatic adjustment of both pH and temperature is also available.

It's a pretty big project that has grown while I was working on it, but I've actually pulled everything off except the part with the pH calibration. Hence my frustration :).


I'd GUESS they are specific to the probe the author is using
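That guess fits the comments in the original code. A sketch of the algebra, where s_0 is the slope first computed from the buffers, o is the calibrated offset and x is the raw value fed into the final formula (these symbols are mine, not the author's):

```latex
\begin{align*}
\text{pH} &= (0.59\,s_0)\cdot 3.5\,x \;\pm\; o \;+\; 2.97\,s_0 \\
          &= s_0\,\bigl(0.59\cdot 3.5\,x + 2.97\bigr) \;\pm\; o
\end{align*}
```

This has the same shape as the formula in the code comment, (old slope*3.5*pHvalue + old offset)*new slope + new offset, with 0.59 in the role of the "old slope" and 2.97 in the role of the "old offset". So the two numbers look like an earlier hard-coded calibration for the author's own probe, with the three-point calibration applied on top of it. That is only a reading of the comments; the author has not confirmed it.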

@Mixe
I'm ignoring your rant in the hope that you will accept my post here in the spirit it is intended.

From the comments above I feel that while the code MAY have worked for the author, it's so bad that you have little hope of adapting it to your own equipment.

I had a quick look and found this

which you will see has much clearer code. To save you and others time I've posted the code below.

/*
 # This sample code is used to test the pH meter V1.0.
 # Editor : YouYou
 # Ver    : 1.0
 # Product: analog pH meter
 # SKU    : SEN0161
*/
#define SensorPin A0            //pH meter Analog output to Arduino Analog Input 0
#define Offset 0.00            //deviation compensate
#define LED 13
#define samplingInterval 20
#define printInterval 800
#define ArrayLenth  40    //times of collection
int pHArray[ArrayLenth];   //Store the average value of the sensor feedback
int pHArrayIndex=0;
void setup(void)
{
  pinMode(LED,OUTPUT);
  Serial.begin(9600);
  Serial.println("pH meter experiment!");    //Test the serial monitor
}
void loop(void)
{
  static unsigned long samplingTime = millis();
  static unsigned long printTime = millis();
  static float pHValue,voltage;
  if(millis()-samplingTime > samplingInterval)
  {
      pHArray[pHArrayIndex++]=analogRead(SensorPin);
      if(pHArrayIndex==ArrayLenth)pHArrayIndex=0;
      voltage = avergearray(pHArray, ArrayLenth)*5.0/1024;
      pHValue = 3.5*voltage+Offset;
      samplingTime=millis();
  }
  if(millis() - printTime > printInterval)   //Every 800 milliseconds, print a numerical, convert the state of the LED indicator
  {
    Serial.print("Voltage:");
    Serial.print(voltage,2);
    Serial.print("    pH value: ");
    Serial.println(pHValue,2);
    digitalWrite(LED,digitalRead(LED)^1);
    printTime=millis();
  }
}
double avergearray(int* arr, int number){
  int i;
  int max,min;
  double avg;
  long amount=0;
  if(number<=0){
    Serial.println("Error: array length for averaging must be > 0");
    return 0;
  }
  if(number<5){   //less than 5, calculated directly statistics
    for(i=0;i<number;i++){
      amount+=arr[i];
    }
    avg = (double)amount/number;   //cast first, otherwise integer division truncates the average
    return avg;
  }else{
    if(arr[0]<arr[1]){
      min = arr[0];max=arr[1];
    }
    else{
      min=arr[1];max=arr[0];
    }
    for(i=2;i<number;i++){
      if(arr[i]<min){
        amount+=min;        //arr<min
        min=arr[i];
      }else {
        if(arr[i]>max){
          amount+=max;    //arr>max
          max=arr[i];
        }else{
          amount+=arr[i]; //min<=arr<=max
        }
      }//if
    }//for
    avg = (double)amount/(number-2);
  }//if
  return avg;
}

I'd suggest you start by running a simple program to take and average a set of readings for each buffer, then we can work out a suitable curve to interpolate to get the pH of another "unknown"

provided your "unknown" lies SOMEWHERE near that range the results should be OK.

Thank you! My "rant" has no bearing whatsoever on your reply. Honestly, it's pretty obvious when someone actually wants to help.

Anyway, yes, I've seen the DFRobot code and some other more commercial and branded solutions, but none of them have the 3-point calibration with calculation of slope and offset that I'm gunning for. If you look at the code you posted you'll see that "Offset" is never assigned a value in it, other than the one it was defined with. I guess there's some kind of calibration code that you didn't post? But since it lacks a slope value, it's probably just a one-point cal.

But I'm inclined to agree with you about the quality of the code I posted. My hope was that there was something ingenious in this code that I didn't understand and that someone else would see and explain to me, but I fear it's the other way around. There are implications in the code that it might be written by a coder not best described by words such as ... good.

So since no-one here seems to make sense of it either, I'll just drop it. I'll try to get help with the math to construct a linear regression and produce a y = kx+m correction formula from my buffer measurements.

So I consulted a math-fluent friend and tried really really hard to understand his scribblings. This code is what I came up with:

float slope = 1.0; //slope variable for the pH adjustment (1.0 = no correction until calibrated)
float offset = 0.0; // offset variable for the pH adjustment (0.0 = no correction until calibrated)
int pHpin = 8;


void setup() {
  pinMode(pHpin,INPUT);
}

void loop() {
  //every 2 seconds, poll the sensor
  readpH();

  //if the calibration button is pressed, start calibration function
  pHcalibration();

}

float readpH()
{
  float pHavg[10];
  
  for(int i=0;i<10;i++)   //take 10 readings from the sensor during 100 ms. Store them in an array. 
     {
      pHavg[i] = analogRead(pHpin);
      delay(10);
     }
  for(int i=0;i<9;i++)   //sorting the readings from low to high
     {
      for(int j=i+1;j<10;j++)
         {
          if(pHavg[i] > pHavg[j])
            {
              float temp = pHavg[i];
              pHavg[i] = pHavg[j];
              pHavg[j] = temp;
            }
         }
     }
   
   float avgVal = 0;
   for(int i=2;i<8;i++) //get the total of the 6 middle values. 
      {
        avgVal += pHavg[i];
      }
   float pre_pH = ((avgVal*5.0)/1024/6)*3.5;  //convert the mean reading to volts (5 V reference, 10-bit ADC) and then to a rough pH value. 
   float corrected_pH = slope*pre_pH + offset;
   return corrected_pH;
}


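void pHcalibration()
{
  //readpH() applies the current slope and offset, so reset them to "no correction"
  //first; otherwise the fit would be made on already-corrected readings
  slope = 1.0;
  offset = 0.0;

  //press button when calibration setup is complete for pH 4.01
  float cal1 = readpH();
  //press button when calibration setup is complete for pH 6.86
  float cal2 = readpH();
  //press button when calibration setup is complete for pH 9.18
  float cal3 = readpH();

  //least-squares line through the three (reading, buffer pH) points
  float totalVal = cal1 + cal2 + cal3;
  float correctTotal = 20.05; // All the buffer values added : 4.01+6.86+9.18
  float totalValSquare = cal1*cal1 + cal2*cal2 + cal3*cal3;
  float sumProd = cal1*4.01 + cal2*6.86 + cal3*9.18;
  float denominator = 3*totalValSquare - totalVal*totalVal;
  if (denominator == 0) return; //all three readings identical: keep slope = 1, offset = 0

  //assign to the global variables; declaring new "float slope" and "float offset"
  //here would create locals that shadow the globals and are lost on return
  slope = (3*sumProd - totalVal * correctTotal) / denominator;
  offset = (correctTotal - slope*totalVal) / 3;
}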

As you can see I left out the different TFT commands since they're not important now. I exchanged them with comments.
Posting this in the hope that someone with a good grasp of math AND Arduino programming might give feedback on it, or as a resource for anyone who might want to do the same.
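For anyone checking the math: as far as I understand it, the calibration routine is the standard least-squares line through n = 3 points, where x_i are the three readings (cal1 to cal3) and y_i the buffer values. In that notation (the symbols are my own shorthand):

```latex
\begin{align*}
\text{slope}  &= \frac{n\sum x_i y_i \;-\; \sum x_i \sum y_i}{n\sum x_i^2 \;-\; \bigl(\sum x_i\bigr)^2} \\[4pt]
\text{offset} &= \frac{\sum y_i \;-\; \text{slope}\cdot\sum x_i}{n}
\end{align*}
```

In the code, totalVal is the sum of the x_i, correctTotal the sum of the y_i, totalValSquare the sum of the x_i squared, and sumProd the sum of the products x_i*y_i.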

Great, this looks like I might easily understand what is happening!

Magic numbers are an indication of bad programming practice. Reuse of variables as well. So I think it is a good choice to abandon the other program.

For good calibration you should not only take an average of 16 or so values, you should also check if the largest and smallest are within reasonable range (stable reading).
Also, the measured value itself should be comparable to the last known calibration. You can make your software check that for you and/or you should log these values in a journal. Sudden changes are a strong indication of a damaged electrode, bad connections or pollution of your buffer...

3 point calibration is not best practice in the area lower than 5 or higher than 9. 2 point calibration is better in those cases...
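A two-point calibration is simply the straight line through the two buffers that bracket the range you care about, for example 4.01 and 6.86 for acidic samples. A minimal sketch in plain C++; `TwoPointCal` and `twoPointCalibrate` are hypothetical names for illustration only:

```cpp
// Straight line through two calibration points:
// corrected = slope * reading + offset.
struct TwoPointCal {
  float slope;
  float offset;
};

// readLow/readHigh: uncalibrated readings taken in the two buffers.
// pHLow/pHHigh: nominal pH of those buffers.
// Falls back to slope = 1, offset = 0 if the two readings are identical.
TwoPointCal twoPointCalibrate(float readLow, float pHLow,
                              float readHigh, float pHHigh) {
  TwoPointCal cal = {1.0f, 0.0f};
  float span = readHigh - readLow;
  if (span == 0.0f) return cal;  // cannot define a slope
  cal.slope = (pHHigh - pHLow) / span;
  cal.offset = pHLow - cal.slope * readLow;
  return cal;
}
```

The line passes exactly through both calibration points, so a reading equal to `readLow` maps to `pHLow` and one equal to `readHigh` maps to `pHHigh`.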

Well, in the measuring routine I pull 10 values during 100 ms from the sensor, sort them and keep only the 6 middle numbers. You don't think that's sufficient?

Great tip with the last calibration! I'll just make a variable for it. Also, the TFT display has an SD card reader that's intimately used for the UI design, so I can always dump old calibration values on that and pull them for comparison over time.
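Something like this is what I have in mind for the comparison. It's only a rough sketch: `slopeIsPlausible` is a made-up name and the tolerance is an arbitrary value I would have to tune for my electrode:

```cpp
// Returns true if the new slope is within a relative tolerance of the
// previous one, e.g. tolerance = 0.10f accepts up to a 10 % change.
bool slopeIsPlausible(float previousSlope, float newSlope, float tolerance) {
  if (previousSlope == 0.0f) return false;  // no usable reference value
  float change = (newSlope - previousSlope) / previousSlope;
  if (change < 0.0f) change = -change;      // absolute value
  return change <= tolerance;
}
```

If the check fails, the UI could ask for the calibration to be repeated instead of silently accepting the new values.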

You should also check the min and max number.
There might be noise on the line or the electrode might not be at equilibrium yet (then you would see a trend in your values)...
The older (dirtier) the electrode, the slower the response, so failing such a test is a good indicator for cleaning or replacing...
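A minimal sketch of such a test in plain C++: it rejects a series if the min-to-max spread is too large (noise) or if the second half of the series has drifted away from the first half (a trend). The function name and both thresholds are assumptions to be tuned for your electrode:

```cpp
#include <stddef.h>

// Returns true if the readings look settled:
//  - spread (max - min) is at most maxSpread, and
//  - the mean of the last half differs from the mean of the
//    first half by at most maxDrift.
bool readingsAreStable(const float *v, size_t n, float maxSpread, float maxDrift) {
  if (n < 2) return false;
  float lo = v[0], hi = v[0];
  float firstHalf = 0, secondHalf = 0;
  size_t half = n / 2;  // for odd n the middle value is left out of both halves
  for (size_t i = 0; i < n; i++) {
    if (v[i] < lo) lo = v[i];
    if (v[i] > hi) hi = v[i];
    if (i < half) firstHalf += v[i];
    else if (i >= n - half) secondHalf += v[i];
  }
  float drift = secondHalf / half - firstHalf / half;
  if (drift < 0) drift = -drift;
  return (hi - lo) <= maxSpread && drift <= maxDrift;
}
```

A failed test during calibration is a reason to wait longer and read again rather than to store the value.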

There is not enough information provided to answer that question.

I've done Linear Regression using an ESP32. It took several hours to develop the LR model.

Here is a link to a LR library which can be used to get you started,
Amazon.com

Here is some code I used for the ESP32 to do LR,

//void fDoTrends( void *pvParameters )
//{
//  const int magicNumber = 96;
//  double    values[2];
//  int       lrCount = 0;
//  float     lrData = 0.0f;
//  float     DataPoints[magicNumber] = {0.0f};
//  float     TimeStamps[magicNumber] = {0.0f};
//  float     dpHigh = 702.0f;
//  float     dpLow  = 683.0f;
//  float     dpAtom = 0.0f;
//  float     dpMean = 0.0f; //data point mean
//  float     tsAtom = 0.0f;
//  float     tsUnit = 0.0f;
//  float     tsMean = 0.0f;
//  bool      dpRecalculate = true;
//  bool      FirstTimeMQTT = true;
//  String    apInfo = "";
//  apInfo.reserve( 150 );
//  for (;;)
//  {
//    if ( xQueueReceive(xQ_lrData, &lrData, portMAX_DELAY) == pdTRUE )
//    {
//      apInfo.concat( String((float)xTaskGetTickCount() / 1000.0f) );
//      apInfo.concat( "," );
//      apInfo.concat( String(lrData) );
//      apInfo.concat( ",0.0" );
//      apInfo.concat( ",0.0" );
//      if ( MQTTclient.connected() )
//      {
//        xSemaphoreTake( sema_MQTT_KeepAlive, portMAX_DELAY );
//        MQTTclient.publish( topicPressureInfo, apInfo.c_str() );
//        vTaskDelay( 1 );
//        xSemaphoreGive( sema_MQTT_KeepAlive );
//      }
//      xQueueSend( xQ_pMessage, (void *) &apInfo, portMAX_DELAY ); // wait for queue space to become available
//      apInfo = "";
//      //find dpHigh and dpLow, collects historical high and low data points, used for data normalization
//      if ( lrData > dpHigh )
//      {
//        dpHigh = lrData;
//        dpRecalculate = true;
//      }
//      if ( lrData < dpLow )
//      {
//        dpLow = lrData;
//        dpRecalculate = true;
//      }
//      if ( lrCount != magicNumber )
//      {
//        DataPoints[lrCount] = lrData;
//        TimeStamps[lrCount] = (float)xTaskGetTickCount() / 1000.0f;
//        log_i( "lrCount %d TimeStamp %f lrData %f", lrCount, TimeStamps[lrCount], DataPoints[lrCount] );
//        lrCount++;
//      } else {
//        //shift datapoints collected one place to the left
//        for ( int i = 0; i < magicNumber - 1; i++ ) // stop one short: the loop body reads index i + 1
//        {
//          DataPoints[i] = DataPoints[i + 1];
//          TimeStamps[i] = TimeStamps[i + 1];
//        }
//        //insert new data points and time stamp (ts) at the end of the data arrays
//        DataPoints[magicNumber - 1] = lrData;
//        TimeStamps[magicNumber - 1] = (float)xTaskGetTickCount() / 1000.0f;
//        lr.Reset(); //reset the LinearRegression Parameters
//        // use dpHigh and dpLow to calculate data mean atom for normalization
//        if ( dpRecalculate )
//        {
//          dpAtom = 1.0f / (dpHigh - dpLow); // a new high or low data point has been found adjust mean dpAtom
//          dpRecalculate = false;
//        }
//        //timestamp mean is ts * (1 / ts_Firstcell - ts_Lastcell[magicNumber]). ts=time stamp
//        tsAtom = 1.0f / (TimeStamps[magicNumber - 1] - TimeStamps[0]); // no need to do this part of the calculation every for loop ++
//        for (int i = 0; i < magicNumber; i++)
//        {
//          dpMean = (DataPoints[i] - dpLow) * dpAtom;
//          tsMean = TimeStamps[i] * tsAtom;
//          lr.Data( tsMean, dpMean ); // train lr
//          //send to mqtt the first time
//          if ( FirstTimeMQTT )
//          {
//            apInfo.concat( String(TimeStamps[i]) );
//            apInfo.concat( "," );
//            apInfo.concat( String(DataPoints[i]) );
//            apInfo.concat( "," );
//            apInfo.concat(  String(tsMean) );
//            apInfo.concat( "," );
//            apInfo.concat( String(dpMean) );
//            xQueueSend( xQ_pMessage, (void *) &apInfo, portMAX_DELAY ); // wait for queue space to become available
//            apInfo = "";
//          }
//        }
//        if ( !FirstTimeMQTT )
//        {
//          apInfo.concat( String(TimeStamps[magicNumber - 1]) );
//          apInfo.concat( "," );
//          apInfo.concat( String(DataPoints[magicNumber - 1]) );
//          apInfo.concat( "," );
//          apInfo.concat( String(tsMean) );
//          apInfo.concat( "," );
//          apInfo.concat( String(dpMean) );
//          xQueueSend( xQ_pMessage, (void *) &apInfo, portMAX_DELAY );
//          apInfo = "";
//        }
//        FirstTimeMQTT = false;
//        lr.Parameters( values );
//        //calculate
//        tsUnit = TimeStamps[magicNumber - 1] - TimeStamps[magicNumber - 2]; //get the time stamp quantity
//        tsUnit += TimeStamps[magicNumber - 1]; //get a future time
//        tsUnit *= tsAtom; //setting time units to the same scale
//        log_i( "Calculate next x using y = %f", lr. Calculate( tsUnit ) ); //calculate next datapoint using time stamp
//        log_i( "Correlation: %f Values: Y=%f and *X + %f ", lr.Correlation(), values[0], values[1] ); // correlation is the strength and direction of the relationship
//        //calculate datapoint for current time stamp, use current data point against calculated datapoint to get error magnatude and direction.
//        //log_i( "lr.Error( x_pMessage.TimeStamp, x_pMessage.nDataPoint ) %f", lr.Error(x_pMessage.nTimeStamp, x_pMessage.nDataPoint) ); //
//      }
//      log_i( "fDoTrends high watermark % d",  uxTaskGetStackHighWaterMark( NULL ) );
//    } //if ( xQueueReceive(xQ_lrData, &lrData, portMAX_DELAY) == pdTRUE )
//  } //for(;;)
//  vTaskDelete ( NULL );
//} //void fDoTrends( void *pvParameters )

perhaps it can give a clue or 2. I switched to using a Raspberry Pi for LR, KNN, and Tensor.

Thank you for sharing! Unfortunately I don't understand anything of that code. If you peek above you'll see the code I ended up crafting myself. If you have any feedback on that I'm very grateful.

Yeah, I know there's noise on the line. I know it because I've seen it. It's evident in the stray high pH numbers that appear in some series of measurements. I will eventually construct a filter for them, but for now just sorting the numbers and getting rid of the extremes seems to work.
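One simple option for that filter would be a median over the last few readings, since a single stray high value cannot move the median. A sketch in plain C++; `medianOf` and the limit of 15 values are my own choices, not taken from any library:

```cpp
#include <stddef.h>

// Median of the first n values in v (at most 15 are used).
// The values are insertion-sorted into a scratch buffer, so the
// caller's array is left untouched.
float medianOf(const float *v, size_t n) {
  const size_t kMax = 15;
  if (n == 0) return 0.0f;
  if (n > kMax) n = kMax;
  float s[kMax];
  for (size_t i = 0; i < n; i++) {
    float x = v[i];
    size_t j = i;
    while (j > 0 && s[j - 1] > x) {  // shift larger values one place up
      s[j] = s[j - 1];
      j--;
    }
    s[j] = x;
  }
  if (n % 2 == 1) return s[n / 2];
  return 0.5f * (s[n / 2 - 1] + s[n / 2]);
}
```

Fed with the most recent handful of readings, this returns a value that ignores an isolated spike, at the cost of reacting a little more slowly to real changes.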