Least Squares Approximation

I wonder if anyone here knows of a good site for me to take a look at re: computational methods for least squares approximation. I'm not attempting anything too fancy, i.e. just a simple linear and perhaps a quadratic approach. Basically, I want to take the output from 5 precision resistor measurements and determine the best compensation curve for errors induced by other components in the circuit.

Another option is to simply do point-to-point linear interpolations based on the precision resistance values. As long as I choose those values carefully, perhaps even a linear point-to-point approach would do well enough to approximate the curve I currently see. But I would probably prefer a more comprehensive method like least squares. Any help is appreciated!

Here's the pseudo-code for least squares.

for each data point
   sum_x += x;
   sum_y += y;
   sum_x2 += x * x;
   sum_y2 += y * y;
   sum_xy = sum_xy + x * y;

// n is the number of points

m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x * sum_x);
b = (sum_x2 * sum_y - sum_x * sum_xy) / (n * sum_x2 - sum_x * sum_x);

Try this one: - http://ceee.rice.edu/Books/LA/leastsq/index.html -

Here’s a C program that fits linear, quadratic, exponential, logarithmic, and power functions; prints the resulting functions and standard deviations; and identifies the best fit (not Arduino, you’ll have to do a bit of hacking):

Oops! Source to long to fit between code tags. Included as an attachment.

fabls.c (8.78 KB)


Gentlemen, you are amazing. Many thanks for the help!

Hi Morris,

in that .c code you have a copyright statement, are you submitting this code under another license? Is it free for us to use?

Think it could make a nice Arduino library !

I put it out here for Arduino forum members to use, learn from, and hack on - but not for re-publication.

OK, fair enough

FWIW, and IANAL, mathematical formulas cannot be patented.

Looking over the code, one would have to re-write the non-math parts to address how the arduino IDE handles variables differently.

Good points, although I don't believe anyone said anything about patents. The ability to derive the math involved can (or should) be within the reach of any first or second year math/engineering university student.

Much more to the point, I think, is whether or not members of a group respect the rights of individuals sufficiently to function as a community.

The code was freely provided for the purposes stated, with full knowledge that all readers would make their own decision as to how they'd deal with it.

FWIW, the program was last compiled using a gcc compiler - and the only Arduino-specific problems should be the printf() calls and the execution speed. (I'm assuming that substituting "unsigned char" for "byte" and "unsigned" for "hword" is trivial.)

The next version of the program was much more interesting - after finding the function which produced the best fit, it replaced the original data with the function's error values and attempted to find another function (additional terms for the original function) to best fit the error values...ad nausea until either there was an "exact" fit or the program was stopped.

Sorry, I should have written copyright re: the mathematical formula. Generally not possible, though there has been some interesting case law surrounding some of the financial math research that trading companies have done for their computerized trading systems.

As for the program, I agree that it's super neat and the quadratic portion of it perfectly addresses my need for a mathematical method for computing the components of a quadratic equation based on a couple of data points.

My implementation will not use the original code - no need to. I prefer my own variable names, etc. anyway. But I will definitely cite the author as a source of inspiration. And should I ever need to use something like the whole program, I'll try to track the original programmer down, get his use permission, etc... at the very least it's common courtesy. Perhaps by now his attitude re: copyright has changed to allow open-source use w/attribution, for example.

You’ll need an Ouija board to talk to the original author - and although I’m happy to provide a copy of Gordon’s FORTRAN source code (I don’t think he’d mind), I really doubt you’ll much enjoy seeing it - but you have been warned. :slight_smile: It was the inspiration, but the C program is original effort and I did take the trouble to work through all of the mathematical derivations.

For only a couple of data points (not on a vertical line) you can fit almost anything. For more than a couple, say 3 or 4, there are more economical methods than least squares. Least squares shines for larger (statistically significant) collections of noisy data, where the plotted points resemble a murmuration.

In my experience, open source works well as a means of charitable contribution and/or when the effort invested and/or the result produced has little to no value - and poorly otherwise. According to Stallman and followers, one could live by contributing value to the open source “heap” and supporting users of those contributions - but in the real world, when the contributions are of high quality little support is needed and the users seldom bother with so much as a “thank you”. You are the recipient of such a charitable contribution - and your thanks was much appreciated.

Inspiration attached. :grin:

fabls.for (9.25 KB)

mathematical formulas cannot be patented.

But their implementation can ! Note that patent law is still very country specific.

robtillaart: But their implementation can ! Note that patent law is still very country specific.

Oh yes... and very complicated. Our usual rule of thumb for a world-wide patent cost (covering most attractive markets for medical devices) was about $300,000. Makes you consider carefully what you want to patent, where, and why.

I am saddened to hear that the original author has passed on before we could tell him how much we admire his work and before we could ask for permission to use it.

Implemented this last night. Have yet to test in anger, as the boards are currently in production.

One thing I never considered until I took a closer look was just how big those numbers can get if you're handling data coming out of a 16 bit ADC. My first foray into 64 bit numbers on the Arduino! There goes the RAM...

Yet another reason to consider digital solutions like the DS18B20 series of temperature sensors. Except, that if everything works, then the accuracy of the above system is to within less than +/-0.2*C (vs. +/-0.5*C for the DS18B20) and the 16bit resolution of the thermistor board far exceeds that of the DS18B20 (which is 9- or 12-bit).

I think that the most effective strategy might be to first use the Arduino as a data acquisition front-end for a more capable machine for the least squares analysis to discover the specifics of the relationships, then make use of that relationship in the production Arduino software.

When that strategy can be adopted, the need to maintain a statistically-significant collection of data can be much reduced.

I did a bit of that myself this past weekend to come up with a formula for the amount of power to be applied to the ignition heater of a small LENR reactor. I wanted a formula that my controller code could adjust as it “learned” about the quirks of each particular reactor so which it might be connected. I’ll attach some plots so you can see what I’m talking about and, perhaps, find the technique useful in your own project. :grin:



Hi and thanks again for your help!

I agree that it is usually possible to find open-loop solutions to speed computations, improve responsiveness, and even performance. Time will tell if I can find a relationship that is temperature-driven and whihc would obviate the need for very accurate least squares approximation with 64 bit integers.

It should be effective for closed-loop control as well. I’m applying heat to bring a Ni/H reaction to the ignition point and, since there isn’t any historical data to work from (and because significant overheating could be hazardous), I’d like the control program to be able to use the actual dT/dt as the basis for determining how aggressively energy is being pumped into the reactor as it is heated to a target (test) temperature.

I’ve shown 200?C as my target temperature for the plots, in which the vertical axis is the duty cycle of the heater and the horizontal axis is the current temperature. The degree of “aggression” is controlled by a one byte value that’s adjusted dynamically to speed up or slow down the heating process.

I finally got around to plotting the “family” of curves for parameter values of 1, 2, 4, 8. and 16 (convenient doing division by shifting). I’ll attach the plot.