We are trying to create a system that takes 2 sets of measurements, one after the other, and I want to compare these sets automatically within the Arduino. At the moment I take the measurements, plot them in Excel or MATLAB, and complete the calculations there.
I want to test how similar the two sets of measurements are (something like a correlation) without extracting the measurements from the Arduino.
Comparing sets of measurements is not too difficult but has some points you should take into account.
First you must normalize the measurements (to some degree)
align the X scale - are the samples taken with same interval?
align the Y scale - if needed map Y values on a scale 0 ..1000
align the starting point - this is the tricky part.*
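For the Y-scale step, a minimal sketch of mapping one set of samples onto 0..1000 could look like this. The name scaleTo1000 is made up for illustration, and it assumes the samples already sit in a float array. It is plain C++, so it can be tried off the board as well.

```cpp
#include <stddef.h>

// Rescale n samples in place so the smallest becomes 0 and the largest 1000.
// A flat signal (max == min) has no range to stretch and is set to all zeros.
void scaleTo1000(float *y, size_t n)
{
  if (n == 0) return;

  // Find the minimum and maximum of the data set.
  float lo = y[0];
  float hi = y[0];
  for (size_t i = 1; i < n; i++) {
    if (y[i] < lo) lo = y[i];
    if (y[i] > hi) hi = y[i];
  }

  // Map [lo, hi] linearly onto [0, 1000].
  float range = hi - lo;
  for (size_t i = 0; i < n; i++) {
    y[i] = (range > 0.0f) ? (y[i] - lo) * 1000.0f / range : 0.0f;
  }
}
```

Scaling each set by its own min and max removes amplitude differences between the sets. If the amplitude itself matters for your comparison, use one common min/max for both sets instead.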
Then you calculate the distance between point pairs and if the distance is under a threshold the graphs are similar.
(simple way is to subtract all pairs and sum the square of the results)
float total = 0;
for (int i = 0; i < nrSamples; i++)
{
  float diff = a[i] - b[i];
  total += diff * diff;  // accumulate the squared differences
}
Serial.print("arrays a and b are ");
if (total == 0) Serial.println("the same.");
else if (total < threshold) Serial.println("similar.");
else Serial.println("different.");
*) The alignment of the starting point is needed to "align the graphs" in the X direction. There are several ways to do this, and which method is best suited really depends on the dataset:
1. align the first/last value above some threshold
2. align the minimum/maximum
3. determine the difference (like above) while offsetting the arrays by 0, 1, 2 ... n, and go for the offset that gives the minimum
4. use certain 'signatures' in the signal that help to align (e.g. a flat section, a sudden run of zeros, etc.)
Method 3 is very robust, although it can be CPU intensive and there is a chance you find more than one minimum.
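A rough sketch of the offset search (method 3), under a few assumptions of mine: both arrays have the same length n, only one shift direction is tried (a lagging behind b), and the score is the mean rather than the sum of the squared differences, so that shorter overlaps are not favoured. The names meanSquaredDiff and bestShift are made up for illustration.

```cpp
#include <stddef.h>

// Mean squared difference between a[i + shift] and b[i] over the overlapping
// part of the arrays, i.e. assuming a lags b by `shift` samples.
// Requires shift < n.
float meanSquaredDiff(const float *a, const float *b, size_t n, size_t shift)
{
  size_t overlap = n - shift;
  float total = 0.0f;
  for (size_t i = 0; i < overlap; i++) {
    float diff = a[i + shift] - b[i];
    total += diff * diff;
  }
  return total / overlap;
}

// Try shifts 0..maxShift and return the one with the smallest score.
// On a tie the smallest shift wins (strict '<' below).
size_t bestShift(const float *a, const float *b, size_t n, size_t maxShift)
{
  if (n == 0) return 0;
  if (maxShift >= n) maxShift = n - 1;

  size_t best = 0;
  float bestScore = meanSquaredDiff(a, b, n, 0);
  for (size_t s = 1; s <= maxShift; s++) {
    float score = meanSquaredDiff(a, b, n, s);
    if (score < bestScore) {
      bestScore = score;
      best = s;
    }
  }
  return best;
}
```

The cost is roughly n * maxShift multiplications, which is where the CPU load comes from. To cover the opposite direction as well, call bestShift a second time with a and b swapped and keep whichever result scores lower.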
To compare two data sets, or two curves if you present them graphically, first of all you have to find the equation that best fits each data set. Then simply compare the coefficients. The required precision defines the degree of the polynomial, and it depends strongly on the non-linearity of your data: for linear data you need just 2 coefficients, Y = A1 * X + B1; for slightly curved data 3 (a parabola), Y = A2 * X^2 + A1 * X + B1, would be sufficient, etc.
Look here, in this project the Arduino solves a cubic polynomial Y = A3 * X^3 + A2 * X^2 + A1 * X + B1 in a fraction of a second: http://coolarduino.wordpress.com/2013/01/22/true-analog-audio-volume-control-t-a-a-v-c/
And here: http://coolarduino.wordpress.com/2012/10/23/diy-arduino-fm-radio-part-2/
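To make the coefficient idea concrete for the simplest case, Y = A1 * X + B1, here is a minimal least-squares sketch. It is not the code from the linked projects. It assumes equally spaced samples, so X is simply the sample index 0, 1, 2, ..., and the Line struct and fitLine name are invented for this example.

```cpp
#include <stddef.h>

struct Line {
  float slope;      // A1
  float intercept;  // B1
};

// Least-squares fit of y = slope * x + intercept, with x = 0, 1, 2, ...
Line fitLine(const float *y, size_t n)
{
  Line result = {0.0f, 0.0f};
  if (n == 0) return result;
  if (n == 1) {            // a single point: horizontal line through it
    result.intercept = y[0];
    return result;
  }

  // Accumulate the sums needed by the normal equations.
  float sx = 0.0f, sy = 0.0f, sxx = 0.0f, sxy = 0.0f;
  for (size_t i = 0; i < n; i++) {
    float x = (float)i;
    sx  += x;
    sy  += y[i];
    sxx += x * x;
    sxy += x * y[i];
  }

  float denom = n * sxx - sx * sx;   // non-zero for n >= 2 distinct x values
  result.slope = (n * sxy - sx * sy) / denom;
  result.intercept = (sy - result.slope * sx) / n;
  return result;
}
```

Fit each data set separately and compare the two slopes and the two intercepts. With long records the float sums lose precision, so fitting a decimated or averaged version of the data may work better on the board.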
First of all thanks for your answers.
I do not want to compare whether the data sets are the same; I want to find to what degree the sets look alike.
Either method proposed will allow you to do that. If the sum of squared distances between matched points is low, the sets look "alike". If the coefficients of the curves fitting the points are similar, the sets look "alike". It is up to you to quantitate how each method measures similarity.
If you can show some examples of data, including some that you think are "alike" and some that are not, you will probably get more useful advice.
Think of a plot like the one I attach. I want to compare the curves. Of course I can see by eye that the curves are different.
I want to get an answer without looking at the plots.
P.S. For some reason I cannot upload. I will create a Dropbox link.
The forum is partly broken because the upload area is full.
The link works for me, but the graph axis labels and the curves are meaningless to me. One curve looks approximately like a reflection of the other, about the horizontal 1,4 axis, so in that sense they are similar.
How would you define "different" and "similar" curves?
Please post one or two examples of similar curves, and one or two that are not.
The number total in the code (or better, the sqrt() of it) is a measure of similarity.
The added value of this method is that it can be used for the whole signal or for partial ranges,
so you can find that some parts of the signal are similar while others differ substantially.
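As a sketch of the partial-range idea, the function below computes the difference over samples first up to (but not including) last. Note one variation that is mine: it divides by the number of samples before taking the square root (an RMS difference), so that ranges of different length give comparable numbers. The name rmsDifference is made up.

```cpp
#include <math.h>
#include <stddef.h>

// Root-mean-square difference between a and b over samples [first, last).
// Returns 0 for an empty range.
float rmsDifference(const float *a, const float *b, size_t first, size_t last)
{
  if (last <= first) return 0.0f;

  float total = 0.0f;
  for (size_t i = first; i < last; i++) {
    float diff = a[i] - b[i];
    total += diff * diff;
  }
  return (float)sqrt(total / (last - first));
}
```

Calling it as rmsDifference(a, b, 0, nrSamples) gives one number for the whole signal. Calling it per block of, say, 50 samples shows where the two signals agree and where they drift apart.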
With a slight variation of the code, one could count the number of values that are (almost) identical and the values that are not.
In a more elaborate variation one can make e.g. 6 categories and count them separately:
#values with diff between 0..2%
#values with diff between 2..5%
#values with diff between 5..10%
#values with diff between 10..25%
#values with diff between 25..100%
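A possible sketch of the counting variation, using the five ranges listed above. What the percentage is relative to is not stated, so this assumes it is a percentage of a full-scale value you pass in (e.g. 1000 if the Y values were mapped to 0..1000). Anything above the last edge, including differences over 100%, lands in the last bin. All names are made up for illustration.

```cpp
#include <math.h>
#include <stddef.h>

const size_t NUM_BINS = 5;
// Boundaries between the bins, in percent: 0..2, 2..5, 5..10, 10..25, 25 and up.
const float BIN_EDGES[NUM_BINS - 1] = {2.0f, 5.0f, 10.0f, 25.0f};

// Count how many sample pairs fall in each difference bin.
// fullScale must be > 0; the difference is taken as a percentage of it.
void countByDifference(const float *a, const float *b, size_t n,
                       float fullScale, unsigned int counts[NUM_BINS])
{
  for (size_t k = 0; k < NUM_BINS; k++) counts[k] = 0;

  for (size_t i = 0; i < n; i++) {
    float pct = (float)fabs(a[i] - b[i]) * 100.0f / fullScale;

    // Walk up the edges until pct is below one of them.
    size_t bin = 0;
    while (bin < NUM_BINS - 1 && pct >= BIN_EDGES[bin]) bin++;
    counts[bin]++;
  }
}
```

If most pairs end up in the first one or two bins the sets are close almost everywhere. A few pairs in the last bins point at local outliers rather than a generally different curve.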
robtillaart:
i want to find in what degree the sets look alike
Can you explain what you mean by that ?
Give an example?
To compare the curves I posted, I used the correlation function in Excel.
That is what I am looking for.
Or even better, if it is possible, to automatically plot the curves of the data on a screen.
The idea is to take two sets of measurements and use the Arduino to get an answer on how correlated the sets are.
So if it is possible, I am open to any suggestions on how to compare data sets. I will post more curves soon.
Well, it's more than "a few". I agree that a simple correlation coefficient doesn't tell much: the two data sets must be equally spaced and, AFAIK, a phase shift isn't acceptable.
If you decide to implement the correlation coefficient using the code on the page I linked, be careful with declarations. I noticed that in that code, the variables xsum, ysum, xysum, xsqr_sum, ysqr_sum are declared as integers. Since an int is 16 bits by default on the Arduino, they might overflow during the summations. There will be no warning, just wrong answers. You should probably declare them either as floats or as long integers.
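For reference, a minimal sketch of the correlation coefficient with float accumulators. This is not the code from the linked page. It is a two-pass version (means first, then deviations), which avoids summing large raw products, and it assumes both sets have the same number of equally spaced samples. The name correlation is made up.

```cpp
#include <math.h>
#include <stddef.h>

// Pearson correlation coefficient of x and y (n samples each), in -1..+1.
// Returns 0 when it is undefined (fewer than 2 samples or a constant series).
float correlation(const float *x, const float *y, size_t n)
{
  if (n < 2) return 0.0f;

  // First pass: the means of both series.
  float meanX = 0.0f, meanY = 0.0f;
  for (size_t i = 0; i < n; i++) {
    meanX += x[i];
    meanY += y[i];
  }
  meanX /= n;
  meanY /= n;

  // Second pass: sums of products of the deviations from the means.
  float sxy = 0.0f, sxx = 0.0f, syy = 0.0f;
  for (size_t i = 0; i < n; i++) {
    float dx = x[i] - meanX;
    float dy = y[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }

  if (sxx <= 0.0f || syy <= 0.0f) return 0.0f;
  return sxy / (float)sqrt(sxx * syy);
}
```

A result near +1 means the two sets rise and fall together, near -1 means one is roughly a mirror image of the other, and near 0 means no linear relation. Both arrays have to be in RAM at the same time, which limits the number of samples on the smaller boards.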
The two curves you've shown so far look like coupled growth/decay phenomena, which might be modeled with two amplitudes, two rate constants and an offset. That would make sense, for example, for two competing chemical reactions or a generalized input and output process.
If either of those interpretations makes physical sense, or you can come up with a more appropriate physical interpretation, then you should be able to get four meaningful numbers out of the data by either solving the relevant rate equations or fitting a model solution. Comparing those derived numbers would be more informative than just a single number like a correlation coefficient. However, fitting the curves is something you would most likely do with Matlab or a more specialized program on a PC or a Mac (as you've already hinted).
If you would care to describe the experiment in a bit more detail, you can expect better informed suggestions.
A simple description of the experiment is:
I use a circuit to measure analog values in mV. I read this measurement with the Arduino, sampling for 3 minutes, store the samples on the SD card, then repeat for another set of values.
After that I take the data, plot them in Excel or MATLAB, compute the average, correlate the values and see how similar the data look. Then I extract my answer about how similar the substances I measured are.
The mV values, are taken from a biosensor.