# Record data readings over time, then identify them using a database

Hello There,

My problem is more about the approach. I'm working on a project where I need to record the changing readings of a sensor over a period of time and use them to find the most similar data set stored in a database. This operation will repeat approximately every 2 seconds.

An example: say a sensor gave the readings {50,72,188,45} over a certain period of time while a certain sequence of movements was performed.

Some of the 500 previously defined sequences of movements are:
...
attack = {20,15,200,44}
wave = {60,69,150,40}
run = {150,250,400,50}
...

If we calculate the difference between the recorded sequence and each of the saved sequences, we find that the closest sequence is "wave":
...
attack = {20,15,200,44} : 30 + 57 + 12 + 1 = 100
wave = {60,69,150,40} : 10 + 3 + 38 + 5 = 56 <----- Sequence identified as : wave
run = {150,250,400,50} : 100 + 178 + 212 + 5 = 495
..
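The comparison described above could be sketched as follows. This is only an illustration: the names `Sequence`, `sumAbsDiff` and `findClosest` are made up here, a fixed length of 4 readings per sequence is assumed, and only C headers are used so that the same code should also build in an Arduino sketch.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

constexpr size_t kLen = 4;  // readings per sequence (assumed fixed)

// Hypothetical record layout: a label plus its stored readings.
struct Sequence {
    const char* name;
    int16_t values[kLen];
};

// Sum of the absolute differences between two sequences of kLen readings.
long sumAbsDiff(const int16_t* a, const int16_t* b) {
    long total = 0;
    for (size_t i = 0; i < kLen; ++i) {
        total += labs(static_cast<long>(a[i]) - b[i]);
    }
    return total;
}

// Returns the index of the stored sequence with the smallest score,
// or `count` if the table is empty.
size_t findClosest(const int16_t* recorded, const Sequence* table, size_t count) {
    size_t best = count;
    long bestScore = 0;
    for (size_t i = 0; i < count; ++i) {
        const long score = sumAbsDiff(recorded, table[i].values);
        if (best == count || score < bestScore) {
            best = i;
            bestScore = score;
        }
    }
    return best;
}
```

With the three sequences listed above and the recorded readings {50,72,188,45}, the scores come out as 100, 56 and 495, so `findClosest` picks "wave".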

I might even calculate the sum of every sequence and save it along with the sequence in the database, to cut some of the processing needed when calculating the differences, but still...
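One way a stored sum could help, as a rough sketch rather than a tested recipe: the difference between two sums can never be larger than the sum of the absolute differences, so a saved sequence whose sum is already further away than the best score found so far can be skipped without comparing the individual readings. The names `StoredSequence` and `findClosestPruned` are invented for this illustration.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

constexpr size_t kLen = 4;  // readings per sequence (assumed fixed)

// Hypothetical record layout: readings plus their precomputed sum.
struct StoredSequence {
    int16_t values[kLen];
    long sum;
};

long sumOf(const int16_t* v) {
    long total = 0;
    for (size_t i = 0; i < kLen; ++i) total += v[i];
    return total;
}

// Returns the index of the closest stored sequence (or `count` if the
// table is empty). If fullComparisons is not null, it receives the number
// of sequences that needed a full reading-by-reading comparison.
size_t findClosestPruned(const int16_t* recorded, const StoredSequence* table,
                         size_t count, size_t* fullComparisons) {
    const long recordedSum = sumOf(recorded);
    size_t best = count;
    long bestScore = 0;
    size_t full = 0;
    for (size_t i = 0; i < count; ++i) {
        // Lower bound on the score: |sum(recorded) - sum(stored)|.
        const long bound = labs(recordedSum - table[i].sum);
        if (best != count && bound >= bestScore) continue;  // cannot beat the best
        ++full;
        long score = 0;
        for (size_t j = 0; j < kLen; ++j) {
            score += labs(static_cast<long>(recorded[j]) - table[i].values[j]);
        }
        if (best == count || score < bestScore) {
            best = i;
            bestScore = score;
        }
    }
    if (fullComparisons != nullptr) *fullComparisons = full;
    return best;
}
```

With only 4 readings per sequence the saving is small; the idea would matter more for longer sequences.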

The problem is that there are a lot of saved sequences, and calculating the difference against every one of them for each newly recorded sequence might not be the optimal approach. Considering that this task needs to be done by the Arduino itself, not the computer, is it even possible with only the processing power of the Arduino?

• If yes, is there a better way to do this?

• Is it possible to calculate the maximum number of sequences the Arduino can work with before it starts to lag?

• Would an Arduino Uno be suitable for this?

Any advice on the subject would be much appreciated.

There might be better ways, but you should use the RMSE (root-mean-square error) or the absolute values of the respective differences, not just the signed differences.

Imagine a measurement [ 20, 220, 0, 220 ].
Then you have two reference measurements you wish to compare it with: 1 = [ 0, 220, 0, 220 ] and 2 = [ 110, 110, 110, 110 ].

Obviously, your measurement more closely resembles reference 1 (reference 2 is just a constant).
However, if you simply sum the signed differences, the results are 20 for reference 1 and 20 for reference 2 (-90 + 110 - 110 + 110), because the positive and negative differences cancel out. That would mean that your program cannot tell the two references apart, even though reference 2 is clearly the worse match.

If you take the absolute value of the differences, you get a result of 20 for reference 1, and 420 for reference 2. This time you get the correct result.
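To make the cancellation effect concrete, here is a small C++ sketch of the three scores mentioned above: the plain signed sum, the sum of absolute differences, and the RMSE. The function names are made up for this example and a fixed length of 4 readings is assumed.

```cpp
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

constexpr size_t kLen = 4;  // readings per sequence (assumed fixed)

// Plain sum of signed differences: positive and negative terms can cancel.
long signedSum(const int* a, const int* b) {
    long total = 0;
    for (size_t i = 0; i < kLen; ++i) total += static_cast<long>(a[i]) - b[i];
    return total;
}

// Sum of absolute differences: every mismatch adds to the score.
long absSum(const int* a, const int* b) {
    long total = 0;
    for (size_t i = 0; i < kLen; ++i) total += labs(static_cast<long>(a[i]) - b[i]);
    return total;
}

// Root-mean-square error: like absSum, but large mismatches weigh more.
double rmse(const int* a, const int* b) {
    double acc = 0.0;
    for (size_t i = 0; i < kLen; ++i) {
        const double d = static_cast<double>(a[i]) - b[i];
        acc += d * d;
    }
    return sqrt(acc / kLen);
}
```

For the measurement [ 20, 220, 0, 220 ] against the constant reference [ 110, 110, 110, 110 ], `signedSum` returns only 20 because the terms largely cancel, while `absSum` returns 420 and `rmse` is roughly 105.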

If you don't care about the magnitude of the measurements, only the "pattern", another approach would be to calculate the dot product between your measurement and the references (if you interpret them as vectors in ℝ⁴). You may want to normalize your vectors first.
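A minimal sketch of the normalized dot product (often called cosine similarity), assuming 4 readings per vector. The names `dot` and `cosineSimilarity` are chosen for this example only.

```cpp
#include <math.h>
#include <stddef.h>

constexpr size_t kLen = 4;  // readings per vector (assumed fixed)

double dot(const double* a, const double* b) {
    double total = 0.0;
    for (size_t i = 0; i < kLen; ++i) total += a[i] * b[i];
    return total;
}

// Dot product of the two vectors after scaling each to unit length.
// 1.0 means the same "pattern" regardless of magnitude.
// Returns 0.0 if either vector is all zeros (no direction to compare).
double cosineSimilarity(const double* a, const double* b) {
    const double normA = sqrt(dot(a, a));
    const double normB = sqrt(dot(b, b));
    if (normA == 0.0 || normB == 0.0) return 0.0;
    return dot(a, b) / (normA * normB);
}
```

A measurement [ 20, 220, 0, 220 ] and a half-amplitude copy [ 10, 110, 0, 110 ] score 1.0, since only the magnitude differs. Note that when all readings are positive, even a flat reference such as [ 110, 110, 110, 110 ] still scores fairly high (about 0.74 here), so subtracting each vector's mean before normalizing may be worth trying.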

Real pattern detection is a pretty complicated problem.

Pieter