Outlier Detection Algorithm

Hi Everyone,

I have a couple of distance sensors hooked up to my board, and I need an algorithm that will throw out the erroneous readings the sensors may return. In other words, if I receive values of 15, 16, 15, 14, 75, and 15, I need an algorithm that will throw out the 75.

Any help? Thanks!

For this type of impulsive noise, I would suggest a median filter.

This old post looks fairly relevant (or you can Google "median filter"):

http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1288290319
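As a rough illustration of the idea (not the code from the linked post), here is a minimal median filter sketch in Python, assuming a causal window of the latest 3 readings, which is what a sketch running in `loop()` would see. The function name `median_filter` is hypothetical; on the board you would port the same logic to C++.

```python
from statistics import median


def median_filter(values, window=3):
    """Causal median filter: for each position where `window` readings
    are available, output the median of the latest `window` readings."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window should be a positive odd number")
    out = []
    for i in range(window - 1, len(values)):
        # median of values[i - window + 1] .. values[i]
        out.append(median(values[i - window + 1:i + 1]))
    return out


print(median_filter([15, 16, 15, 14, 75, 15]))  # [15, 15, 15, 15]
```

Note that a median filter replaces each reading with a "typical" neighbour rather than deleting it, so a single spike like the 75 simply never reaches the output.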


Another way: if the difference between the current reading and the last reading is greater than some threshold, throw away the current reading.
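A minimal sketch of this threshold check, assuming the comparison is made against the last *accepted* reading (an assumption: comparing against the last raw reading would also reject the good 15 that follows the 75). The function name `reject_jumps` and the threshold of 10 are illustrative only.

```python
def reject_jumps(values, threshold):
    """Keep a reading only if it differs from the last accepted reading
    by at most `threshold`. The first reading is always accepted."""
    accepted = []
    for v in values:
        if not accepted or abs(v - accepted[-1]) <= threshold:
            accepted.append(v)
    return accepted


print(reject_jumps([15, 16, 15, 14, 75, 15], threshold=10))  # [15, 16, 15, 14, 15]
```

The trade-off: the first reading is trusted blindly, and a genuine large step in distance would be rejected indefinitely, because every later reading is compared with the old accepted value.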

What I would do is:

calculate the average:
average = (sum of all values) / number of values
calculate the deviation of each value:
deviation = |value - average|
calculate the average deviation:
averageDeviation = (sum of deviations) / number of values

then discard any value that has
deviation > averageDeviation

YMMV
HTH
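The steps above translate fairly directly into code. This is a minimal Python sketch over a batch of readings; the function name `discard_outliers` and the factor `k` (1 for the rule as posted) are illustrative additions.

```python
def discard_outliers(values, k=1.0):
    """Drop values whose deviation from the mean exceeds k times the
    average (mean absolute) deviation of the batch."""
    n = len(values)
    average = sum(values) / n
    deviations = [abs(v - average) for v in values]
    average_deviation = sum(deviations) / n
    return [v for v, d in zip(values, deviations) if d <= k * average_deviation]


print(discard_outliers([15, 16, 15, 14, 75, 15]))  # [15, 16, 15, 14, 15]
```

For the example data, the average is 25 and the average deviation is about 16.7, so only the 75 (deviation 50) is dropped. On spike-free data such as 15, 16, 15, 14, 15, however, the average deviation is only 0.4, so the 16 and the 14 are discarded too, even with k = 2.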

Nice to see that one reply (median filter) says don't use averages and another says use averages :) Both are right; it depends on how large the deviation is compared to the readings.

If you want to implement the algorithm proposed by mmcp42, you might do better to use a running average of the last few readings, as it adapts to changing values, whereas the normal average over all readings will "freeze" over time. Furthermore, by definition some deviations must be greater than the average deviation (unless they are all equal), so repeatedly discarding them and recalculating makes the average deviation smaller and smaller until it becomes zero. So something is not quite 100% (still, the idea of the algorithm is OK).

For a running average class, see my article on the Arduino Playground: RunningAverage.

Rob
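One way to combine the running average with the deviation test is sketched below in Python. This is not the Playground RunningAverage API; the class `RunningOutlierFilter` and its parameters are hypothetical. Assumptions: each new reading is judged against the average and average deviation of the last `size` raw readings; every reading (accepted or not) enters the window so that a bad early value is eventually flushed out; the first `warmup` readings are accepted unjudged; and `min_threshold` is a floor so that a near-zero average deviation does not reject everything.

```python
from collections import deque


class RunningOutlierFilter:
    """Hypothetical helper: judges each new reading against the last
    `size` raw readings using the average-deviation rule."""

    def __init__(self, size=4, k=2.0, warmup=3, min_threshold=2.0):
        self.window = deque(maxlen=size)  # oldest reading drops out automatically
        self.k = k
        self.warmup = warmup
        self.min_threshold = min_threshold

    def accept(self, x):
        """Return True if x looks like a good reading; always record it."""
        if len(self.window) < self.warmup:
            ok = True  # not enough history yet to judge
        else:
            n = len(self.window)
            average = sum(self.window) / n
            average_deviation = sum(abs(v - average) for v in self.window) / n
            threshold = max(self.k * average_deviation, self.min_threshold)
            ok = abs(x - average) <= threshold
        self.window.append(x)  # rejected readings enter too, so a bad start is flushed
        return ok


f = RunningOutlierFilter()
print([x for x in [15, 16, 15, 14, 75, 15] if f.accept(x)])  # [15, 16, 15, 14, 15]
```

Letting rejected readings into the window means an outlier briefly widens the threshold, but it also means a genuine change in distance is accepted once it has filled enough of the window.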

True enough. Maybe:

discard any value that has
deviation > 2 * averageDeviation

would be better.

My only worry with the running average method is that a major deviation in the first reading will never get discarded and will blight the remaining calculations.

Quote: deviation > 2 * averageDeviation

Much better, I should have thought of that :)

The running average class automatically discards the first reading once its internal buffer is full, but you are right that if the first reading deviates (from what?) there is a problem. Handling this is exactly the strength of the median filter algorithm. So maybe bootstrap the readings with a median filter to get a good, stable starting value, and then continue with a running average?
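A minimal sketch of that bootstrap idea, assuming the first `m` readings are used only to compute a median seed, the running-average buffer is pre-filled with that seed, and later readings are rejected when they differ from the running average by more than a fixed `max_jump`. The function name and all parameter values are hypothetical.

```python
from collections import deque
from statistics import median


def filter_with_median_bootstrap(readings, m=5, window=5, max_jump=10.0):
    """Seed a running average with the median of the first m readings,
    then keep later readings within max_jump of the running average."""
    seed = median(readings[:m])  # robust even if the very first reading is bad
    buf = deque([seed] * window, maxlen=window)
    accepted = []
    for x in readings[m:]:
        avg = sum(buf) / len(buf)
        if abs(x - avg) <= max_jump:
            buf.append(x)  # only good readings update the running average
            accepted.append(x)
    return seed, accepted


print(filter_with_median_bootstrap([75, 15, 16, 15, 14, 15, 76, 15]))  # (15, [15, 15])
```

Here the bad first reading (75) does not affect the seed. Because rejected readings never enter the buffer, a genuine large change in distance would need a re-bootstrap to be accepted.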

Prompted by this thread, I wrote a runningMedian class that could help with removing the outliers. See http://arduino.cc/playground/Main/RunningMedian for the details. Might help.
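To show how a running median can serve as the reference for rejecting outliers, here is a rough Python sketch. It is not the Playground class or its API; `SlidingMedian`, `drop_far_from_median`, and the `tolerance` value are hypothetical. Each new reading is added to a window of the last `size` readings and kept only if it lies within `tolerance` of the window's median.

```python
import statistics
from collections import deque


class SlidingMedian:
    """Keeps the last `size` readings and reports their median."""

    def __init__(self, size=5):
        self.values = deque(maxlen=size)

    def add(self, x):
        self.values.append(x)

    def median(self):
        return statistics.median(self.values)


def drop_far_from_median(readings, size=5, tolerance=10):
    """Keep readings that lie within `tolerance` of the running median."""
    sm = SlidingMedian(size)
    kept = []
    for x in readings:
        sm.add(x)
        if abs(x - sm.median()) <= tolerance:
            kept.append(x)
    return kept


print(drop_far_from_median([15, 16, 15, 14, 75, 15]))  # [15, 16, 15, 14, 15]
```

Because the median ignores a minority of extreme values, the 75 barely moves the reference (the window median stays 15), so it stands out clearly and is dropped.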