Filtering rogue data

Is there an accepted scientific way of filtering out rogue data points from a data set? Using rules of thumb just seems unscientific.

The attached picture shows variations in the rotation speed of a 1 kW mains powered induction motor over a period of 30 minutes. The samples are at 1 second intervals and come from a Hall effect sensor sensing a magnet on the motor axis. The data is logged via an Arduino interrupt.

(The 47.5 Hz average is pretty normal for a small induction motor running on a 50 Hz mains supply.)

About 1 percent of the readings are way outside the average. There's no correlation with the variations in the mains voltage or frequency which are both very stable. The motor is driving a circulating water pump so that should be a constant load.

Given the high inertia of the motor, I can only assume that these are due to flaws in the measurement method, perhaps because the Arduino is busy handling interrupts from a flow sensor (at 2 Hz) and a CAN bus interface (at 10 Hz).

I only noticed this issue by chance because the rogue data points were occasionally triggering an automated shutdown due to the abnormally high or low reported speed. As I said at the beginning I could just make up a rule of thumb that says if a reading is more than some percentage away from the average then ignore it, but is there a better way?

The best solution would determine and eliminate the source of the spikes.

If the other measured values are reliable, I'd check for (physically impossible) jumps in rotation speed, and discard such values.
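
One way to sketch such a plausibility check in plain C++ (it should also paste into an Arduino sketch). The names JumpFilter, maxStepHz and maxRejects are made up for this example, and the limits are assumptions you would have to tune for your own motor. After maxRejects implausible readings in a row, the filter accepts the new level so it cannot get stuck on a genuine speed change.

```cpp
// Hypothetical plausibility filter: ignores readings that jump further from
// the last accepted value than the motor could plausibly move in one sample.
struct JumpFilter {
    float maxStepHz;   // assumed largest believable change per sample
    int   maxRejects;  // accept anyway after this many rejects in a row
    float lastGoodHz;  // last accepted reading
    int   rejects;     // consecutive rejected readings so far
    bool  hasValue;    // false until the first reading arrives

    JumpFilter(float maxStep, int maxRej)
        : maxStepHz(maxStep), maxRejects(maxRej),
          lastGoodHz(0.0f), rejects(0), hasValue(false) {}

    // Returns the value to use: the new reading if it is plausible,
    // otherwise the previous accepted one.
    float update(float hz) {
        float diff = hz - lastGoodHz;
        if (diff < 0.0f) diff = -diff;
        bool plausible = !hasValue || diff <= maxStepHz;
        if (plausible || ++rejects >= maxRejects) {
            lastGoodHz = hz;
            hasValue = true;
            rejects = 0;
        }
        return lastGoodHz;
    }
};
```

For example, JumpFilter f(0.5f, 3) would ignore a single 49.0 Hz reading that follows 47.5 Hz, but would follow the speed if three readings in a row stayed near 49.0 Hz.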

Given the typical frequency stability of the mains, which is exceptionally good, changes in motor speed are directly proportional to changes in motor load. Standard induction machines vary in speed from a few rpm below synchronous when unloaded, to the stated nameplate rpm at full load (slip speed).

Looking at the spikes on your photo, I'd estimate them to be about +/- 3% which would equate to about +/-45 rpm. While the high peaks could be explained with a sudden loss in load, the lower speed measurement is below the rated speed of the motor.

While I could accept the upper speed deviation, I cannot accept the lower one, and I agree that there is an error in your measurement system. Since the errors appear both above and below the average, it's not something as simple as missing a pulse or two from the pickup.

What are you using to establish the time between interrupts?

The scan interval can't be the reason for the spikes, because then every deviation would be followed by a deviation in the opposite direction/speed.

I'm with DrDiettrich about removing the source of the spikes, but if that is impossible or impractical, what you can do is implement a discrete lowpass filter in your code. Link.

This should smooth out your graph and reject any impulses in the graph.
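
A minimal sketch of a first-order discrete lowpass (exponential smoothing), in case the link is not handy. The struct name LowPass and the parameter alpha are my own choices, not from any library. Note that a filter like this attenuates a spike rather than removing it completely: with alpha = 0.25 a one-sample spike still shows up at a quarter of its size.

```cpp
// First-order discrete lowpass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
// alpha is between 0 and 1; smaller means heavier smoothing and slower response.
struct LowPass {
    float alpha;   // smoothing factor (assumed value, tune to taste)
    float y;       // current filter output
    bool  primed;  // false until the first sample has been seen

    explicit LowPass(float a) : alpha(a), y(0.0f), primed(false) {}

    float update(float x) {
        if (!primed) {
            y = x;              // start at the first sample to avoid a ramp-up
            primed = true;
        } else {
            y += alpha * (x - y);
        }
        return y;
    }
};
```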

There's all sorts of "scientific" methods you can use. If you think these glitches are very short-lived then use some kind of discard mechanism. Take 10 readings and discard the highest and the lowest, then use the 8 remaining.
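
The discard idea could look something like this. The function name trimmedMean is hypothetical, and the sketch assumes you have already collected the readings into an array.

```cpp
// Mean of n readings after dropping the single highest and the single lowest.
// Needs at least 3 readings; returns 0 if there are fewer.
float trimmedMean(const float* v, unsigned int n) {
    if (n < 3) return 0.0f;
    float sum = v[0];
    float lo = v[0];
    float hi = v[0];
    for (unsigned int i = 1; i < n; ++i) {
        sum += v[i];
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    // Remove one minimum and one maximum, then average what is left.
    return (sum - lo - hi) / static_cast<float>(n - 2);
}
```

With 10 readings this averages the 8 that remain, so one high glitch and one low glitch per batch are thrown away entirely.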

If they are balanced high/low then simple averaging will work. Take the average of the last 10 readings. There's some neat tricks to calculating moving averages with a recursive method.
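
One common recursive trick is to keep a running sum and, for each new reading, subtract the oldest value and add the newest, so the cost per sample does not depend on the window length. A rough sketch, with MovingAverage being a name invented for this example:

```cpp
// Moving average over the last N readings using a ring buffer and a running sum.
template <unsigned int N>
struct MovingAverage {
    float        buf[N];  // the last N readings
    float        sum;     // running sum of the readings currently in buf
    unsigned int count;   // how many readings are stored so far (up to N)
    unsigned int head;    // index of the slot to overwrite next

    MovingAverage() : buf(), sum(0.0f), count(0), head(0) {}

    float update(float x) {
        if (count == N) {
            sum -= buf[head];   // window is full: drop the oldest reading
        } else {
            ++count;            // still filling the window
        }
        buf[head] = x;
        sum += x;
        head = (head + 1) % N;
        return sum / static_cast<float>(count);
    }
};
```

Over very long runs a float running sum can accumulate rounding error; keeping the sum as an integer count of microseconds would avoid that.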

If there's some specific frequency component in the signal that you want to remove, such as the 50Hz hum, then you may consider a Finite Impulse Response filter (FIR) or Infinite Impulse Response filter (IIR).

The best book on this is available online here: http://www.dspguide.com/ This covers all the above filter methods. When you've read that, and if you decide on using a FIR, then the best design tool for the FIR filter is here: http://t-filter.engineerjs.com/ This makes it really easy to design your filter and then copy-paste the code into your Arduino.

Looks like I need to check my system by making a dummy sensor that outputs a constant frequency signal, however I don't have any test gear other than Arduinos and an oscilloscope to verify measurements.

Would using another Arduino and the micros() functions be good enough to generate a square wave with a period that stays within a 47.5 +- 0.1 Hz window over a period of several minutes?
(This corresponds to a period between 21008 and 21097 microseconds.)

The 16MHz Arduinos only support a resolution of 4us with the micros() function. That's not good enough for your specification. The faster ones like Due or Teensy will get microsecond resolution.

A hardware timer can be used to output almost any frequency with accuracy as good as the original Arduino clock. It may drift over time but it's close enough for your purposes. However setting that up isn't trivial, unless you can find a library that works for you.

MorganS:
The 16MHz Arduinos only support a resolution of 4us with the micros() function. That's not good enough for your specification. The faster ones like Due or Teensy will get microsecond resolution.

A hardware timer can be used to output almost any frequency with accuracy as good as the original Arduino clock. It may drift over time but it's close enough for your purposes. However setting that up isn't trivial, unless you can find a library that works for you.

Found the TimerOne library
https://www.pjrc.com/teensy/td_libs_TimerOne.html
which does the job.

For a statistical treatment to filter rogue data, look up percentiles.
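
The simplest percentile-based filter is probably the median (the 50th percentile) of the last few readings, which ignores isolated outliers completely. This is only a sketch: medianOf is a made-up name, and the 32-reading cap is an arbitrary assumption to keep the scratch buffer small.

```cpp
// Median of n readings (n between 1 and 32 in this sketch).
// The input array is not modified; returns 0 if n is out of range.
float medianOf(const float* v, unsigned int n) {
    const unsigned int kMax = 32;
    if (n == 0 || n > kMax) return 0.0f;
    float tmp[kMax];
    // Insertion sort into a scratch buffer; fine for small windows.
    for (unsigned int i = 0; i < n; ++i) {
        float x = v[i];
        unsigned int j = i;
        while (j > 0 && tmp[j - 1] > x) {
            tmp[j] = tmp[j - 1];
            --j;
        }
        tmp[j] = x;
    }
    // Odd n: the middle element. Even n: mean of the two middle elements.
    return (n % 2) ? tmp[n / 2] : 0.5f * (tmp[n / 2 - 1] + tmp[n / 2]);
}
```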

DrDiettrich:
The best solution would determine and eliminate the source of the spikes.

Well I put my 47.5Hz constant test signal in and saw the exact same problem as before.
I went through my code and found some places where I hadn't put noInterrupts()/interrupts() guards around accessing volatile variables, but that didn't fix the problem.

It turns out that other interrupt activity was causing an occasional interrupt jitter of up to 1 ms. My measurement system was flawed because I only measured the period of a single ~21 ms pulse once per second. By ignoring the other ~46 pulses I didn't see the corresponding long pulse to go with every short pulse and vice versa.
I now sum the periods of 50 consecutive pulses and take the average so the jitter mostly cancels out.
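
For anyone finding this later, here is roughly what that averaging looks like. It is a simplified sketch rather than my actual code: PulseAverager is a name made up for this post, and it assumes the timestamp passed in comes from micros() in the sensor interrupt. The sum of 50 consecutive periods is just the last timestamp minus the first, so jitter on the edges in between cancels out, and jitter on the two end edges is divided by 50.

```cpp
// Averages the pulse period over 50 consecutive pulses.
struct PulseAverager {
    static const unsigned int kPulses = 50;  // periods per average
    unsigned long startMicros;  // timestamp of the first edge in the block
    unsigned int  count;        // periods seen since startMicros
    bool          started;      // false until the first edge arrives
    float         lastHz;       // most recent averaged frequency

    PulseAverager() : startMicros(0), count(0), started(false), lastHz(0.0f) {}

    // Call once per pulse edge with its timestamp in microseconds.
    // Returns true when a new averaged frequency is available in lastHz.
    bool onPulse(unsigned long nowMicros) {
        if (!started) {
            startMicros = nowMicros;
            started = true;
            count = 0;
            return false;
        }
        if (++count < kPulses) return false;
        // Unsigned subtraction also gives the right span across a micros() rollover.
        unsigned long span = nowMicros - startMicros;
        lastHz = 1.0e6f * kPulses / static_cast<float>(span);
        startMicros = nowMicros;  // the last edge starts the next block
        count = 0;
        return true;
    }
};
```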