Go Down

Topic: runningMedian class on playground (Read 17650 times) previous topic - next topic

Sembazuru


Quote
I've been thinking about the issue with an even sized arrays. My understanding about medians from an even number of data points is it should be the mean of the two middle sorted data points. If you want to take this into consideration, what is the best way to determine an even number of data points from and odd number of data points?
[snip]

fastest is the test for oddness
bool odd = _cnt & 0x01;
==>
bool even = !(_cnt & 0x01);  // even = !odd;


Yeah, I'm feeling silly for not thinking of that.

Quote

[implementing even sized arrays correctly]
- if the elements are integers is the median also an integer? according to math it should be a float.
   ==> calculating the average is a float addition + division. + oddness test.


Well, a couple points here:
* The class defines all the elements, and the returns from the array, as float. Even if the user only puts ints (like returns from readAnalog()) into the array, they are stored as floats and the methods return floats. With nothing but integer-type values stored in the array, the only time a decimal value other than zero is returned would be a half. If the float return is silently cast into any of the integer-type datatypes, the half would be stripped, so the effect would be an automatic round-down. In the case of [2, 3], the rounded return would be the same as the current implementation. But in the case of [2, 5] a return of 2 isn't quite appropriate. The return should be 3.5 or rounded down to 3. How one is bothered by it depends on one's rounding practice. If one goes by the practice that exactly half (0.50...0) should be rounded down but anything over half (0.50...01) should be rounded up, then it isn't an issue.
* The oddness test only needs to be performed on _cnt which is uint8_t ("byte" in Adruino parlance). The addition and division may be simply be performed by returning the result of getAverage(2). Not sure how to have one method call another within a class though. My C++ isn't that strong.

Quote

- with odd array size the code works after the first element is added.
  even arrays need at least 2 elements added.


True, but the getMedian method (as well as both the getAverage functions, but they are out of scope for this discussion) uses _cnt to determine the array size, not _size. Thus if only the first element has been added then the apparent array can't be anything but odd. (We already trap for zero, so that isn't an issue.) Unless things change in the class, this should be a non issue.

Quote

in short it adds code complexity to make it right.
I'll leave this point open for after the 0.1.06 version.


That's fair. No sense holding up something that (by the time I'm writing this) has already been released for something still under discussion. ;)
http://www.catb.org/jargon/html/I/I-didn-t-change-anything-.html

robtillaart

@Sembazuru
the correct median for even number of elements will be in the next feature release.

Please give the 0.1.06 a try and if any other things fail.
I want to fix them first before adding new features.

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

@Sembazuru
Here a first implementation of the correct median for even number of elements so you can test it.
Code: (experimental) [Select]
float RunningMedian::getMedian()
{
    if (_cnt > 0)
    {
        if (_sorted == false) sort();
        if (_cnt & 0x01) return _ar[_p[_cnt/2]];                    // ODD
        else return (_ar[_p[_cnt/2]] + _ar[_p[_cnt/2 - 1]]) / 2.0;  // EVEN
    }
    return NAN;
}

Note: this is a delta for the 0.1.06 version, so you should fetch that one first.
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

updated the version to 0.1.07 on github - https://github.com/RobTillaart/Arduino/tree/master/libraries - 

+ fixed error in 0.1.06  add() :smiley-red:
+ added correct median for even number of elements

@Sembazuru
Please verify 0.1.07 version 

TODO: update playground (note to myself ;)
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Sembazuru


updated the version to 0.1.07 on github - https://github.com/RobTillaart/Arduino/tree/master/libraries - 

+ fixed error in 0.1.06  add() :smiley-red:
+ added correct median for even number of elements

@Sembazuru
Please verify 0.1.07 version 

TODO: update playground (note to myself ;)



Just got home, and tomorrow is busy through the day. I'll download and check tomorrow night.
http://www.catb.org/jargon/html/I/I-didn-t-change-anything-.html

robtillaart

With the addition of a correct median for even nr elements, this lib is more or less done.
There are still some ideas to investigate, mainly if and when a complete sort is needed.

1) GetMedian() does not need a complete sort(). Sorting only half+1 is enough to detemine the median. There exists  QuickMedian algorithm (from Hoare, inventor of quicksort)  which recursively selects the partition in which the median lies. It is  O(2Log n) on average. Worst case it does a complete sort O(n2). It probably means an increase in code size.

2) getHighest() and getLowest() can be found with one iteration over the array O(n) similar to getAverage(). However a sorted array gives the lowest and highest in O(1).

3) getAverage(nMedians) needs a complete sort, or at least a partial sort which encloses the nMedians. A partial quicksort that only sort those partitions that include the range nMedians could do some good. Added complexity is probably the price.

4) Keeping the elements sorted after single addition/replace is optimal with insertSort() - see my modifications of the FastRunningMedian(rkail) above. Merging the speed of FastRunningMedian with functionality of RunningMedian is the holy grail ;)

New function ideas:
Quartile(n),
n=0,1,2,3,4 where the median is the 'half' element, quartile(0) = lowest(), quartile(1) is element at quart, quartile(2) == median, quartile(3) is element at 3/4 and quartile(5) is highest.
getElement(n) + getSortedElement(n)
readonly access to the (sorted) internal buffers. would make implementation of quartile, lowest and highest trivial, and allows iteration to do whatever the user wants.
mode()
returns the element that occurs most often. Problem is that this is not always one unique number e.g. all numbers occur once.
predict()
returns the largest distance to the neighbour of the median.  e.g.
{1,2,3,4,5}-> median = 3 predict()= 1
{1,2,3,4,7,9,10,11} median=7, predict()=3
It predicts the maximum change the RunningMedian will make after an addition. e.g. If predict()= 0 means that the RunningMedian will stay the same (stable).  Could be parametrized (d  distance)   d = [1..cnt/2]. There is complexity in the code when the internal buffer is not filled yet, unless this is pre-filled like in FastRunningMedian. Also even/odd adds extra code.

For me the getElements, getSortedElements are the most interesting. Followed by the predict(d). Enough to tinker ;)
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

updated version to 0.1.08 on github - https://github.com/RobTillaart/Arduino/tree/master/libraries -  

+ added getElement(n)
+ added getSortedElement(n)
+ added predict(n)  // returns the max distance with nth neighbour.
                                   This is the max value the RunningMedian will change in n additions.

next step is the merge with fast version of rkail.

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Sembazuru


updated version to 0.1.08 on github - https://github.com/RobTillaart/Arduino/tree/master/libraries -  

+ added getElement(n)
+ added getSortedElement(n)
+ added predict(n)  // returns the max distance with nth neighbour.
                                   This is the max value the RunningMedian will change in n additions.

next step is the merge with fast version of rkail.




Wow... didn't even wait for my replies. I guess the weekend waits for no one.  ;)

I did notice one minor issue, you forgot to update #define RUNNING_MEDIAN_VERSION "0.1.07" to 0.1.08.... I corrected this on my local version, as you can see if you look at the attached results.

I just finished coming up with a test sketch, but haven't had a chance to go through the results to check the math. But, I did include timings which may be of interest to you so I'll post the results from my UNO r3 (compiled on 1.0.5 ERW). Here is my test sketch (feel free to steal if from me), sadly only has a sparse few comments...
Code: [Select]

// RunningMedianTest1.ino
#include <RunningMedian.h>

const int sourceData[] =
{ // 50 consecutive samples from Sharp distance sensor model GP2Y0A710K0F while stationary.
  300, 299, 296, 343, 307, 304, 303, 305, 300, 340,
  308, 305, 300, 304, 311, 304, 300, 300, 304, 304,
  284, 319, 306, 304, 300, 302, 305, 310, 306, 304,
  308, 300, 299, 304, 300, 305, 307, 303, 326, 311,
  306, 304, 305, 300, 300, 307, 302, 305, 296, 300
};
const int sourceSize = (sizeof(sourceData)/sizeof(sourceData[0]));

RunningMedian samples = RunningMedian(sourceSize);

void setup()
{
  Serial.begin(115200);
  while (!Serial); // Wait for serial port to connect. Needed for Leonardo only.
  delay(1000); // Simply to allow time for the ERW versions of the IDE time to automagically open the Serial Monitor. 1 second chosen arbitrarily.
  Serial.print(F("Running Median Version: "));
  Serial.println(RUNNING_MEDIAN_VERSION);
#ifdef RUNNING_MEDIAN_ALL
  Serial.println(F("All methods available"));
#else
  Serial.println(F("Only constructor, destructor, clear(), add(), and getMedian() available"));
#endif
#ifdef RUNNING_MEDIAN_USE_MALLOC
  Serial.println(F("Dynamic version using malloc() enabled"));
#else
  Serial.print(F("Static version, will always allocate an array of "));
  Serial.print(MEDIAN_MAX_SIZE,DEC);
  Serial.println(F(" floats."));
#endif
}

void loop()
{
  test1();
  while (1);
}

void test1()
{
  unsigned long timerStart = 0;
  unsigned long timerStop = 0;
  float resultFLOAT = 0;
  byte resultBYTE = 0;

  Serial.print(F("Requested median array size = "));
  Serial.println(sourceSize,DEC);
  Serial.print(F("Actual allocated size = "));
  Serial.println(samples.getSize(),DEC);

  Serial.println();

  for(byte i = 0; i <= (sourceSize - 1); i++)
  {
    Serial.print(F("Loop number "));
    Serial.println((i + 1),DEC);

    timerStart = micros();
    samples.add(sourceData[i]);
    timerStop = micros();
    Serial.print(F("Time to add the next element to the array = "));
    Serial.print(timerStop - timerStart);
    Serial.println(F(" microseconds."));

    Serial.println(F("Cumulative source data added:"));
    Serial.print(F("    "));
    for(byte j = 0; j <= i; j++)
    {
      Serial.print(sourceData[j]);
      Serial.print(F(" "));
    }
    Serial.println();

    Serial.println(F("Unsorted accumulated array:"));
    Serial.print(F("    "));
    for(byte j = 0; j < samples.getCount(); j++)
    {
      Serial.print(samples.getElement(j));
      Serial.print(F(" "));
    }
    Serial.println();

    timerStart = micros();
    resultFLOAT = samples.getSortedElement(0);
    timerStop = micros();
    Serial.print(F("Time to sort array and return element number zero = "));
    Serial.print(timerStop - timerStart);
    Serial.println(F(" microseconds."));

    Serial.println(F("Sorted accumulated array:"));
    Serial.print(F("    "));
    for(byte j = 0; j < samples.getCount(); j++)
    {
      Serial.print(samples.getSortedElement(j));
      Serial.print(F(" "));
    }
    Serial.println();

    timerStart = micros();
    resultFLOAT = samples.getMedian();
    timerStop = micros();
    Serial.print(F("getMedian() result = "));
    Serial.println(resultFLOAT);
    Serial.print(F("Time to execute getMedian() = "));
    Serial.print(timerStop - timerStart);
    Serial.println(F(" microseconds."));

    timerStart = micros();
    resultFLOAT = samples.getAverage();
    timerStop = micros();
    Serial.print(F("getAverage() result = "));
    Serial.println(resultFLOAT);
    Serial.print(F("Time to execute getAverage() = "));
    Serial.print(timerStop - timerStart);
    Serial.println(F(" microseconds."));

    Serial.println(F("getAverage(x) results where:"));
    for(byte j = 1; j <= samples.getCount(); j++)
    {
      timerStart = micros();
      resultFLOAT = samples.getAverage(j);
      timerStop = micros();
      Serial.print(F("  x = "));
      Serial.print(j);
      Serial.print(F(" => "));
      Serial.print(resultFLOAT);
      Serial.print(F(" Time to execute = "));
      Serial.print(timerStop - timerStart);
      Serial.println(F(" microseconds."));
    }

    Serial.println(F("predict(x) results where:"));
    for(byte j = 1; j <= (samples.getCount() / 2); j++)
    {
      timerStart = micros();
      resultFLOAT = samples.predict(j);
      timerStop = micros();
      Serial.print(F("  x = "));
      Serial.print(j);
      Serial.print(F(" => "));
      Serial.print(resultFLOAT);
      Serial.print(F(" Time to execute = "));
      Serial.print(timerStop - timerStart);
      Serial.println(F(" microseconds."));
    }

    Serial.println();
    Serial.println();
  }
}


I compiled and ran it twice, once with #define RUNNING_MEDIAN_USE_MALLOC commented out, and once with #define RUNNING_MEDIAN_USE_MALLOC not commented out. I suspect trying to put the results in line with this post will overload the forum, so I'll attach the txt files (100k each).
http://www.catb.org/jargon/html/I/I-didn-t-change-anything-.html

Sembazuru

Just thought I would report in about 0.1.08. I have a sketch that uses RunningMedian that I compiled with v0.1.08 with the dynamic allocation turned on. Granted, all this sketch does with RunningMedian is create an object of 19 elements, then continually (on a 17ms timer) add a new reading from a distance sensor and then get the Average of the middle 11 of the sorted list. I had it running for 6 days without powering off, and never saw any instability. I never deallocated and then reallocated a RunningMedian instance, so it really wasn't a good test of potential memory fragmentation issues...

Now I just need to finish the hardware side of this project to control a dimmer circuit (I'm approaching this by having the output of my UNO control the resistance of a standard light dimmer circuit so I don't have to worry about triggering on zero-crossing. If a dumb potentiometer can do it, why can't I get my smart(?) UNO to do it?). But that's a topic for another thread if I run into issues. ;)

Thank you, again, for this library and adding my requested getAverage(uint8_t) enhancement.
http://www.catb.org/jargon/html/I/I-didn-t-change-anything-.html

robtillaart

Thanks  Sembazuru for all your tests of the 0.1.08 version,
we made a great step forwards with this library the last month.

I'll have a look at your test sketch, indeed it might be useful as additional example.
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

updated the version to 0.1.10
+ float -> double to support ARM based boards
+ fix in clear()
+ some small refactors

Latest version on github - https://github.com/RobTillaart/Arduino/tree/master/libraries -

as always remarks and comments are welcome.
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Comiter

Hi Rob,
first of all, thank you for this great library! Getting noiseless readings from ADC is much easier when having access to this great collection of Median/Mode filter.

I tried the Average function of your filter (revision 01.10), and found it to behave in a strange way, not providing the expected averaging as I would anticipating, noticed by the fact that it always provided me with whole numbers, no numbers after the decimal sign. I quickly wrote a code to allow me peek in to _p array, and noticed that this contained all zero's. This is due to the clear() function, which sets values of _p = 0.
If I have understood the code right, _p shall contain the index which gives you the sorted version of _ar.
Thus during the clear() function, the correct way would be to assume:

Code: [Select]

// resets all counters
void RunningMedian::clear()
{
    _cnt = 0;
    _idx = 0;
    _sorted = false;

    for (uint8_t i=0; i< _size; i++) _p[i] = i;
}


instead of

Code: [Select]

// resets all counters
void RunningMedian::clear()
{
    _cnt = 0;
    _idx = 0;
    _sorted = false;

    for (uint8_t i=0; i< _size; i++) _p[i] = 0;
}

  

Doing this gives me the correct average, and a correct printout using the following code:
Code: [Select]

  for (int i=0;i<ArraySIze;i++){
  Serial.print(OffsetVoltage2.getSortedElement(i));
  Serial.print('\t');
  }

robtillaart

#57
Mar 29, 2015, 05:47 pm Last Edit: Mar 29, 2015, 06:42 pm by robtillaart
Bug came in with last version, I'll reverse it asap

(done 0.1.11 on https://github.com/RobTillaart/Arduino/tree/master/libraries/RunningMedian )
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Gdunge

Hi, I really like this class. Very useful and elegant. Thanks!

I have a few observations and questions.

- the method getElement(n) is documented (in the header file) to "get n'th element from the values in time order". It doesn't actually do this - it returns the n'th element in the circular buffer. It seems to me that what should happen is it should start at _idx (the oldest value) and proceed through the buffer, wrapping around and ending up at the newest element at (_idx - 1).

For example, instead of 

Code: [Select]
double RunningMedian::getElement(uint8_t n)
{
    if ((_cnt > 0) && (n < _cnt))
    {
        return _ar[n];
    }
    return NAN;
}


use this (warning, untested):

Code: [Select]
double RunningMedian::getElement(uint8_t n)
{
    if ((_cnt > 0) && (n < _cnt))
    {
        return _ar[(_idx + n) % _cnt];
    }
    return NAN;
}


Of course this will break compatibility, so perhaps this should be a new method called getTimedElement(n) or getSequencedElement(n).

- Have you tried an insertion sort instead of the bubble sort? If no, I may give it a try myself and let you know what I discover.

- I don't know how to do this myself, but I've seen other classes that can accept different data types. Even ones that will work with many different types, using some kind of template scheme that abstracts out the type definitions. Have you considered doing this?

- Meanwhile, would it work if I just did a global search/replace on RunningMedian.h and RunningMedian.cpp and changed every "double" to "long"?

- While considering how to deal with bad or missing sensor data in a real-time stream, I tried using NANs. You can add them, but they silently break the sort algorithm and therefore also the calculation of a median. I'm not sure how to deal with this, except by not adding NANs in the first place :) INFs work, because they sort correctly.

Thanks again for this inspiring example of code!

robtillaart

#59
Oct 30, 2015, 10:26 am Last Edit: Oct 30, 2015, 10:50 am by robtillaart
Thanks for the compliment

1) the method getElement(n)
I will need to investigate this

2) insertion sort
Have a look at rtail's implementation - http://forum.arduino.cc/index.php?topic=53081.msg1160999#msg1160999
Still it is good to try if insertion does improve things. Note that for the median you need only to get the median right so that the count of numbers smaller and larger are equal. e.g.
1 3 2 6 4   7    10 9 8 14 11
has the median in the middle spot. One could use quicksort and only do the recursive calls for the part in which the median lies. This strategy is called QuickSelect(), See https://en.wikipedia.org/wiki/Quickselect  for details.

3) Different types
No did not consider this, the lib converts all values to double as that is most generic format.
A templated version could optimize memory/performance for 16 or 8 bit values.

4) double to long
No, for most it would work, except for the getAverage() I guess.
A templated version would be solve this

5) Nan
Easiest way is to ignore the NAN values. This can be done in at least 3 ways.
- ignore and do nothing or
- ignore and remove the last item from the measurements. (extra administration)
- ignore and count them separately, maybe keep track of validity of last added value.

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Go Up
 


Please enter a valid email to subscribe

Confirm your email address

We need to confirm your email address.
To complete the subscription, please click the link in the email we just sent you.

Thank you for subscribing!

Arduino
via Egeo 16
Torino, 10131
Italy