Pages: 1 2 3 [4]   Go Down
Author Topic: runningMedian class on playground  (Read 8538 times)
0 Members and 1 Guest are viewing this topic.
Mid-Atlantic, USA
Offline Offline
God Member
*****
Karma: 30
Posts: 515
"Remember kids, the only difference between Science and screwing around is writing it down." - Adam Savage
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Quote
I've been thinking about the issue with an even sized arrays. My understanding about medians from an even number of data points is it should be the mean of the two middle sorted data points. If you want to take this into consideration, what is the best way to determine an even number of data points from and odd number of data points?
[snip]
fastest is the test for oddness
bool odd = _cnt & 0x01;
==>
bool even = !(_cnt & 0x01);  // even = !odd;

Yeah, I'm feeling silly for not thinking of that.

Quote
[implementing even sized arrays correctly]
- if the elements are integers is the median also an integer? according to math it should be a float.
   ==> calculating the average is a float addition + division. + oddness test.

Well, a couple points here:
* The class defines all the elements, and the returns from the array, as float. Even if the user only puts ints (like returns from readAnalog()) into the array, they are stored as floats and the methods return floats. With nothing but integer-type values stored in the array, the only time a decimal value other than zero is returned would be a half. If the float return is silently cast into any of the integer-type datatypes, the half would be stripped, so the effect would be an automatic round-down. In the case of [2, 3], the rounded return would be the same as the current implementation. But in the case of [2, 5] a return of 2 isn't quite appropriate. The return should be 3.5 or rounded down to 3. How one is bothered by it depends on one's rounding practice. If one goes by the practice that exactly half (0.50...0) should be rounded down but anything over half (0.50...01) should be rounded up, then it isn't an issue.
* The oddness test only needs to be performed on _cnt which is uint8_t ("byte" in Adruino parlance). The addition and division may be simply be performed by returning the result of getAverage(2). Not sure how to have one method call another within a class though. My C++ isn't that strong.

Quote
- with odd array size the code works after the first element is added.
  even arrays need at least 2 elements added.

True, but the getMedian method (as well as both the getAverage functions, but they are out of scope for this discussion) uses _cnt to determine the array size, not _size. Thus if only the first element has been added then the apparent array can't be anything but odd. (We already trap for zero, so that isn't an issue.) Unless things change in the class, this should be a non issue.

Quote
in short it adds code complexity to make it right.
I'll leave this point open for after the 0.1.06 version.

That's fair. No sense holding up something that (by the time I'm writing this) has already been released for something still under discussion. smiley-wink
Logged


Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 216
Posts: 13676
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

@Sembazuru
the correct median for even number of elements will be in the next feature release.

Please give the 0.1.06 a try and if any other things fail.
I want to fix them first before adding new features.

Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 216
Posts: 13676
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

@Sembazuru
Here a first implementation of the correct median for even number of elements so you can test it.
Code: (experimental)
float RunningMedian::getMedian()
{
    if (_cnt > 0)
    {
        if (_sorted == false) sort();
        if (_cnt & 0x01) return _ar[_p[_cnt/2]];                    // ODD
        else return (_ar[_p[_cnt/2]] + _ar[_p[_cnt/2 - 1]]) / 2.0;  // EVEN
    }
    return NAN;
}
Note: this is a delta for the 0.1.06 version, so you should fetch that one first.
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 216
Posts: 13676
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

updated the version to 0.1.07 on github - https://github.com/RobTillaart/Arduino/tree/master/libraries

+ fixed error in 0.1.06  add() smiley-red
+ added correct median for even number of elements

@Sembazuru
Please verify 0.1.07 version 

TODO: update playground (note to myself smiley-wink
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Mid-Atlantic, USA
Offline Offline
God Member
*****
Karma: 30
Posts: 515
"Remember kids, the only difference between Science and screwing around is writing it down." - Adam Savage
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

updated the version to 0.1.07 on github - https://github.com/RobTillaart/Arduino/tree/master/libraries

+ fixed error in 0.1.06  add() smiley-red
+ added correct median for even number of elements

@Sembazuru
Please verify 0.1.07 version 

TODO: update playground (note to myself smiley-wink


Just got home, and tomorrow is busy through the day. I'll download and check tomorrow night.
Logged


Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 216
Posts: 13676
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

With the addition of a correct median for even nr elements, this lib is more or less done.
There are still some ideas to investigate, mainly if and when a complete sort is needed.

1) GetMedian() does not need a complete sort(). Sorting only half+1 is enough to detemine the median. There exists  QuickMedian algorithm (from Hoare, inventor of quicksort)  which recursively selects the partition in which the median lies. It is  O(2Log n) on average. Worst case it does a complete sort O(n2). It probably means an increase in code size.

2) getHighest() and getLowest() can be found with one iteration over the array O(n) similar to getAverage(). However a sorted array gives the lowest and highest in O(1).

3) getAverage(nMedians) needs a complete sort, or at least a partial sort which encloses the nMedians. A partial quicksort that only sort those partitions that include the range nMedians could do some good. Added complexity is probably the price.

4) Keeping the elements sorted after single addition/replace is optimal with insertSort() - see my modifications of the FastRunningMedian(rkail) above. Merging the speed of FastRunningMedian with functionality of RunningMedian is the holy grail smiley-wink

New function ideas:
Quartile(n),
n=0,1,2,3,4 where the median is the 'half' element, quartile(0) = lowest(), quartile(1) is element at quart, quartile(2) == median, quartile(3) is element at 3/4 and quartile(5) is highest.
getElement(n) + getSortedElement(n)
readonly access to the (sorted) internal buffers. would make implementation of quartile, lowest and highest trivial, and allows iteration to do whatever the user wants.
mode()
returns the element that occurs most often. Problem is that this is not always one unique number e.g. all numbers occur once.
predict()
returns the largest distance to the neighbour of the median.  e.g.
{1,2,3,4,5}-> median = 3 predict()= 1
{1,2,3,4,7,9,10,11} median=7, predict()=3
It predicts the maximum change the RunningMedian will make after an addition. e.g. If predict()= 0 means that the RunningMedian will stay the same (stable).  Could be parametrized (d  distance)   d = [1..cnt/2]. There is complexity in the code when the internal buffer is not filled yet, unless this is pre-filled like in FastRunningMedian. Also even/odd adds extra code.

For me the getElements, getSortedElements are the most interesting. Followed by the predict(d). Enough to tinker smiley-wink
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 216
Posts: 13676
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

updated version to 0.1.08 on github - https://github.com/RobTillaart/Arduino/tree/master/libraries -  

+ added getElement(n)
+ added getSortedElement(n)
+ added predict(n)  // returns the max distance with nth neighbour.
                                    This is the max value the RunningMedian will change in n additions.

next step is the merge with fast version of rkail.

Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Mid-Atlantic, USA
Offline Offline
God Member
*****
Karma: 30
Posts: 515
"Remember kids, the only difference between Science and screwing around is writing it down." - Adam Savage
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

updated version to 0.1.08 on github - https://github.com/RobTillaart/Arduino/tree/master/libraries -  

+ added getElement(n)
+ added getSortedElement(n)
+ added predict(n)  // returns the max distance with nth neighbour.
                                    This is the max value the RunningMedian will change in n additions.

next step is the merge with fast version of rkail.



Wow... didn't even wait for my replies. I guess the weekend waits for no one.  smiley-wink

I did notice one minor issue, you forgot to update #define RUNNING_MEDIAN_VERSION "0.1.07" to 0.1.08.... I corrected this on my local version, as you can see if you look at the attached results.

I just finished coming up with a test sketch, but haven't had a chance to go through the results to check the math. But, I did include timings which may be of interest to you so I'll post the results from my UNO r3 (compiled on 1.0.5 ERW). Here is my test sketch (feel free to steal if from me), sadly only has a sparse few comments...
Code:
// RunningMedianTest1.ino
#include <RunningMedian.h>

const int sourceData[] =
{ // 50 consecutive samples from Sharp distance sensor model GP2Y0A710K0F while stationary.
  300, 299, 296, 343, 307, 304, 303, 305, 300, 340,
  308, 305, 300, 304, 311, 304, 300, 300, 304, 304,
  284, 319, 306, 304, 300, 302, 305, 310, 306, 304,
  308, 300, 299, 304, 300, 305, 307, 303, 326, 311,
  306, 304, 305, 300, 300, 307, 302, 305, 296, 300
};
const int sourceSize = (sizeof(sourceData)/sizeof(sourceData[0]));

RunningMedian samples = RunningMedian(sourceSize);

void setup()
{
  Serial.begin(115200);
  while (!Serial); // Wait for serial port to connect. Needed for Leonardo only.
  delay(1000); // Simply to allow time for the ERW versions of the IDE time to automagically open the Serial Monitor. 1 second chosen arbitrarily.
  Serial.print(F("Running Median Version: "));
  Serial.println(RUNNING_MEDIAN_VERSION);
#ifdef RUNNING_MEDIAN_ALL
  Serial.println(F("All methods available"));
#else
  Serial.println(F("Only constructor, destructor, clear(), add(), and getMedian() available"));
#endif
#ifdef RUNNING_MEDIAN_USE_MALLOC
  Serial.println(F("Dynamic version using malloc() enabled"));
#else
  Serial.print(F("Static version, will always allocate an array of "));
  Serial.print(MEDIAN_MAX_SIZE,DEC);
  Serial.println(F(" floats."));
#endif
}

void loop()
{
  test1();
  while (1);
}

void test1()
{
  unsigned long timerStart = 0;
  unsigned long timerStop = 0;
  float resultFLOAT = 0;
  byte resultBYTE = 0;

  Serial.print(F("Requested median array size = "));
  Serial.println(sourceSize,DEC);
  Serial.print(F("Actual allocated size = "));
  Serial.println(samples.getSize(),DEC);

  Serial.println();

  for(byte i = 0; i <= (sourceSize - 1); i++)
  {
    Serial.print(F("Loop number "));
    Serial.println((i + 1),DEC);

    timerStart = micros();
    samples.add(sourceData[i]);
    timerStop = micros();
    Serial.print(F("Time to add the next element to the array = "));
    Serial.print(timerStop - timerStart);
    Serial.println(F(" microseconds."));

    Serial.println(F("Cumulative source data added:"));
    Serial.print(F("    "));
    for(byte j = 0; j <= i; j++)
    {
      Serial.print(sourceData[j]);
      Serial.print(F(" "));
    }
    Serial.println();

    Serial.println(F("Unsorted accumulated array:"));
    Serial.print(F("    "));
    for(byte j = 0; j < samples.getCount(); j++)
    {
      Serial.print(samples.getElement(j));
      Serial.print(F(" "));
    }
    Serial.println();

    timerStart = micros();
    resultFLOAT = samples.getSortedElement(0);
    timerStop = micros();
    Serial.print(F("Time to sort array and return element number zero = "));
    Serial.print(timerStop - timerStart);
    Serial.println(F(" microseconds."));

    Serial.println(F("Sorted accumulated array:"));
    Serial.print(F("    "));
    for(byte j = 0; j < samples.getCount(); j++)
    {
      Serial.print(samples.getSortedElement(j));
      Serial.print(F(" "));
    }
    Serial.println();

    timerStart = micros();
    resultFLOAT = samples.getMedian();
    timerStop = micros();
    Serial.print(F("getMedian() result = "));
    Serial.println(resultFLOAT);
    Serial.print(F("Time to execute getMedian() = "));
    Serial.print(timerStop - timerStart);
    Serial.println(F(" microseconds."));

    timerStart = micros();
    resultFLOAT = samples.getAverage();
    timerStop = micros();
    Serial.print(F("getAverage() result = "));
    Serial.println(resultFLOAT);
    Serial.print(F("Time to execute getAverage() = "));
    Serial.print(timerStop - timerStart);
    Serial.println(F(" microseconds."));

    Serial.println(F("getAverage(x) results where:"));
    for(byte j = 1; j <= samples.getCount(); j++)
    {
      timerStart = micros();
      resultFLOAT = samples.getAverage(j);
      timerStop = micros();
      Serial.print(F("  x = "));
      Serial.print(j);
      Serial.print(F(" => "));
      Serial.print(resultFLOAT);
      Serial.print(F(" Time to execute = "));
      Serial.print(timerStop - timerStart);
      Serial.println(F(" microseconds."));
    }

    Serial.println(F("predict(x) results where:"));
    for(byte j = 1; j <= (samples.getCount() / 2); j++)
    {
      timerStart = micros();
      resultFLOAT = samples.predict(j);
      timerStop = micros();
      Serial.print(F("  x = "));
      Serial.print(j);
      Serial.print(F(" => "));
      Serial.print(resultFLOAT);
      Serial.print(F(" Time to execute = "));
      Serial.print(timerStop - timerStart);
      Serial.println(F(" microseconds."));
    }

    Serial.println();
    Serial.println();
  }
}

I compiled and ran it twice, once with #define RUNNING_MEDIAN_USE_MALLOC commented out, and once with #define RUNNING_MEDIAN_USE_MALLOC not commented out. I suspect trying to put the results in line with this post will overload the forum, so I'll attach the txt files (100k each).

* runningMedianStaticTest.txt (99.35 KB - downloaded 17 times.)
* runningMedianDynamicTest.txt (99.34 KB - downloaded 17 times.)
Logged


Mid-Atlantic, USA
Offline Offline
God Member
*****
Karma: 30
Posts: 515
"Remember kids, the only difference between Science and screwing around is writing it down." - Adam Savage
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Just thought I would report in about 0.1.08. I have a sketch that uses RunningMedian that I compiled with v0.1.08 with the dynamic allocation turned on. Granted, all this sketch does with RunningMedian is create an object of 19 elements, then continually (on a 17ms timer) add a new reading from a distance sensor and then get the Average of the middle 11 of the sorted list. I had it running for 6 days without powering off, and never saw any instability. I never deallocated and then reallocated a RunningMedian instance, so it really wasn't a good test of potential memory fragmentation issues...

Now I just need to finish the hardware side of this project to control a dimmer circuit (I'm approaching this by having the output of my UNO control the resistance of a standard light dimmer circuit so I don't have to worry about triggering on zero-crossing. If a dumb potentiometer can do it, why can't I get my smart(?) UNO to do it?). But that's a topic for another thread if I run into issues. smiley-wink

Thank you, again, for this library and adding my requested getAverage(uint8_t) enhancement.
Logged


Global Moderator
Netherlands
Offline Offline
Shannon Member
*****
Karma: 216
Posts: 13676
In theory there is no difference between theory and practice, however in practice there are many...
View Profile
 Bigger Bigger  Smaller Smaller  Reset Reset

Thanks  Sembazuru for all your tests of the 0.1.08 version,
we made a great step forwards with this library the last month.

I'll have a look at your test sketch, indeed it might be useful as additional example.
Logged

Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Pages: 1 2 3 [4]   Go Up
Jump to: