Go Down

Topic: runningMedian class on playground (Read 9982 times) previous topic - next topic

robtillaart

updated the version to 0.1.04
+ getAverage(uint8_t nMedians)   (thanks Sembazuru!)
+ getCount()
+ getSize()
+ #ifdef for keeping the lib small/

also updated the template version

code can be found on playground - http://playground.arduino.cc//Main/RunningMedian -
and on github - https://github.com/RobTillaart/Arduino/tree/master/libraries -
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Sembazuru


updated the version to 0.1.04
+ getAverage(uint8_t nMedians)   (thanks Sembazuru!)
+ getCount()
+ getSize()
+ #ifdef for keeping the lib small/

also updated the template version

code can be found on playground - http://playground.arduino.cc//Main/RunningMedian -
and on github - https://github.com/RobTillaart/Arduino/tree/master/libraries -



Looks and works good. I decided to go ahead and use getCount and getSize in my current application as an excellent example of a use for do ... while structure. (I don't think I've run across such a clear cut application of the "execute loop contents once, then loop back if conditional is true" before.) I was getting tired of having to throw away the first 18 lines of diagnostics when making graphs because of invalid data and calculations until the array filled.

I just noticed something that we both missed. I based my method suggestion on fitting into version 0.1.02 (0.1.03 never made it to the playground so I didn't know about it), before you had the _sorted flag (I really like that flag). So, unfortunately, the new overload to getAverage() will always sort instead of checking the _sorted flag first. Just a minor oversight.

Also feel free to add the following keywords.txt files to github and the playground page for proper syntax highlighting in the programs that use it (like the official IDE).

For the regular (cpp with h) version:
Code: [Select]

#######################################
# Syntax Coloring Map for RunningMedian
#######################################

#######################################
# Datatypes (KEYWORD1)
#######################################

RunningMedian KEYWORD1

#######################################
# Methods and Functions (KEYWORD2)
#######################################

add KEYWORD2
clear KEYWORD2
getMedian KEYWORD2
getAverage KEYWORD2
getHighest KEYWORD2
getLowest KEYWORD2
getSize KEYWORD2
getCount KEYWORD2

#######################################
# Constants (LITERAL1)
#######################################


For the template version:
Code: [Select]

#######################################
# Syntax Coloring Map for RunningMedian
#######################################

#######################################
# Datatypes (KEYWORD1)
#######################################

RunningMedian KEYWORD1

#######################################
# Methods and Functions (KEYWORD2)
#######################################

add KEYWORD2
clear KEYWORD2
getMedian KEYWORD2
getAverage KEYWORD2
getHighest KEYWORD2
getLowest KEYWORD2
getSize KEYWORD2
getCount KEYWORD2
getStatus KEYWORD2

#######################################
# Constants (LITERAL1)
#######################################
OK KEYWORD1
NOK KEYWORD1


P.S. Thanx for the kudos. 8)
http://www.catb.org/jargon/html/I/I-didn-t-change-anything-.html

robtillaart

#32
Oct 18, 2013, 07:29 pm Last Edit: Oct 18, 2013, 07:40 pm by robtillaart Reason: 1
The template version has
OK == 0 == false
NOK==1 == true

a bit counterintuitive ...

I'll add/update the keywords file - I use those not so much as at some moment all words get highlighted :)

update: added one keywords.txt file as the template version could be used for both versions
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

int2str

A few random things I noticed:

1. You're using anonymous parameters in the header file and provide no documentation. That makes the library harder to use than it should be. Usually I would document the header file to make things more clear (see http://forum.arduino.cc/index.php?topic=188578.0)

2. The library is allocating 19xfloat no matter how many elements are to be stored. Would be nice to make that dynamic/templated to avoid memory allocation for nothing.

3. If less than the desired amount of values have been added, the median returns 0 or the wrong value. Would be nice if it adjusted the internal "fill level" as you go.

robtillaart

Hi int2str,

Thanks for this valuable feedback, I appreciate

(1) you have a point - Is easy to improve so will be done asap.
+ size in the constructor ()
+ nMedians in the getAverage()

(2) using Dynamic memory malloc() etc  had a lot of problems in the past. See discussions about String class. However this class will be far less dynamic in its use than the String class in most programs I assume. But that is an assumption (read risk). There are 3 options. Allocate fIxed number (current implementation), allocate dynamically (your proposal) or an #ifdef so people can choose (make no choice).
I also "hate" to have 2 lists of floats, the list in time order and the size-sorted one. I am thinking about reducing the sorted list to a list of indices of the time ordered list. Bytes would be sufficient reducing the size with 19x3 bytes [less RAM] on the expense of a bit more code [FLASH].
For now I leave it as it is as code behaves stable, but it is on my TODO list to make improvements here.

(3) that's a known bug,  :smiley-red: needs to be fixed, no discussion about that. Highest prio.

Thanks again for your feedback, think I have a job todo ;)
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

int2str

(2) using Dynamic memory malloc() etc  had a lot of problems in the past. See discussions about String class. However this class will be far less dynamic in its use than the String class in most programs I assume. But that is an assumption (read risk). There are 3 options. Allocate fIxed number (current implementation), allocate dynamically (your proposal) or an #ifdef so people can choose (make no choice).


If it's templated, you can have a fixed sized array using a template parameter.

Quote
(3) that's a known bug,  :smiley-red: needs to be fixed, no discussion about that. Highest prio.


Should be as simple as using _cnt/2 to find your median instead of _size/2. Also sort only to _cnt, not to _size.

robtillaart

(3) I'm checking the code but it seems that I had already fixed it before. I recall there was a this bug.
(see playground version 0.1.04)

getMedian() already uses _cnt/2 and the sort routine does also use _cnt    or did I miss a  something?
which version do you use?

BTW during checking I found a bug in the swap routine of sort(), there I used a long for a temp variable instead of a float !
maybe this caused the errors you saw. Has been in it since the first version :( When using it for raw analogReads it would go unnoticed as all numbers in the list would be longs. Performance may improve now as it removes two conversions :)


Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

updated the version to 0.1.05 on github - https://github.com/RobTillaart/Arduino/tree/master/libraries - 

+ added comments in the header file
+ removed the default constructor, user should explicit define size
+ created a destructor
+ added support for dynamic allocated internal buffers (EXPERIMENTAL! not tested extensively)
  user can select for now.
+ fixed bug in sort routine (swap)
+ patched test sketches

on the playground only patched the bug in the sort routine,
I want to do more testing and upgrade thereafter (will be a 0.1.06 version)
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Sembazuru


updated the version to 0.1.05 on github - https://github.com/RobTillaart/Arduino/tree/master/libraries - 

+ added comments in the header file
+ removed the default constructor, user should explicit define size
+ created a destructor
+ added support for dynamic allocated internal buffers (EXPERIMENTAL! not tested extensively)
  user can select for now.
+ fixed bug in sort routine (swap)
+ patched test sketches

on the playground only patched the bug in the sort routine,
I want to do more testing and upgrade thereafter (will be a 0.1.06 version)


Don't forget before 0.1.06 to change the sort(); line in the new getAverage (line 108 in v0.1.05 on github) to if (_sorted == false) sort(); so it will only sort if needed instead of every time.

I've been thinking about the issue with an even sized arrays. My understanding about medians from an even number of data points is it should be the mean of the two middle sorted data points. If you want to take this into consideration, what is the best way to determine an even number of data points from and odd number of data points? While _cnt%2 would tell you if it is even or odd, but there are simpler methods for these simple processors. Because the datatype of _cnt is u8int_t, divide by 2 then multiply by 2 and compare to the original. If _cnt is odd, the 0.5 from the division will be stripped so the multiplication won't return the original. Two simple integer math operations which are essentially a shift right followed by a shift left. Very few clock cycles. Or, just look at the least significant bit of _cnt, if it is high then the number is even. (I know in assembler that is one very short (clock cycles) opcode, but I'm not sure how the compiler handles the bitRead() function...)
http://www.catb.org/jargon/html/I/I-didn-t-change-anything-.html

Sembazuru


[snip...]
Or, just look at the least significant bit of _cnt, if it is high then the number is even.
[snip...]


I'm an idiot... if lsb of _cnt is set then the number is odd. (Head-smack)  :smiley-roll-sweat:
http://www.catb.org/jargon/html/I/I-didn-t-change-anything-.html

robtillaart

Quote
Don't forget before 0.1.06 to change the sort(); line in the new getAverage (line 108 in v0.1.05 on github) to if (_sorted == false) sort(); so it will only sort if needed instead of every time.


is already in it (hope to publish it today)  busy with several tests and some other code
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

#41
Oct 19, 2013, 01:17 pm Last Edit: Oct 19, 2013, 01:26 pm by robtillaart Reason: 1
@rkaul
I worked yesterday through a lot of runningMedian code and found an optimization of your already blazingly fast version.

The trick is that I replaced bubble sort with insertion sort which is superior if arrays are nearly sorted.
Code: (FastRunningMedian.h) [Select]

//
//    FILE: FastRunningMedian.h
//  AUTHOR: rkaul   
// PURPOSE: RunningMedian library for Arduino
// VERSION: 0.1.01
// HISTORY:
// 0.1.00 rkaul initial version
// 0.1.01 rob.tillaart -> insertion sort
//
// Released to the public domain
//
// Remarks:
// This is a lean but fast version.
// Initially, the buffer is filled with a "default_value". To get real median values
// you have to fill the object with N values, where N is the size of the sliding window.
// For example: for(int i=0; i < 32; i++) myMedian.addValue(readSensor());
//
// Constructor:
// FastRunningMedian<datatype_of_content, size_of_sliding_window, default_value>
// maximim size_of_sliding_window is 255
// Methods:
// addValue(val) adds a new value to the buffers (and kicks the oldest)
// getMedian() returns the current median value
//
//
// Usage:
// #include "FastRunningMedian.h"
// FastRunningMedian<unsigned int,32, 0> myMedian;
// ....
// myMedian.addValue(value); // adds a value
// m = myMedian.getMedian(); // retieves the median
//
#ifndef FastRunningMedian_h
#define FastRunningMedian_h

#include <inttypes.h>

template <typename T, uint8_t N, T default_value> class FastRunningMedian {

public:
   FastRunningMedian() {
       _buffer_ptr = N;
       _window_size = N;
       _median_ptr = N/2;

       // Init buffers
       uint8_t i = _window_size;
       while( i > 0 ) {
           i--;
           _inbuffer[i] = default_value;
           _sortbuffer[i] = default_value;
       }
   };

   T getMedian() {
       // buffers are always sorted.
       return _sortbuffer[_median_ptr];
   }

   
   void addValue(T new_value) {
       // comparision with 0 is fast, so we decrement _buffer_ptr
       if (_buffer_ptr == 0)
       _buffer_ptr = _window_size;
       
       _buffer_ptr--;
       
       T old_value = _inbuffer[_buffer_ptr]; // retrieve the old value to be replaced
       if (new_value == old_value)  // if the value is unchanged, do nothing
       return;
       
       _inbuffer[_buffer_ptr] = new_value;  // fill the new value in the cyclic buffer
       
       // search the old_value in the sorted buffer
       uint8_t i = _window_size;
       while(i > 0) {
           i--;
           if (old_value == _sortbuffer[i])
           break;
       }
       
       // insert in _sortbuffer
       if (new_value > old_value)
       {
           uint8_t j = i+1;
           while (j < _window_size && new_value > _sortbuffer[j])
           {
               _sortbuffer[j-1] = _sortbuffer[j];
               j++;
           }
           _sortbuffer[j-1] = new_value;
       }
       else
       {
           uint8_t j = i-1;
           while (j > 0 && new_value < _sortbuffer[j])
           {
               _sortbuffer[j+1] = _sortbuffer[j];
               j--;
           }
           _sortbuffer[j+1] = new_value;
       }

       // i is the index of the old_value in the sorted buffer
       // _sortbuffer[i] = new_value; // replace the value
       // if (new_value > old_value) {
       // //  if the new value is bigger than the old one, make a bubble sort upwards
       // for(uint8_t p=i, q=i+1; q < _window_size; p++, q++) {
       // // bubble sort step
       // if (_sortbuffer[p] > _sortbuffer[q]) {
       // T tmp = _sortbuffer[p];
       // _sortbuffer[p] = _sortbuffer[q];
       // _sortbuffer[q] = tmp;
       // } else {
       // // done ! - found the right place
       // return;
       // }
       // }
       // } else {
       // // else new_value is smaller than the old one, bubble downwards
       // for(int p=i-1, q=i; q > 0; p--, q--) {
       // if (_sortbuffer[p] > _sortbuffer[q]) {
       // T tmp = _sortbuffer[p];
       // _sortbuffer[p] = _sortbuffer[q];
       // _sortbuffer[q] = tmp;
       // } else {
       // // done !
       // return;
       // }
       // }
       // }
   }
   
private:
   // Pointer to the last added element in _inbuffer
   uint8_t _buffer_ptr;
   // sliding window size
   uint8_t _window_size;
   // position of the median value in _sortbuffer
   uint8_t _median_ptr;

   // cyclic buffer for incoming values
   T _inbuffer[N];
   // sorted buffer
   T _sortbuffer[N];
};

#endif
// --- END OF FILE ---


I also redid the testsketch to get first order statistics
Code: (test sketch) [Select]

#include "FastRunningMedian.h"
#include "RunningMedian.h"

unsigned int value = 0;

FastRunningMedian<unsigned int,32, 0> newMedian;
RunningMedian staticMedian = RunningMedian(32);

uint32_t totalfast;
uint32_t totalslow;

void setup ()
{
 while (!Serial) ;
 Serial.begin(115200);
 Serial.println("Starting....");

 for (int i=1; i<1000;i++)
 {
   test();
 }
 Serial.print("\n\ntotalfast=");
 Serial.print(totalfast/1000.0);

 Serial.print("\t\ttotalslow=");
 Serial.print(totalslow/1000.0);
 Serial.println();
};

void loop()
{
}

void test()
{
 unsigned int xmedian;
 unsigned int cmedian;

 // us measurement overhead is 4 us.
 value = random(0, 255);
 unsigned long loopstart = micros();
 newMedian.addValue(value);
 xmedian = newMedian.getMedian();
 unsigned long loopstop = micros();

 unsigned long fastdelay = loopstop - loopstart;
 totalfast += fastdelay;

 loopstart = micros();
 staticMedian.add(value);
 cmedian = staticMedian.getMedian();
 loopstop = micros();

 unsigned long slowdelay = loopstop - loopstart;
 totalslow += slowdelay;

 //  Serial.print("Loop FastTime(us)=");
 //  Serial.print(fastdelay);
 //  Serial.print(" SlowTime(us)=");
 //  Serial.print(slowdelay);
 //  Serial.print("  value=");
 //  Serial.print(value);
 //  Serial.print(" median=");
 //  Serial.print(xmedian);
 //  Serial.print(" excactmedian=");
 //  Serial.print(cmedian);
 //  Serial.println();

 //delay(100);
};


1000 add's and getMedians
bubble sort version: 51.03 usec
insert sort version:  42.52 usec  
The insert sort is also 20 bytes smaller in size

Please give it a try.
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

Quote
I've been thinking about the issue with an even sized arrays. My understanding about medians from an even number of data points is it should be the mean of the two middle sorted data points. If you want to take this into consideration, what is the best way to determine an even number of data points from and odd number of data points?

While _cnt%2 would tell you if it is even or odd, but there are simpler methods for these simple processors. Because the datatype of _cnt is u8int_t, divide by 2 then multiply by 2 and compare to the original. If _cnt is odd, the 0.5 from the division will be stripped so the multiplication won't return the original. Two simple integer math operations which are essentially a shift right followed by a shift left. Very few clock cycles. Or, just look at the least significant bit of _cnt, if it is high then the number is even. (I know in assembler that is one very short (clock cycles) opcode, but I'm not sure how the compiler handles the bitRead() function...)

fastest is the test for oddness
bool odd = _cnt & 0x01;
==>
bool even = !(_cnt & 0x01);  // even = !odd;

[implementing even sized arrays correctly]
- if the elements are integers is the median also an integer? according to math it should be a float.
   ==> calculating the average is a float addition + division. + oddness test.
- with odd array size the code works after the first element is added.
  even arrays need at least 2 elements added.
in short it adds code complexity to make it right.
I'll leave this point open for after the 0.1.06 version.


Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

uploaded 0.1.06 version on github, - https://github.com/RobTillaart/Arduino/tree/master/libraries - 
+ faster sort (bubble sort with flag)
+ dynamic memory for array (option in .h file)
   (uses more flash and less RAM)
+ replaced sorted float array with an array of indices (less RAM)

please give it a try
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

robtillaart

@rkail
some more tweaking, search the oldvalue with a binary search iso linear search
Code: (snippet to replace) [Select]

        // search the old_value in the sorted buffer
        uint8_t t = _window_size;
        uint8_t b = 0;
        uint8_t i = 0;
        while (t > b)
        {
            i = (b+t)/2;
            if (old_value == _sortbuffer[i]) break;
            if (old_value < _sortbuffer[i]) t = i;
            else b = i+1;
        }

        // uint8_t i = _window_size;
        // while(i > 0) {
            // i--;
            // if (old_value == _sortbuffer[i])
            // break;
        // }

1000 add's and getMedians  (UNO IDE1.54r)
original version: 51.03 usec
insert sort + binsearch version:  34.00 usec 
Rob Tillaart

Nederlandse sectie - http://arduino.cc/forum/index.php/board,77.0.html -
(Please do not PM for private consultancy)

Go Up