Just joined and hope I'm posting this in the correct area/giving enough info etc! I'm completely self-taught with very little knowledge beyond using the simple commands...
I've simplified the question as I'm struggling to convey it clearly - appreciating this might also limit the potential helpfulness of answers...
Simplistically, I'm trying to record and display the mean, max, min and SD of a measured value. But this measured value needs to be recorded at the coordinate of two other inputs. These two other inputs change independently and constantly. i.e. I'd like an array of x against y and record a running average of z (ideally plus the other data) at the various x - y integer values. x and y can be from 1 to 5 so an array of 25 pieces of data.
I first thought I could 'simply' create 25 running averages using RunningAverage.h library, one for each x-y 'coordinate'.
Unfortunately I can't see an easy way to use this without it being very clumsy - very difficult to add data into and also, ultimately, to print the data to the serial port.
Is there any way I can set this up as an array instead?
Or are there any running average libraries that work for an array of data? I've tried searching the internet but the results only seem to come back as how to use an array to calculate a running average, not having an array of running averages!
I am not quite clear on what problem you are trying to solve, but would it help if the array could hold variables of different types, instead of all data items needing to be of the same type? Perhaps something like this dummy code:
dataLayout data[int x, int y, float runningAverage, int max, int min, float standardDeviation]
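Written out as real C++, that idea might look like the sketch below. This is a guess at an implementation, not an existing library: the field names echo the dummy declaration above, and Welford's incremental method is one common way to keep a running mean and SD without storing every sample.

```cpp
#include <math.h>  // for sqrt

// One record of mixed-type statistics per (x, y) cell; field names are illustrative
struct CellStats {
  long count = 0;
  float mean = 0;    // running mean, updated incrementally
  float m2 = 0;      // sum of squared deviations (Welford's method, for SD)
  float minZ = 0;
  float maxZ = 0;
};

CellStats grid[5][5];  // x and y run 1..5, stored at indices 0..4

// Fold a new z reading into the stats for cell (x, y); x and y are 1-based
void addSample(int x, int y, float z) {
  CellStats &c = grid[x - 1][y - 1];
  if (c.count == 0) { c.minZ = z; c.maxZ = z; }
  c.count++;
  float delta = z - c.mean;
  c.mean += delta / c.count;      // incremental mean update
  c.m2 += delta * (z - c.mean);   // Welford variance accumulator
  if (z < c.minZ) c.minZ = z;
  if (z > c.maxZ) c.maxZ = z;
}

// Sample standard deviation for a cell (0 until two samples exist)
float stdDev(int x, int y) {
  CellStats &c = grid[x - 1][y - 1];
  if (c.count < 2) return 0;
  return sqrt(c.m2 / (c.count - 1));
}
```

With this layout, printing the whole grid is just a double loop over `grid`, and each cell costs a fixed ~20 bytes regardless of how many samples it has seen.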
The Smoothing Example uses a simple moving-average array. (Actually, just the array pointer moves, not the array contents).
...I have an application that uses this, and every time I update the array I find & save the peak as well as the average. I have two arrays of 20 elements each, updated & re-calculated once per second.
Just keep in mind that all of this calculation will take time and that might limit how fast you can sample the data.
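For reference, the moving-average scheme in the Smoothing Example works roughly like this (a sketch, not the exact example code; the 20-element window matches the arrays mentioned above):

```cpp
const int numReadings = 20;        // size of the averaging window
float readings[numReadings] = {0}; // the window contents
int readIndex = 0;                 // the "pointer" that moves, not the data
float total = 0;                   // running sum of the window

// Overwrite the oldest reading with the newest and return the window average.
// Note it under-reads until the window has filled with real samples.
float smooth(float newReading) {
  total -= readings[readIndex];    // drop the oldest value from the sum
  readings[readIndex] = newReading;
  total += newReading;
  readIndex = (readIndex + 1) % numReadings;  // wrap the pointer around
  return total / numReadings;
}
```

Keeping a running `total` means each update is O(1): no re-summing of 20 elements per sample.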
I think I'm probably asking for something that can't be done with a standard library, and I don't think I'm giving a very good explanation either of what I'm looking to do!
Ignoring the max/min/sd so as not to confuse things further - what I'd like is a 2 dimensional array of data with each value being itself a rolling average of some data.
e.g. Array[x][y]
(hope that inserted screen shot works!)
Where RA1 through to RA25 are all rolling averages of the value that's been measured (z).
The data could be coming in like this:
x=1, y=1, z=13
x=4, y=5, z=12
x=3, y=3, z=13
x=3, y=4, z=11
x=1, y=1, z=15
(etc)
so the z value would be added to the rolling average for each x - y 'coordinate' if that makes sense? i.e. at x=1, y=1 there have been two z values of 13 and 15 so the rolling average for this 'cell' would be 14.
I think the closest I can see to a way to do this is a 3-dimensional array (?) i.e. Array[5][5][5]; where the data is manually 'put' into that 3rd dimension in a sequential way such that the rolling average can then be calculated without using a library.
So using the above data and x=1 y=1 example:
Array[1][1][1] would have the value of 13 and Array[1][1][2] would have the value of 15 so I'd then have to calculate the rolling average (somehow) at each 'cell' for e.g. Array[1][1][n] where 'n' is the sample size (in this instance 5).
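That hand-rolled approach is workable without any library. A sketch (index and variable names are my own; x and y are treated as 1-based, with sample slots wrapping around so the oldest value is overwritten):

```cpp
const int N = 5;               // samples kept per cell for the rolling average
float samples[5][5][N] = {0};  // last N z values for each (x, y) cell
int nextSlot[5][5] = {0};      // index of the slot the next sample overwrites
int filled[5][5] = {0};        // how many slots hold real data so far (max N)

// Store z for 1-based coordinates (x, y), overwriting the oldest sample
void storeSample(int x, int y, float z) {
  int i = x - 1, j = y - 1;
  samples[i][j][nextSlot[i][j]] = z;
  nextSlot[i][j] = (nextSlot[i][j] + 1) % N;
  if (filled[i][j] < N) filled[i][j]++;
}

// Average of however many samples the cell holds so far (0 if none yet)
float rollingAverage(int x, int y) {
  int i = x - 1, j = y - 1;
  if (filled[i][j] == 0) return 0;
  float sum = 0;
  for (int k = 0; k < filled[i][j]; k++) sum += samples[i][j][k];
  return sum / filled[i][j];
}
```

At 5 x 5 x 5 floats plus the bookkeeping arrays this is roughly 600 bytes, which still fits comfortably in an Uno's 2 KB of SRAM.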
It would be nice to do this on the Arduino, but it might be easier to simply save all the raw data into a CSV file and do it in Excel! Downside is, I'd have to take the data away to analyse it and not be able to see it 'live'.
A "struct" you can imagine as a collection of variables; this type we named Data.
We then declare a two-dimensional array of such collections, named myData.
Each array cell is then like a third dimension of the array, but the stored variables don't all have to be the same type - even bit-fields are allowed.
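The post being described isn't quoted here, but the shape presumably looks something like this (field names are my guesses):

```cpp
#include <stdint.h>

// A collection of mixed-type variables; one record per cell
struct Data {
  float runningAverage;
  unsigned int sampleCount;
  int maxZ;
  int minZ;
  uint8_t valid : 1;   // bit-fields are allowed too, as noted above
};

Data myData[5][5];     // the two-dimensional array of such collections
```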
If you run on a 32-bit platform you could use the std::vector class to make a circular buffer:
#include <vector>
#include <algorithm>  // for std::min

template <typename T>
class CircularArray {
  public:
    CircularArray(size_t maxSize) : maxCnt(maxSize), buffer(maxSize), front(0), rear(0), count(0) {}

    void push(const T& value) {
      if (count == maxCnt) {
        // Buffer is full, overwrite the oldest element
        front = (front + 1) % maxCnt;
      }
      buffer[rear] = value;
      rear = (rear + 1) % maxCnt;
      count = std::min(count + 1, maxCnt);
    }

    size_t currentSize() const {
      return count;
    }

    void print() const {
      if (count == 0) {
        Serial.print("Empty ");
      } else {
        size_t index = front;
        for (size_t i = 0; i < count; ++i) {
          Serial.print(buffer[index]);
          Serial.write(' ');
          index = (index + 1) % maxCnt;
        }
      }
    }

    double average() const {
      double result = 0;
      if (count == 0) return 0;  // arbitrary
      size_t index = front;
      for (size_t i = 0; i < count; ++i) {
        result += buffer[index];
        index = (index + 1) % maxCnt;
      }
      return result / count;
    }

  private:
    size_t maxCnt;
    std::vector<T> buffer;
    size_t front;
    size_t rear;
    size_t count;
};
Then you decide how deep your running average should be (say we keep the last 10 elements) and the type of data you keep in there (say double), and you create your 2D array of circular buffers:
constexpr size_t Rows = 5;
constexpr size_t Columns = 5;
constexpr size_t N = 10; // Maximum number of elements to keep for the rolling average
// Create a 2D array of CircularArray objects
CircularArray<double> circularArray2D[Rows][Columns] = {
{{N}, {N}, {N}, {N}, {N}},
{{N}, {N}, {N}, {N}, {N}},
{{N}, {N}, {N}, {N}, {N}},
{{N}, {N}, {N}, {N}, {N}},
{{N}, {N}, {N}, {N}, {N}}
};
Note that you could decide to have some elements in your 2D array keep more values than others (just pass a different size to the constructor).
Now when you want to save an incoming value into the array you use push():
circularArray2D[y][x].push(aValue);
and when you want to compute the average for a given (x, y) entry you call:
circularArray2D[y][x].average();
here is an example with one circular buffer
// (includes and CircularArray class as defined above)
constexpr size_t N = 5;  // Maximum number of elements to keep for the rolling average

CircularArray<double> circularArray(N);

void setup() {
  Serial.begin(115200);
  for (int i = 1; i <= 10; ++i) {
    circularArray.push(i);
    circularArray.print();
    Serial.printf(" => average = %f\n", circularArray.average());
  }
}

void loop() {}
Another means of smoothing incoming data is to use a leaky integrator.
Google all the theory you want, it comes down to
average = 0.9 * average + 0.1 * newReading;
So take most of the old average and add in a little of the new reading.
Use any two factors that add up to 1.0. Generally:
average = alpha * average + (1.0 - alpha) * newReading;
where alpha is set between 0.0 and 1.0.
Making alpha something like 0.7 means the old average hangs around less and new values are more important, so it would take fewer steps to converge on a new value coming in that was held constant.
So you don't need to store N values and take the average of them; just maintain the average(s) according to the leaky integrator equation.
Try it on your data. A graph of it can show you it smoothing and converging and so forth.
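Applied to the 5 x 5 grid, the leaky integrator needs only one float per cell. A sketch (the alpha value and the first-sample seeding are my choices, not part of the equation itself):

```cpp
const float alpha = 0.9;     // how much of the old average to keep
float avg[5][5] = {0};       // one smoothed z value per (x, y) cell
bool seeded[5][5] = {false}; // so the first sample isn't blended with 0

// Blend a new z reading into the running value for 1-based (x, y)
void updateCell(int x, int y, float z) {
  int i = x - 1, j = y - 1;
  if (!seeded[i][j]) {       // first sample: take it as-is
    avg[i][j] = z;
    seeded[i][j] = true;
    return;
  }
  avg[i][j] = alpha * avg[i][j] + (1.0f - alpha) * z;
}
```

That is 25 floats plus 25 flags, around 125 bytes, so it fits easily on the Uno, and each new sample costs a single multiply-add for its cell.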
I've just had a play with the "leaky integrator" method in Excel and that looks like the way forward! It doesn't give me the max/min/SD but those values are only to give me initial confidence that the system is 'under control' - I've used RS232 DataLogger in the past so planning to use this again to assist here.
I'd agree and say that if you are logging data then, in principle, log direct values where possible and leave any treatment such as averaging, smoothing or other processing etc. to an offline process. That way you can, if necessary, modify the processing algorithm and re-analyse any collected data.
Thanks but to be perfectly honest, I don't understand the example enough to implement it!
You've lost me with what a vector class is...
This is a good example of code where there seems to be a step-change in programming complexity that I can't get my understanding past. If it's in the Arduino Reference then I can generally work it out and/or search for examples on the internet. But there is so much stuff here that I don't recognise that it seems impenetrable to decipher! Is this where things become more based on C (or C++)?
I'm using a genuine Uno for this application (it's a good few years old so assuming it's an R3) and a cheap R3 for bench testing. I understand only the new R4 is 32-bit? But upgrading wouldn't be a problem...
Is the sampling rate for each of the (x,y) coordinates independent of each other - in other words, at any given time, does each (x,y) coordinate have a number of samples for the z value that is independent of the number of samples for every other (x,y) coordinate?
Seems that you might need a circular buffer for each possible (x,y) coordinate.
I'm measuring three variables (x, y, z) with z being able to potentially vary the most. Some x-y 'coordinates' will occur very commonly and others may not even practically get used. If I have confidence that the z value at each x-y location is doing what I expect, then I can simply use a rolling average or (looking better) a leaky integrator to keep track of these values. I can then make changes to an output to bring the z value to where I want it to be. But if the z value is all over the place and has a high SD, then I know my control algorithm isn't working! I don't mind a few spurious values (I could pick up on the max/min) but would hope to be seeing a tight 'group' around the mean.
To give the 'real world' application, this is to record manifold pressure ('z') for a turbo-diesel engine where 'x' is the engine RPM and 'y' is the engine load.