
Topic: How to deal with memory for large array manipulations on MCU

daanvV

Hello great minds of the internet,

For a project I am measuring acceleration over a period of roughly 10-15 seconds (sample rate 250 Hz) and converting it to displacement via double integration and filtering. Crucially, I am filtering both forwards and backwards, which means that the backwards pass can only run offline, i.e. once the whole array of acceleration/velocity values is known. However, this means that I have (at least) two arrays of roughly 3000 floats each in memory, so that is 3000*2*4 bytes = 24 kBytes (which will be stored in RAM, I guess).

The code below is a condensed and simplified version of the project. The sketch does not run on an Uno (for obvious RAM-size reasons), but it does run on a Due with array sizes of roughly 2500 elements and below (above this it starts printing ovf values, which are a sign of full RAM, I guess?).

Now for my project I am looking for an MCU (it doesn't have to be an Arduino) that can handle the math below and has as low a power consumption as possible; all other factors are not so relevant. Alternatively, I could store the arrays on an external SD card and then find a way to read and write the same file simultaneously.

So my questions to you are:
1) What would you recommend, SD card or an MCU that has enough RAM? Which setup would have lower power consumption?
2) Considering that the Due has SRAM of 96 kB, why does it already fail when I use bigger arrays than those that have a combined total of roughly 24 kB? Surely the Due should be able to handle bigger arrays than that.
3) Can anyone think of any other way that might work? Changing the sampling rate, measurement duration or filter types isn't really an option.

Code:
int r;
const int lenF = 6;
float w[lenF];
//w = {0, 0, 0, 0, 0, 0};

/* Filtering */
const int lenAr = 3500;
float B_array[lenAr];

void setup(void) {
  Serial.begin(115200);
}


void loop(void) {    
    r = 0;    
    for (int c = 0; c < lenF; c++){
      w[c] = 0.0f;
    }

    while (r < lenAr){
      float x = (float) random(100);
      filter(x);
      B_array[r]  =  w[5];  //comment for DEBUG
      r = r + 1;
    }
    
    float bv = computeDisplacement();
    delay(10000);
}




float computeDisplacement(void) {
  float dt = 1.0/250.0;
  int l = 0;
  Serial.println("phase 1");
  
/* Step 1: backwards filter on acceleration */

  float A_array[lenAr];

  for (int c = 0; c < lenF; c++){
    w[c] = 0.0f;
  }

  int m = lenAr - 1;  // last valid index; starting at lenAr would read and write out of bounds
  for (int n = 0; n < lenAr; n++){
    filter(B_array[m]);
    A_array[m] =  w[5]; // produces array
    m--;
  }

  Serial.println("phase 2--------------------------- Integration -----------------------------");
  
/* Step 2: integrate from acceleration to velocity */
  float a0, a1, v0, v1;
  a0 = a1 = v0 = v1 = 0.0f;
  
  for (int n = 0; n < lenAr; n++){
    a1          =   A_array[n];
    B_array[n] =   0.5*(a1 + a0)*dt + v0;
    a0          =   a1;
    v0          =   B_array[n];
    Serial.println(a1);
  }
  Serial.println("phase 3++++++++++++++++++ Forward velocity filter ++++++++++++++++++++++++++++++");


  /* Step 3: 2nd order band pass filter on velocity */
  
   // ------------ Forwards

  for (int c = 0; c < lenF; c++){
    w[c] = 0.0f;
  }

  for (int kb = 0; kb < lenAr; kb++){
    filter(B_array[kb]);
    A_array[kb] =  w[5];
    Serial.println(A_array[kb],3);

  }

  Serial.println("phase 4******************* Backward velocity filter****************************************");

  // -------------- Backwards
  int p = lenAr - 1;  // last valid index (lenAr itself is out of bounds)

  for (int c = 0; c < lenF; c++){
    w[c] = 0.0f;
  }

  for (int n = 0; n < lenAr; n++){
    filter(A_array[p]);
    B_array[p] = w[5];
    Serial.println(B_array[p],4); //Serial.print(", "); Serial.println(dt,4);
    p--;
    

  }
  // destroy vFf_array

  /* Step 4: integrate from velocity to deflections */
  float dMin, dMax, d1, d2;  

  dMin = 0.0f;
  dMax = 0.0f;
  d1   = 0.0f;
  d2   = 0.0f;
  Serial.println("phase 5############################ Displacement######################");

  for (int q = 0; q < lenAr; q++){
    d2        =   d1 + dt * B_array[q]*1000.0; // convert to mm; B_array holds the backward-filtered velocity from phase 4
    d1        =   d2;
    
    if (d2 > dMax){
      dMax = d2;
    }
    if (d2 < dMin){
      dMin = d2;
    }
    Serial.println(d2);
  }

  Serial.println("phase 6");

    /* Reset array value to zero to avoid any issues from arising */
  for (int n = 0; n < lenAr; n++){
       B_array[n] = 0.0f;
  }
  
  /* Step 5: determine the min & max displacement */
  float bv = dMax - dMin;
  Serial.println(dMax);
  Serial.println(dMin);
  Serial.println(bv);

  return bv;
}



void filter(float x){

  w[0] = w[1];
  w[1] = w[2];
  w[2] = w[3];
  w[3] = w[4];
  w[4] = (3.0981723490419175984e-3 * x)
      + (1.02785745857138589923 * w[0])
      + (0.06310578639809705237 * w[1])
      + (8.80613296663263778186 * w[2])
      + (-10.0892347548666875140 * w[3]);

  w[5] = (w[0] + w[4]) - 3.0 * w[2];

  return;
}



aarg

You didn't give any details of the failure on the Due, just "fail". In order for anyone to answer, they would need to know what the outcome is from running the code. Most people don't even have a Due to test with.

As for your first question, it is obviously implementation-dependent. For example, nobody can guess how fast you need it done. Similarly, you talk about power consumption but give no figures.

More detail please...
  ... with a transistor and a large sum of money to spend ...
Please don't PM me with technical questions. Post them in the forum.

jremington

The Teensy series from pjrc.com is recommended for projects that need more memory, more math performance, higher speed and higher floating point accuracy than the standard Arduino offers.

The Teensy 4.0 and 4.1 run at 600 MHz, have a few hundred K of RAM and cost less than $20.

aarg

Well, if we're going in that direction, you should also consider the new Robodyne and WeAct STM32F401 boards in the Blue Pill form factor that are out now. There are also new boards with the STM32H7 series, which are extremely powerful. The 401 is ridiculously cheap for what it does; the H7 boards are pricier but still not too bad. The ESP32 is an option too, but I don't think it will be conservative with power by default; you would have to engage some software-controlled power management. Pick a processor that has an FPU.

The Due memory is in two banks, 32 and 64k. Your arrays should still fit in there, I suggest investigating further before giving up on the Due. Again, you didn't say whether it compiled? ran? crashed? produced incorrect results?
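
For reference, the arithmetic behind that estimate can be written out as below. This is only a sketch: kLenAr mirrors lenAr from the posted code, the 96 kB figure is the one quoted in the thread, and the helper name is invented here. Note that in the posted code A_array is a local variable, so its share sits on the stack rather than in static storage.

```cpp
#include <cstddef>

// Bytes needed for an array of n single-precision floats.
constexpr std::size_t floatArrayBytes(std::size_t n) {
  return n * sizeof(float);
}

constexpr std::size_t kLenAr = 3500;              // lenAr in the posted sketch
constexpr std::size_t kDueSramBytes = 96 * 1024;  // 96 kB, as quoted in the thread

// B_array is global (static storage); A_array is local to
// computeDisplacement(), so it is allocated on the stack.
constexpr std::size_t kStaticBytes = floatArrayBytes(kLenAr);
constexpr std::size_t kStackBytes = floatArrayBytes(kLenAr);
constexpr std::size_t kTotalBytes = kStaticBytes + kStackBytes;

static_assert(kTotalBytes < kDueSramBytes,
              "two float arrays of this length should fit in 96 kB");
```

With 4-byte floats this comes to 14,000 bytes per array and 28,000 bytes in total, well under the quoted 96 kB, which suggests looking at stack usage or at the computed values rather than at total RAM.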

Idahowalker

I run an ESP32 WROVER model on which I create a 4 MB array to store captured data for playback. Perhaps an ESP32 WROVER would suit your requirements?

If you need more RAM, there are ESP32 models with 8 MB and 12 MB of RAM, but the Arduino IDE does not support accessing the extra RAM.
Receiving partial information does not help me help you and wastes my time.

Robin2

Quote
For a project I am measuring acceleration during a time period of roughly 10-15 seconds (sample rate 250 Hz), and converting that to displacement, via double integration and filtering. Crucially, I am performing forwards and backwards filtering, meaning that (for the backwards filter) I can only filter offline, i.e. the whole array of acceleration/velocity values needs to be known. However this does mean that I then have (at least) two arrays of roughly 3000 floats each in memory, so that is 3000*2*4 bytes = 24 kBytes (which will be stored in RAM I guess).
What is the purpose of doing this? It's much easier to help when you provide a full description of the project so we can understand the question in its proper context.

I'm wondering why you don't just copy the data to a PC and do all the calculations there, perhaps with a Python program. Python has very extensive maths and statistics libraries, and a PC has virtually unlimited memory, so you don't need to waste programming brain-power on memory management. If size is an issue, then a Raspberry Pi is a PC on a small board.

...R
Two or three hours spent thinking and reading documentation solves most programming problems.

ard_newbie

I see a first issue with the w[] declaration: in void filter(float x), the calculations use constants written with 20 decimal digits.

Even with a double (a double-precision float, which uses 8 bytes), you could only expect about 15 decimal digits of accuracy.

jremington

Quote
calculations are made with 20 decimal floats.
Not a problem. The compiler ignores the extra digits.

It is also unlikely that the accuracy of single precision versus double precision float calculations would make a difference in this particular case, because typical consumer grade sensors are so noisy that math errors are minuscule in comparison.
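
To put numbers on that, the standard std::numeric_limits traits report how many decimal digits each type can hold. A small sketch follows; the helper names are invented for this example. Keep in mind that on 8-bit AVR boards such as the Uno, double is the same 32-bit type as float, whereas on the ARM-based Due it is 64 bits.

```cpp
#include <limits>

// Decimal digits guaranteed to survive a text -> value -> text round trip.
constexpr int floatDigits() { return std::numeric_limits<float>::digits10; }
constexpr int doubleDigits() { return std::numeric_limits<double>::digits10; }

// Digits needed to print a float so that it reads back exactly.
constexpr int floatPrintDigits() {
  return std::numeric_limits<float>::max_digits10;
}
```

On a typical IEEE 754 platform these give 6 and 15 digits, and 9 digits to print a float exactly, so most of the 20 digits in the coefficient literals cannot be stored either way.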

ard_newbie

I suspect a compiler issue, because if I split "float computeDisplacement(void)" into several functions (e.g. one per phase), there are no more "ovf" outputs.

Maybe the compiler makes several copies of the arrays (hence exceeding 96 KB) when there is only one computeDisplacement() function.

Moreover, double should be preferred to float for both B_array[] and A_array[].
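
On the "ovf" output itself: as far as I know, the Arduino core's Print class writes ovf when a float's magnitude is above roughly 4.29e9, and nan or inf for those special values, so it reflects the value being printed rather than free RAM. Below is a sketch for finding where a buffer first goes out of range, assuming that threshold; the function name and kPrintLimit are invented for this example.

```cpp
#include <cmath>
#include <cstddef>

// Assumed threshold above which Arduino's Serial.print(float) shows "ovf".
const float kPrintLimit = 4294967040.0f;

// Returns the index of the first sample that is NaN, infinite, or too large
// in magnitude to print, or n if every sample is fine.
std::size_t firstUnprintable(const float *data, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    if (!std::isfinite(data[i]) || std::fabs(data[i]) > kPrintLimit) {
      return i;
    }
  }
  return n;
}
```

Calling it on each array after every phase would show which pass first produces the runaway values.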
