memcpy of a uint32_t from a sensor throttling loop speed - why?

Hi everyone,

Long time listener, first time caller. I’m using a Mega 2560 to communicate with a mixture of analog (i.e. measured via analogRead()) and I2C based sensors. Sampled data is placed in a 16 byte array, then written to an SD card once a 512 byte buffer is completely filled. The problem I’m running into is that when I collect data from a MAX30102 pulse oximeter using

measure.pox = pulseOx.getIR();

from my code below, the cycle time of my collectHR() loop drops to about 20 mS. If I preallocate this spot by using the line directly below it (storing a constant instead of reading a new uint32_t each loop), my cycle time is about half a millisecond. What’s confusing to me is that if I cast the uint32_t into a string using dtostrf:

dtostrf(pulseOx.getIR(),3,0,pOxBuffer); // read pulseOx, convert it into a 3 byte string
// in my previous code, all sensors were read this way, then I concatenated all the buffers
// together and wrote the resulting, longer array to my SD file. Cycle time averages 8 mS to read and store all sensors.

it only takes about 1.2 mS to actually read a sample from the MAX30102. It seems to me that by favoring a struct over a string (from an earlier iteration of this code that wrote all data as a txt file instead of a bin file) so that I can write binary to my SD card, I’m absolutely throttling my speed. Shouldn’t working bytewise be more efficient than working with a string? What is happening to the other 18 mS that occurs between reading the unsigned long and placing it into my byte buffer, buffer1? Implementing this code using string arrays instead of a struct, I was able to run at approximately 125 Hz. Now I’m at roughly 50 Hz. I’d appreciate any insight here. Relevant code shown below:

#include <Wire.h>
#include "MAX30105.h"
#include <SPI.h>
#include <SdFat.h>
#define moisture0 A0 // Upper back sensor analog pin
#define moisture1 A7 // Trunk sensor analog pin
#define ECGpin A3 // ECG analog pin

SdFat SD; // replace SD with SDfat.
File sdLog; // placeholder to create datalogger


struct meas // a 16 byte structure to hold all of our measurements
{
  unsigned long mils;
  int ECG;
  uint32_t pox;
  int tempInt;
  int m1;
  int m2;
};
struct meas measure; // create an instance of the above structure

byte buffer1[512];
byte *measureLoc = (byte *) &measure; // to access measure structure as bytes
char fileName[] = "WIHMSlog.bin";

void setup {
    Serial.begin(230400);
}


void loop() {
sdLog = SD.open(fileName, O_WRITE | O_CREAT | O_AT_END); // Create file for the datalogging and verify its existance
collectHR();
sdlog.close()
}


void collectHR() {
  unsigned long loopCount = 0;
  int buffLen = 0; // starting position in buffer
  int buffSize = 16;

  while (loopCount < 3200) { // some multiple of 32 (since 512 byte/sd block divided by 16 bytes per loop cycle = 32 cycles per sd write cycle)

    measure.ECG = analogRead(ECGpin); // read ECG and stick the int into measures
    measure.pox = pulseOx.getIR();
    //measure.pox = 5555; // Placeholder value
    measure.mils = micros();
    measure.m1 = analogRead(moisture0);
    measure.m2 = loopCount; // just a test to ensure this is actually iterating in the card
    if (buffLen == 512) { // SD buffer is full, time to write!
      sdLog.write(buffer1, 512);
      buffLen = 0;
    }
    memcpy(buffer1 + buffLen, measureLoc, buffSize); // place the 16 byte data structure into the buffer
    buffLen = buffLen + buffSize; // increase the index size in the array

    loopCount++;
    Serial.println(micros());
  }

}

My first thought was that an unsigned long was just more problematic than other datatypes when using memcpy since it’s longer, but I can store micros() as an unsigned long without any effective slowdown (a handful of microseconds of difference). My second thought was that the issue was from reading the sensor data directly into my struct rather than a placeholder variable first, but I don’t see why this would make any difference. How can it be that writing the same data as an ASCII string into a txt file is twice as fast as writing binary to a bin file?

Bhewgill:
sdlog.close()

It’s a small thing but, should there be a ‘;’ after that statement?

  • it seems like the memcpy should be before buffLen is increased by buffSize otherwise your first memcpy to buffer1 starts at 16!
  • I’m not sure about this compiler/architecture but some compilers pad structures to make memory accesses more efficient. You might want to print sizeof(meas) to ensure it is 16 bytes.

Sorry I’m not at a place where I can compile.

An array of structures instead of an array of bytes strikes me as a much better choice.

dougp:
It's a small thing but, should there be a ';' after that statement?

You're absolutely right, good catch. The semicolon is there in my compiled code, I just missed it copying things over. I've edited it into the original post.

ToddL1962:

  • it seems like the memcpy should be before buffLen is increased by buffSize otherwise your first memcpy to buffer1 starts at 16!
  • I’m not sure about this compiler/architecture but some compilers pad structures to make memory accesses more efficient. You might want to print sizeof(meas) to ensure it is 16 bytes.

Sorry I’m not at a place where I can compile.

You're also absolutely right! I had initially intended to use the 16 bytes of zeros as a way to hone in on the execution of the collectHR() loop in postprocessing. In hindsight, I think this is just a waste of memory. I've moved the bufflen increment to just after the call to memcpy. sizeof(meas) is indeed 16 bytes; I was concerned about that too and should probably change the hardcoded 16 to a sizeof() call just in case I change the struct in the future without thinking about it.

Interesting, I'll try to give this one a go when I'm near my hardware again. So you're thinking fill a struct[32] with 16 instances of meas, then only call memcpy once? Or are you suggesting to store "loopCount" number of structs, and write one massive block of 16*loopCount bytes to the SD when collectHR() concludes?

Thanks for the input guys, it's great to have different perspectives on this than just mine.

Don't bother with memcpy.

Make the array (close to) the target size for a write. Optionally wrapped in an outer struct for padding. The compiler can automate the array count calculation and the padding size calculation.

Cast the array in the call to write.

I’ve altered collectHR() to populate an instance of a struct[32]. The new code for collectHR() is shown below:

void collectHR() {
  int loopCount = 0;
  unsigned int buffLen = 0; // starting position in buffer
  int buffSize = 16;
  int bufferLocation = *buffer1Loc;

  while (loopCount < 320) { // some multiple of 32 (since 512 byte/sd block divided by 16 bytes per loop cycle = 32 cycles per sd write cycle
  for (int i=0; i<32;i++) {
    measure[i].ECG = analogRead(ECGpin); // read ECG and stick the int into measures
    measure[i].ECG = 567; // read ECG and stick the int into measures
    measure[i].pox = pulseOx.getIR();
    //measure[i].pox = 5555; // Placeholder value
    //dtostrf(pulseOx.getIR(),3,0,pOxBuf); // read pulseOx, convert it into a 3 byte string
    //Serial.print(pulseOx.getIR());
    //Serial.print("  ");
    
    measure[i].mils = micros();
    measure[i].m1 = analogRead(moisture0);
    measure[i].m2 = loopCount; // just a test to ensure this is actually iterating in the card
    measure[i].tempInt = measure[0].tempInt;
  }
    
//    if (buffLen == 512) { // SD buffer is full, time to write!
//      sdLog.write(buffer1, 512);
//      //sdLog.flush(); //need this or no?
//      buffLen = 0;
//    }
//    memcpy(buffer1, measureLoc, 512); // place the 16 byte data structure into the buffer
//    buffLen = buffLen + buffSize; // increase the index size in the array
    loopCount++;
    Serial.println(micros());
  }

  sdLog.write(measure, sizeof(measure));

  //rtcTime(); // remove this and the next 3 lines once you know how fast you are
  //Serial.print("HR Close ");
  //Serial.println(rtcBuf);
}

But this code executes an order of magnitude slower! I’m calculating elapsed loop time by taking the Serial.println(micros()) output from the monitor and dividing the difference between each line by 32. Each loop iteration now takes 100 milliSeconds!

Bummer.

I've been playing around with this issue off and on for the last couple of weeks. The final issue was in the setup of the MAX30102. There are three operation modes: 1) utilizes only the red LED, 2) utilizes the red and the IR LED to calculate out the effect of temperature changes, and 3) utilizes red, IR, and green. The MAX30102 does not have the green LED hardware at all, so I set the mode to 2 in order to bypass calls to nonexistent hardware. This was the final problem. In mode 2, a cycle of collectHR() takes 100 mS. In mode 3, a cycle of collectHR() takes 5 mS. I assumed this to be outside the scope of the issue, so I didn't include the setup parameters in my initial question. Go Figure.

Thanks for the help, everyone.