Optimization of writing to SD card

I'm working with an Arduino-based datalogger, sampling three analog channels (from an accelerometer) and writing the values to a MicroSD card. I've been tasked with cranking the sampling rate up as high as possible in order to capture impact events, and while I'm currently at around 1000 Hz, I think it would be possible to sample somewhat faster than that.

At this point, my loop basically does three things: call analogRead() for three channels, increment a counter, and write the values to the memory card. Since analogRead() should take about 0.1 ms, I believe that the major bottleneck lies in storing the data. When I remove the logfile.println() statement, I end up with about 0.5 ms per sample, which is reasonable.

I have tried three strategies in order to optimize my program:

  • Write all three values in separate print statements
  int16_t accX = 0;
  int16_t accY = 0;
  int16_t accZ = 0;
  uint16_t counter = 0;

  while(counter < 20000) {
    counter++;
    accX = analogRead(A0)-512;
    accY = analogRead(A1)-512;
    accZ = analogRead(A2)-512;

    logfile.print(accX);
    logfile.print(",");
    logfile.print(accY);
    logfile.print(",");
    logfile.println(accZ);

  }

This one takes about 30 s for 20,000 samples, so 1.5 ms per sample.

  • Calculate the square of the norm, so that I only have to store one value. I would then calculate the square root of this value when I'm post-processing the data
  int16_t accX = 0;
  int16_t accY = 0;
  int16_t accZ = 0;
  int16_t normsq = 0;
  uint16_t counter = 0;

  while(counter < 20000) {
    counter++;
    accX = analogRead(A0)-512;
    accY = analogRead(A1)-512;
    accZ = analogRead(A2)-512;
    normsq = accX*accX+accY*accY+accZ*accZ;

    logfile.println(normsq);

  }

This one takes 22 s for 20,000 samples, so 1.1 ms per sample. The time it takes to calculate the norm seems to vary somewhat with the size of the values, and whether or not the values are constant from one measurement to the next. I'm also aware that I might want to use long rather than int16_t as in my example.

  • Use sprintf to format all three values into a char buffer, and then write that buffer in a single call.
  int16_t accX = 0;
  int16_t accY = 0;
  int16_t accZ = 0;
  uint16_t counter = 0;
  char buf[15];

  while(counter < 20000) {
    counter++;
    accX = analogRead(A0)-512;
    accY = analogRead(A1)-512;
    accZ = analogRead(A2)-512;
    sprintf(buf,"%d,%d,%d",accX,accY,accZ);

    logfile.println(buf);

  }

This strategy takes 21 s for 20,000 samples, so 1.05 ms/sample.

I'm not really used to embedded development and this level of optimization, but wouldn't it make sense to do bitwise concatenation, from 3*16 bits to one 48-bit variable, and then write that single variable? Then I would be able to split its value into three binary strings in postprocessing and retrieve my three 16-bit ints. I'm thinking that it might be faster than taking the route via sprintf.

Does this line of thinking make sense at all? In that case, how would I go about doing so in an efficient way? Is there a better way of optimizing the data storage?

You should post all your code.

One good way of improving the sample rate is to save the data to a buffer and only write it to the SD card when the buffer is full.

There are parts of my code that I'm not allowed to share, but the example code I gave is literally 100% of the function I'm trying to optimize.

As for storing it in a buffer, the high sampling rate makes it hugely impractical since accelerometer values aren't being logged while the program is writing the buffer to the card. Even if I just store the norm (and then I still have to perform a relatively expensive calculation), we're talking 2000 ints per second if I reach my optimal target sampling rate, 4 kB of data. My logger has 2 kB of RAM, so enough for 0.5 s of logging before it needs to take an equally long break in order to write it to the storage.

I'm looking to upgrade to another logger with more memory, but even then I will only be able to store about 8 seconds (or rather 3 seconds if I want to avoid spending 2-3 ms per cycle on calculating the squared norm). I really need to find a better way of storing the data in real time in order for this to be practical in any way.

So an ARM board, like the Arduino Due or the newly released Zero, should be a better fit.
You could also look at direct port manipulation to make the reads a bit faster, but I don't see much room for other improvements. Maybe someone else will.

The analogRead() calls are currently just over 1 ms each, so probably not much room for improvement there.

Any remaining optimization basically boils down to whether I can "compress" (for lack of better words) my three values to a single one (and hence write it in a single write operation) in a less computationally expensive way than calculating the square of the norm. I suspect that bitwise concatenation is one possible avenue, but I'm not sure how to do it faster than sprintf.

thegreger:
The analogRead() calls are currently just over 1 ms each, so probably not much room for improvement there.

Who put the brakes on?

Sorry, I meant just above 0.1 ms (which is pretty much where the reference page says it should be). In the fastest versions of my code I get about 1 ms per cycle, which includes one sampling, one aggregation (such as calculating the square norm or calling sprintf) and one write operation. If I can find a quicker way to aggregate the data before saving it, I think it would be possible to reach maybe 0.5-0.7 ms.

thegreger:
Any remaining optimization basically boils down to whether I can "compress" (for lack of better words) ...

Write the data in binary format instead of as text.
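
To make the "bitwise concatenation" idea concrete: since analogRead() returns a 10-bit value (0..1023), three raw readings fit in 30 bits, so they can share one 32-bit word instead of a 48-bit one. The following is only a minimal sketch under that assumption. The names packSample, unpackSample, and Sample, and the bit layout, are hypothetical choices for illustration, and the readings are packed before the -512 offset is applied.

```cpp
#include <cassert>
#include <cstdint>

// Pack three raw 10-bit ADC readings (0..1023 each) into one 32-bit word.
// Bit layout (an illustrative choice): bits 29..20 = x, 19..10 = y, 9..0 = z.
uint32_t packSample(uint16_t x, uint16_t y, uint16_t z) {
  assert(x < 1024 && y < 1024 && z < 1024);  // must be raw analogRead() values
  return (static_cast<uint32_t>(x) << 20) |
         (static_cast<uint32_t>(y) << 10) |
          static_cast<uint32_t>(z);
}

// Three readings re-centred around zero, like analogRead(...) - 512.
struct Sample {
  int16_t x;
  int16_t y;
  int16_t z;
};

// Extract one 10-bit field and subtract the 512 offset.
static int16_t centre(uint32_t field) {
  return static_cast<int16_t>(static_cast<int32_t>(field & 0x3FF) - 512);
}

// Post-processing side: split the packed word back into three values.
Sample unpackSample(uint32_t packed) {
  Sample s;
  s.x = centre(packed >> 20);
  s.y = centre(packed >> 10);
  s.z = centre(packed);
  return s;
}
```

The packed word would then go to the card as 4 raw bytes rather than as text. Whether the shifts cost less than they save depends on the board, so this would need to be timed.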

Arank:
One good way of improving the sample rate is saving the data to a buffer and only write it to the Sd card when the buffer is filled.

Actually, the SD writes are already buffered. 512 bytes are accumulated in an internal buffer before an SD write actually occurs.

@thegreger, as you suspected (and sterretje says), binary writes are faster than a text print (2 bytes per value instead of 2-5 bytes per value). This will also decrease the number of SD buffer writes. I'd start by writing the 6 binary bytes of analog readings, 2 bytes per channel (shown here for accX):

  logfile.write( (uint8_t *) &accX, sizeof(accX) );
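
As a rough illustration of how the three channels could go out as one 6-byte record, and how post-processing might read them back, here is a host-side sketch. ByteSink is a hypothetical stand-in for the SD file (it only mimics the write(buf, len) shape), and Record, writeRecord, and readRecord are names made up for this example. The bytes are stored in the byte order of the machine that writes them, which is little-endian on AVR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One log record: three centred readings, 6 bytes with no padding.
struct Record {
  int16_t x;
  int16_t y;
  int16_t z;
};
static_assert(sizeof(Record) == 6, "Record must be exactly 6 bytes");

// Hypothetical stand-in for the SD file: collects the bytes written to it.
struct ByteSink {
  std::vector<uint8_t> bytes;
  size_t write(const uint8_t *buf, size_t len) {
    bytes.insert(bytes.end(), buf, buf + len);
    return len;
  }
};

// Write one record as raw bytes in a single call.
size_t writeRecord(ByteSink &sink, const Record &r) {
  return sink.write(reinterpret_cast<const uint8_t *>(&r), sizeof(r));
}

// Post-processing side: copy record number `index` out of the byte stream.
Record readRecord(const std::vector<uint8_t> &bytes, size_t index) {
  assert((index + 1) * sizeof(Record) <= bytes.size());
  Record r;
  std::memcpy(&r, bytes.data() + index * sizeof(Record), sizeof(Record));
  return r;
}
```

One write per sample instead of three keeps the per-call overhead down, and fixed-size records make the file trivial to index: record n starts at byte 6*n.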

The next thing would be to use something faster than analogRead, by accessing the ADC registers directly and increasing the conversion clock rate. The clock rate used by analogRead limits you to ~10 kHz (0.1 ms/sample). I googled for "arduino analogRead time" and found this thread. Lots more...
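
The ~10 kHz figure can be reproduced with a little arithmetic. The sketch below assumes a 16 MHz ATmega328P-class board, where a normal conversion takes 13 ADC clock cycles and the Arduino core sets the ADC prescaler to 128. The function names are made up for this example, and other boards will have different numbers.

```cpp
#include <cstdint>

// Assumed values for a 16 MHz ATmega328P-based board.
constexpr double kCpuHz = 16000000.0;
constexpr double kCyclesPerConversion = 13.0;  // normal (non-first) conversion

// ADC clock for a given prescaler (2, 4, 8, 16, 32, 64 or 128).
constexpr double adcClockHz(unsigned prescaler) {
  return kCpuHz / prescaler;
}

// Time for one conversion, in microseconds.
constexpr double conversionMicros(unsigned prescaler) {
  return kCyclesPerConversion / adcClockHz(prescaler) * 1e6;
}

// Conversions per second on a single channel.
constexpr double conversionsPerSecond(unsigned prescaler) {
  return adcClockHz(prescaler) / kCyclesPerConversion;
}
```

With the default prescaler of 128 this gives roughly 104 us per conversion, and a prescaler of 16 would give about 13 us. The datasheet recommends an ADC clock of 50-200 kHz for full 10-bit resolution, so faster settings trade accuracy for speed.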

SD cards have different performance times, too. 200 KB/s is easily achievable, so at 6 bytes/reading, that's a max logging rate of 33 kHz. But that's only 30 us per reading, which is ambitious.
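
For reference, the arithmetic behind those figures, written out as a small sketch. The 200 KB/s throughput is the figure quoted above, not a measurement, and the helper names are hypothetical.

```cpp
// Upper bound on the logging rate for a given card throughput and record size.
constexpr double maxSampleRateHz(double cardBytesPerSec, double bytesPerSample) {
  return cardBytesPerSec / bytesPerSample;
}

// Time budget per sample, in microseconds, at a given sample rate.
constexpr double microsPerSample(double rateHz) {
  return 1e6 / rateHz;
}

// 6-byte binary records at the quoted 200 KB/s.
constexpr double kBinaryRateHz = maxSampleRateHz(200000.0, 6.0);

// Worst-case text line "-512,-512,-512" plus CR LF is 16 bytes.
constexpr double kTextRateHz = maxSampleRateHz(200000.0, 16.0);
```

Under these assumptions the binary format tops out near 33 kHz and worst-case text near 12.5 kHz, so the card's average throughput is unlikely to be the limiting factor at 1-2 kHz.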

What you may not know is that most SD cards will occasionally take 100 ms to complete a write. So to get the highest speed, you will need to generate the readings in the background and write them to the SD in the foreground.

You will need to use an interrupt service routine (ISR) to catch a completed reading, save it, switch channels, and request the next reading (background processing). In loop, just check the queue (a ring buffer) of readings and write what's available (foreground processing).

// implement a ring buffer of uint16_t
// See HardwareSerial.cpp for the typical head and tail indices into an array (rx_buffer).
// Or implement your own ring buffer class with
//     available() and read() for loop
//     room() and write(val) for ISR

ISR(some ADC_vect name goes here...)
{
  if (readings.room() )
    readings.write( completedADCreading );

  // switch channels (0, 1, 2, 0, 1, 2...)
  // start conversion
}

void setup()
{
  // set ADC clock rate
  // open SD logfile
}

void loop()
{
  if (readings.available()) {
    uint16_t reading = readings.read();
    logfile.write( (uint8_t *) &reading, sizeof(reading) ); // usually fast, but could take 100ms
  }
}
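
One possible shape for the ring buffer mentioned in the comments above, with the same available() / read() / room() / write(val) interface. This is a sketch, not the HardwareSerial implementation: the class name, the template parameter, and the power-of-two restriction are choices made for this example. It assumes a single producer (the ISR) and a single consumer (loop), and keeps the indices to 8 bits so each can be read in one instruction on an 8-bit AVR.

```cpp
#include <cassert>
#include <cstdint>

// Single-producer / single-consumer ring buffer of 16-bit readings.
// SIZE must be a power of two between 2 and 256 so the 8-bit indices
// wrap with a cheap mask. One slot is always left empty, which is how
// "full" is told apart from "empty".
template <uint16_t SIZE>
class RingBuffer {
  static_assert(SIZE >= 2 && SIZE <= 256 && (SIZE & (SIZE - 1)) == 0,
                "SIZE must be a power of two in the range 2..256");

 public:
  // Number of readings waiting to be consumed (called from loop).
  uint8_t available() const { return (head_ - tail_) & kMask; }

  // Number of free slots (called from the ISR).
  uint8_t room() const { return kMask - available(); }

  // Producer side: store one reading. Returns false if the buffer is full.
  bool write(uint16_t value) {
    if (room() == 0) return false;
    data_[head_] = value;
    head_ = (head_ + 1) & kMask;
    return true;
  }

  // Consumer side: remove and return the oldest reading.
  // Only call this when available() is non-zero.
  uint16_t read() {
    assert(available() > 0);
    uint16_t value = data_[tail_];
    tail_ = (tail_ + 1) & kMask;
    return value;
  }

 private:
  static constexpr uint8_t kMask = SIZE - 1;
  volatile uint16_t data_[SIZE];
  volatile uint8_t head_ = 0;  // next slot to write (ISR only)
  volatile uint8_t tail_ = 0;  // next slot to read (loop only)
};
```

A declaration such as RingBuffer<256> readings; would use 512 bytes of RAM for the data and hold up to 255 readings.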

I think your conversion rate is also reduced by switching channels. The final rate and the ring buffer size will determine how long an SD write can take before you start losing samples.
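
To put rough numbers on that trade-off: the buffer has to hold every reading that arrives while a write is stalled. The sketch below uses the 100 ms worst case mentioned above and 2-byte readings. The helper names are hypothetical, and the rates are examples rather than measurements.

```cpp
#include <cstdint>

// Readings that arrive during one stalled SD write, rounded up.
constexpr uint32_t readingsDuringStall(uint32_t readingsPerSec, uint32_t stallMs) {
  return (readingsPerSec * stallMs + 999) / 1000;
}

// RAM needed to hold that many 16-bit readings.
constexpr uint32_t bufferBytes(uint32_t readings) {
  return readings * static_cast<uint32_t>(sizeof(uint16_t));
}

// Three channels at 1 kHz each, with a 100 ms stall.
constexpr uint32_t kReadingsAt1kHz = readingsDuringStall(3 * 1000, 100);
constexpr uint32_t kBytesAt1kHz = bufferBytes(kReadingsAt1kHz);

// Three channels at 2 kHz each, with the same stall.
constexpr uint32_t kBytesAt2kHz = bufferBytes(readingsDuringStall(3 * 2000, 100));
```

Under these assumptions, 3 channels at 1 kHz need about 600 bytes of buffer and 3 channels at 2 kHz about 1200 bytes. On a board with 2 kB of RAM that also has to hold the SD library's 512-byte block buffer, the higher rate is likely too tight.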

Cheers,
/dev

This thread might be relevant: "Maximum speed that the Arduino can read an SD card" (Arduino Forum, Storage)

MarkT:
This thread might be relevant

Nice! @thegreger, see Reply #7.