SD-card writes on ESP32 slow down after ?

OK, some background. I have an ESP32 with a W5500 Ethernet shield on one SPI bus (VSPI) and an SD card on the HSPI bus.
UDP packets come in on the Ethernet side and are converted to drive LED pixels. Basically the Art-Net protocol, and it works just fine. Smooth as, even for up to 16 universes using 4 ports. Recording the pixel stream has worked flawlessly with smaller numbers of pixels (up to 300), and of course more pixels might create issues.
Not only is there more processing for the SD card, but overall processing has also multiplied, so I was sort of expecting some issues, but initially everything seemed to work like a breeze. I recorded 2 small pixel streams yesterday evening, and the recording process was not affecting the incoming data at all. With playback I did have some small glitches until I moved the events to run on core 0.
I was really happy of course, so I did some more testing this morning. During the first test, as I was changing the color of the pattern in Resolume (a VJ program), all of a sudden there were some hiccups / stutters in the processing of the incoming data stream. It must be caused by the writing to the SD card, since the stutters stop when I stop recording.

Now I did some research & experiments:
  • I upped the SPI speed to 80 MHz (first 60 MHz, but 80 MHz also seems to work with the SD card, and the W5500 does support it). That did help somewhat, but nothing conclusive.
  • I reduced the frame size to about half (1360 pixels).
  • I made sure the CS pin stays LOW and the SD card stays enabled (by commenting out the digitalWrite(CS_PIN, HIGH) in the deselect function of the SD library).
  • I tried a couple of other core versions for the ESP32. I have been using 2.0.11, and due to library compatibility issues I am restricted to core 2.x.x at the moment.
  • I am using a class 4 SD card (I think; I am no SD specialist). It's 4 GB, and I read somewhere that a class 10 card should be faster (I ordered one!). I also read that the FAT filesystem may be causing my issues: the file size changes as I open, append and close again, the data is spread over different sectors, and so on, which means the file's bookkeeping (directory entry, allocation table) has to be updated each time.

As you can tell I haven't gotten too deep into it just yet, but any suggestions on the issue, or a direction I should pursue, would be really helpful.

As said, it seemed to work, so all I have done so far is a bit of trial and error to see if that would help. I don't have a test sketch yet to illustrate the problem. It is fairly complex anyway, with 3 tasks: UDP reception, WS2812 transmission (using Makuna NeoPixelBus x8 mode, so after the encoding it runs over I2S fully in the background) and writing the pixel buffer to an SD card. Mainly I am looking for a way to keep the writing to the SD card at the same speed as it was last night during the first few tests.

OK, so I made a small benchmark test (which I will expand upon later, I guess):

#include "FS.h"
#include "SD.h"

#define HSPI_SCK 17
#define HSPI_MISO 16
#define HSPI_MOSI 4
#define SD_CS 15  // Added 10K pullup for v2 (strapping pin must be HIGH)

#define LED_WHITE 14

#define SPI_FRQ 32000000

#define NO_ACTION_SD 0
#define CLOSE_SD 1
#define OPEN_SD 2

#define SERIAL_OUTPUT_BAUD 500000
#define PIXEL_BYTES (680 * 3)
#define PORTS 4

uint8_t stripixels[PORTS][PIXEL_BYTES];
uint16_t lengths[PORTS] = {640, 640, 640, 480};
uint16_t totalLengths = 0;

SPIClass * hspi = NULL;


void setup() {
  pinMode(LED_WHITE, OUTPUT);
  int pinState = LOW;

  Serial.begin(SERIAL_OUTPUT_BAUD);
  Serial.println();
  Serial.println();
  Serial.println(F("SD Benchmarker."));
  delay(1000);
  
  uint8_t b = 0;
  for (uint8_t i = 0; i < PORTS; i++) {
    for (uint16_t j = 0; j < PIXEL_BYTES; j++) {
      stripixels[i][j] = b;
      b++;
    }
    totalLengths += lengths[i];
  }
  Serial.println(F("Arrays Filled."));
  Serial.print(F("Total Lengths : "));
  Serial.println(totalLengths, DEC);
  delay(1000);
  
  if (InitSdCard()) {
    uint32_t startTime = millis();
    if (appendSdFile(OPEN_SD)) {
      uint16_t frames = 40 * 60;
      bool sdOk = true;
      while ((frames) && (sdOk)) {
        sdOk = appendSdFile(NO_ACTION_SD);
        frames--;
        digitalWrite(LED_WHITE, pinState);
        pinState = HIGH - pinState;
      }
      if (sdOk) {
        appendSdFile(CLOSE_SD);
        uint32_t totalMs = millis() - startTime;
        Serial.print(totalMs, DEC);
        Serial.println(F(" milliseconds have passed"));
      }
    }    
  }
  else {
    Serial.println(F("SD Init Failed."));
  }
}

void loop() {
  // put your main code here, to run repeatedly:

}

bool InitSdCard() {
  if (hspi == NULL) {
    hspi = new SPIClass(HSPI);
  }
  hspi->begin(HSPI_SCK, HSPI_MISO, HSPI_MOSI, SD_CS);
  hspi->setFrequency(SPI_FRQ);

  if (!SD.begin(SD_CS, *hspi, SPI_FRQ)) {
    Serial.println(F("No SD card found"));
    return false;
  }
  return true;
}

bool appendSdFile(uint8_t stat) {
  char data[] = "DmagiX PixelStream vx.xx\rColorSequence=X\rFrameSize=0000\rDmxSize=000\rID=000000#";
  static File myFile;
  static uint32_t firstFrameTime = 0;
  if ((!myFile) && (stat == OPEN_SD)) {
    myFile = SD.open(F("/frametest5.dmf"), FILE_WRITE);
    if (!myFile) {
      Serial.println(F("Opening file failed."));
      return false;
    }    
    uint16_t bytesWritten = myFile.write((uint8_t *) data, (sizeof(data) - 1));
    if (bytesWritten) {
      firstFrameTime = millis();
      return true;
    }
    myFile.close();
    Serial.println(F("TC write failed."));
    return false;
  }
  if (!myFile) {
    Serial.println(F("File is not open."));
    return false;
  }
  if (stat == CLOSE_SD) {
    myFile.close();
    Serial.println(F("File closed."));
    return true;
  }
  uint16_t bytesWritten = 0;
  
  uint32_t moment = millis() - firstFrameTime;  // write the timecode
  uint8_t nr_bytes = 4;
  while (nr_bytes) {
    nr_bytes--;
    uint8_t m = (moment >> (8 * nr_bytes)) & 0xFF;
    if (myFile.write(m)) bytesWritten++;
  }
  
  for (uint8_t i = 0; i < PORTS; i++) {  // one write per output port
    uint8_t *p = stripixels[i];
    bytesWritten += myFile.write(p, lengths[i]);
  }
  if (bytesWritten == totalLengths + 4) {
    return true;
  }
  else {
    Serial.println(F("Incorrect Frame size written."));
    myFile.close();
    return false;
  }
  return false;  // not possible
}

Basically I am writing frames in a similar way as in my main sketch, though I decided to see if I could just leave the file open (in my main sketch I close it every time, but that does seem wasteful) and see how much time it takes to write what realistically is 1 minute of frames.
The first time it took about 4 seconds, and that was consistent, even a bit less when writing to a file with the same name. But then I started writing to more files, and it increased significantly, first to about 15 seconds and then to about 18 seconds.

Arrays Filled.
Total Lengths : 2400
File closed.
4070 milliseconds have passed
Arrays Filled.
Total Lengths : 2400
File closed.
3898 milliseconds have passed
Arrays Filled.
Total Lengths : 2400
File closed.
3895 milliseconds have passed
Arrays Filled.
Total Lengths : 2400
File closed.
15769 milliseconds have passed
Arrays Filled.
Total Lengths : 2400
File closed.
18049 milliseconds have passed
Arrays Filled.
Total Lengths : 2400
File closed.
18172 milliseconds have passed
Arrays Filled.
Total Lengths : 2400
File closed.
18427 milliseconds have passed

Now I have heard that it is most efficient to write in blocks that are the same size as a sector. Does anybody have any idea?
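To make the sector-size idea concrete, here is a minimal sketch of a staging buffer that only hands whole multiples of 512 bytes to the underlying write. It is plain C++ so it can be tried off-target. `SectorBuffer`, `kCapacity` and the `Sink` callback are names made up for this illustration, and whether it actually helps on a given card is something the benchmark would have to show.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>

// Hypothetical helper: collects writes of any size and forwards them to
// `sink` in chunks of kCapacity bytes (a whole number of 512-byte sectors).
class SectorBuffer {
public:
  static constexpr size_t kSectorSize = 512;             // SD sector size
  static constexpr size_t kCapacity = 8 * kSectorSize;   // 4 KiB staging area (assumption)
  using Sink = std::function<bool(const uint8_t*, size_t)>;

  explicit SectorBuffer(Sink sink) : sink_(std::move(sink)) {}

  // Append data; every time the staging area is full it is passed to the sink.
  bool write(const uint8_t* data, size_t len) {
    while (len > 0) {
      size_t n = kCapacity - used_;
      if (n > len) n = len;
      std::memcpy(buf_ + used_, data, n);
      used_ += n;
      data += n;
      len -= n;
      if (used_ == kCapacity) {
        if (!sink_(buf_, kCapacity)) return false;
        used_ = 0;
      }
    }
    return true;
  }

  // Pass on whatever is left. This is the only call that may be unaligned,
  // so it should happen once, just before the file is closed.
  bool finish() {
    if (used_ == 0) return true;
    bool ok = sink_(buf_, used_);
    used_ = 0;
    return ok;
  }

private:
  Sink sink_;
  uint8_t buf_[kCapacity];
  size_t used_ = 0;
};
```

In the benchmark above the sink would be a lambda that calls myFile.write(p, n) and compares the return value with n. The timecode and the four port buffers would then go through SectorBuffer::write instead of directly to the file.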

So I upped the total file size a bit, and also found that writing to a file that already exists greatly improves the writing time, but the issue is that some frames take an excessive amount of time to write.
These are the results for 40 * 1200 frames (20 minutes of data at 40 frames per second, written in about 90 seconds), listing any frame that exceeds 20 ms.
A 200 ms frame write is going to be rather visible.
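To put numbers on "fast on average, too slow now and then", the arithmetic can be sketched as below. It assumes 40 frames per second (the benchmark treats 40 * 60 frames as one minute) and a frame of 2400 pixel bytes plus the 4-byte timecode. The function names are made up for this example.

```cpp
#include <cstdint>

constexpr uint32_t kFrameBytes = 2400 + 4;        // pixel bytes + 4-byte timecode
constexpr uint32_t kFps = 40;                     // assumption: 40 frames per second
constexpr uint32_t kFramePeriodMs = 1000 / kFps;  // 25 ms between frames

// Data rate the recording has to sustain.
constexpr uint32_t requiredBytesPerSec() {
  return kFrameBytes * kFps;
}

// Average data rate of a benchmark run.
constexpr uint64_t measuredBytesPerSec(uint32_t frames, uint32_t totalMs) {
  return static_cast<uint64_t>(frames) * kFrameBytes * 1000 / totalMs;
}

// Number of frame periods that pass while one write is stalled (rounded up).
constexpr uint32_t framesDuringStall(uint32_t stallMs) {
  return (stallMs + kFramePeriodMs - 1) / kFramePeriodMs;
}
```

With these assumptions the stream needs 96160 bytes per second, while 48000 frames in 90603 ms averages about 1.27 MB per second, so the average has more than ten times headroom. A single 211 ms stall, however, spans 9 frame periods, and that is the part that shows up as stutter.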

Arrays Filled.
Total Lengths : 2400
File closed.
90603 milliseconds have passed
Frame : 47761 exceeds 49 ms
Frame : 47759 exceeds 183 ms
Frame : 45164 exceeds 162 ms
Frame : 43866 exceeds 188 ms
Frame : 43864 exceeds 179 ms
Frame : 40397 exceeds 54 ms
Frame : 40395 exceeds 210 ms
Frame : 40394 exceeds 177 ms
Frame : 38663 exceeds 152 ms
Frame : 38661 exceeds 181 ms
Frame : 36928 exceeds 188 ms
Frame : 36926 exceeds 180 ms
Frame : 35194 exceeds 70 ms
Frame : 35192 exceeds 208 ms
Frame : 35190 exceeds 175 ms
Frame : 33459 exceeds 141 ms
Frame : 33457 exceeds 182 ms
Frame : 31725 exceeds 192 ms
Frame : 31723 exceeds 185 ms
Frame : 29987 exceeds 86 ms
Frame : 29985 exceeds 208 ms
Frame : 29983 exceeds 174 ms
Frame : 28252 exceeds 126 ms
Frame : 28250 exceeds 178 ms
Frame : 27816 exceeds 190 ms
Frame : 27814 exceeds 181 ms
Frame : 27380 exceeds 94 ms
Frame : 27378 exceeds 208 ms
Frame : 27376 exceeds 175 ms
Frame : 24783 exceeds 123 ms
Frame : 24781 exceeds 186 ms
Frame : 23049 exceeds 193 ms
Frame : 23047 exceeds 187 ms
Frame : 21314 exceeds 105 ms
Frame : 21312 exceeds 209 ms
Frame : 21311 exceeds 176 ms
Frame : 20006 exceeds 110 ms
Frame : 20004 exceeds 180 ms
Frame : 17845 exceeds 189 ms
Frame : 17843 exceeds 182 ms
Frame : 16111 exceeds 116 ms
Frame : 16109 exceeds 210 ms
Frame : 16107 exceeds 177 ms
Frame : 12642 exceeds 57 ms
Frame : 12640 exceeds 97 ms
Frame : 12638 exceeds 180 ms
Frame : 10907 exceeds 190 ms
Frame : 10906 exceeds 181 ms
Frame : 9173 exceeds 133 ms
Frame : 9171 exceeds 207 ms
Frame : 9169 exceeds 173 ms
Frame : 7438 exceeds 89 ms
Frame : 7437 exceeds 179 ms
Frame : 6566 exceeds 188 ms
Frame : 6564 exceeds 184 ms
Frame : 6130 exceeds 146 ms
Frame : 6128 exceeds 210 ms
Frame : 6126 exceeds 177 ms
Frame : 5704 exceeds 81 ms
Frame : 5702 exceeds 182 ms
Frame : 5268 exceeds 191 ms
Frame : 5266 exceeds 183 ms
Frame : 4831 exceeds 146 ms
Frame : 3969 exceeds 146 ms
Frame : 3968 exceeds 208 ms
Frame : 3966 exceeds 174 ms
Frame : 3097 exceeds 81 ms
Frame : 3095 exceeds 179 ms
Frame : 2661 exceeds 194 ms
Frame : 2659 exceeds 186 ms
Frame : 2231 exceeds 211 ms
Frame : 2228 exceeds 149 ms
Frame : 2226 exceeds 210 ms
Frame : 2225 exceeds 176 ms
Frame : 1795 exceeds 75 ms
Frame : 1793 exceeds 185 ms
Frame : 923 exceeds 193 ms
Frame : 921 exceeds 183 ms
Frame : 497 exceeds 137 ms
Frame : 61 exceeds 22 ms
Frame : 59 exceeds 209 ms
Frame : 57 exceeds 175 ms
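
A quick check on the frame numbers in the log above: many of the gaps between groups of slow frames are between roughly 426 and 437 frames (61 to 497, 497 to 923, 1795 to 2231), or about twice that (923 to 1795). The small sketch below converts such a gap into bytes so it can be compared with a candidate boundary size. It assumes 2404 bytes per frame, and the helper names are made up for this example.

```cpp
#include <cstdint>

constexpr uint32_t kFrameBytes = 2400 + 4;  // pixel bytes + 4-byte timecode

// Bytes written between two frame numbers taken from the log.
constexpr uint64_t bytesBetween(uint32_t frameA, uint32_t frameB) {
  return static_cast<uint64_t>(frameA > frameB ? frameA - frameB
                                               : frameB - frameA) * kFrameBytes;
}

// Whole frames that fit before a boundary of `boundaryBytes` is reached.
constexpr uint32_t framesPerBoundary(uint64_t boundaryBytes) {
  return static_cast<uint32_t>(boundaryBytes / kFrameBytes);
}
```

The gap from frame 61 to frame 497 works out to 1048144 bytes, and 1 MiB (1048576 bytes) holds 436 whole frames, so the stalls seem to line up with roughly every 1 MiB written. What exactly sits on that boundary is not established by this log alone.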

Did a small test with the newest core this morning. If it fixed the problem, I would consider changing my main sketch so it compiles on that core, but alas it doesn't. Speed is marginally improved, but the issue remains that a frame write takes nearly 200 ms (twice in a row) at what appears to be the change to the next cluster. This does make sense, since that requires opening and updating the FAT, which is probably in a different cluster again, etc. So what could I do to get around this?

  • Bigger clusters (allocation units), but they would have to be 'HUGE' for it to make a difference
  • Always write to consecutive sectors initially (a sort of raw writing), and then consolidate the file after the recording is complete?

This would involve a rather complex way of doing things.
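
A simpler relative of the "write raw first, consolidate later" idea is to create the file at its final size before recording and then overwrite it in place, so the file does not have to grow while frames are coming in. The sketch below shows the principle with the standard C file API on a desktop. `preallocateFile`, `openForOverwrite` and `fileSize` are names made up for this example, and it is an assumption (not verified here) that the ESP32 SD library behaves the same way when an existing file is opened in "r+" mode.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Create (or truncate) `path` and fill it with `bytes` zero bytes,
// so all space is allocated before recording starts.
bool preallocateFile(const char* path, size_t bytes) {
  std::FILE* f = std::fopen(path, "wb");
  if (!f) return false;
  static const uint8_t zeros[512] = {0};
  bool ok = true;
  while (ok && bytes > 0) {
    size_t n = bytes < sizeof(zeros) ? bytes : sizeof(zeros);
    ok = std::fwrite(zeros, 1, n, f) == n;
    bytes -= n;
  }
  return (std::fclose(f) == 0) && ok;
}

// Open an existing file for in-place updates: "r+b" neither truncates
// nor appends, so writes start at offset 0 and the size stays the same.
std::FILE* openForOverwrite(const char* path) {
  return std::fopen(path, "r+b");
}

// Size of the file in bytes, or -1 if it cannot be opened.
long fileSize(const char* path) {
  std::FILE* f = std::fopen(path, "rb");
  if (!f) return -1;
  std::fseek(f, 0, SEEK_END);
  long size = std::ftell(f);
  std::fclose(f);
  return size;
}
```

The recording loop would then write frames to the handle from openForOverwrite and remember how many bytes are valid, for example in the header, since the file length no longer says where the data ends.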

Now I realize that the error checking may also be a consideration. If a CRC check is done, does that happen when a cluster is full? Is a cluster checked before writing to it? In that case smaller clusters could help. The average write time is well within spec; it is just the occasional super slow writes that cause an issue.
I was thinking of using the 2nd core and pinning a task to it (never done it, but it does not look all that complex). But that may involve temporarily storing the frames, and I am pretty sure I am going to run out of memory when I need to do that. If frames are coming in every 20 ms, that means I would need a buffer of at least 10 frames, and that I already don't have.
I am going to end up delving a lot deeper into this than I intended.
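
For the second-core idea, the buffering part could look roughly like the single-producer, single-consumer ring below: the receiving task pushes complete frames and the SD task pops them. This is a plain C++ sketch under assumptions, not a tested ESP32 implementation. `FrameRing` is a made-up name, and on the ESP32 the consumer would live in a task created with xTaskCreatePinnedToCore.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fixed-size ring of frames for exactly one producer and one consumer.
// One slot always stays empty, so it holds at most Slots - 1 frames.
template <size_t FrameBytes, size_t Slots>
class FrameRing {
public:
  // Producer side: copy one frame in. Returns false when the ring is full,
  // in which case the caller has to drop the frame.
  bool push(const uint8_t* frame) {
    size_t head = head_.load(std::memory_order_relaxed);
    size_t next = (head + 1) % Slots;
    if (next == tail_.load(std::memory_order_acquire)) return false;  // full
    std::memcpy(slots_[head], frame, FrameBytes);
    head_.store(next, std::memory_order_release);
    return true;
  }

  // Consumer side: copy the oldest frame out. Returns false when empty.
  bool pop(uint8_t* out) {
    size_t tail = tail_.load(std::memory_order_relaxed);
    if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
    std::memcpy(out, slots_[tail], FrameBytes);
    tail_.store((tail + 1) % Slots, std::memory_order_release);
    return true;
  }

private:
  uint8_t slots_[Slots][FrameBytes];
  std::atomic<size_t> head_{0};  // next slot to write
  std::atomic<size_t> tail_{0};  // next slot to read
};
```

The memory cost is simply Slots times FrameBytes: a 10-frame buffer of 2404-byte frames takes 24040 bytes plus one spare slot, which is exactly the RAM question raised above.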