Writing large amounts of data to CSV

Hello everyone,
I'm working on a project where I'm trying to log large amounts of data over long periods of time, and I'm having problems figuring out a way around the 1,048,576 row limit. Is there a way to go back in a csv file and write to a given column? After reaching the million+ row limit, I'd love to reset and start printing from column 'E' for example (and repeat this every time a specified row limit is reached). Is this possible? Or has anyone else found another way to solve the problem?

To give background (for those who want it), I'm building a device to measure the drift of mechanical clocks over long periods of time (30-40 days ideally). The code I have now measures the time difference between two subsequent ticks and logs it to an SD card with three columns (global time, time difference, measured sound level). I get a reading about every 250 milliseconds, so hitting that row limit only takes a few days.

I've attached a stripped-down version of what I'm running in case that helps

Any input you have is most welcome, thanks in advance!

ExampleCsvWrite.ino (6.68 KB)

ahubble:
I've attached a stripped-down version of what I'm running in case that helps

No, it doesn't, but if you post the code in the proper manner using </> tags, it might. Read the please read firsts at the head of the forum.

You might have a dog chasing car situation. You may find that some extra programming that filters out junk data, of which you might have quite a lot, would solve the storage problem.

ahubble:
and I'm having problems figuring out a way around the 1,048,576 row limit.

Is that a limit of whatever program reads it in the end? Or are you telling me the file is quickly growing up to 4 GB?

ahubble:
Is there a way to go back in a csv file and write to a given column?

Since textual data is usually variable in length, it's possible but not so easy (or even fast to execute).

ahubble:
After reaching the million+ row limit, I'd love to reset and start printing from column 'E' for example (and repeat this every time a specified row limit is reached). Is this possible?

Once again... yes but easier said than done.
Since everything is text and varies in length, overwritting a row (or part of it) can lead to "overlaps" or row merging (due to losing the new-line characters in the process).

If everything is encoded as binary (raw data), I would say yes without any doubt; since every "record" would always have a fixed length, making indexing pretty straghtfoward. Also this kind of encoding takes less space, is easier/faster to read/load and more effective in applying compression; being the downsides: human readability and difficulty to edit/import the data without customized (DIY) programs (because the commercial ones usually expect "standardized" textual formats like CSV, XML or JSON).

ahubble:
I get a reading about every 250 milliseconds, so hitting that row limit only takes a few days.

"Row limit" sounds more like "file size limit" or "storage limit" for me...

File size it's just as simple as creating another file, and storage limit... well, then get a larger card (up to 32 GB recommended, anything bigger might work but I cannot guarantee).

Thanks for your input,
First, I thought it would be easier to attach the code as a separate file, since it's not one specific point that I'm struggling with. I'll attach it with the code tags at the end.

Lucario448:
Since everything is text and varies in length, overwritting a row (or part of it) can lead to "overlaps" or row merging (due to losing the new-line characters in the process).

If I understand you correctly, once I stop writing on any particular given line, then the rest of that row is written as 'blank'? So in order to go back and start over in a new column, i'd really be editing the previous entries? I see why that could get me into trouble.

Lucario448:
File size it's just as simple as creating another file, and storage limit... well, then get a larger card (up to 32 GB recommended, anything bigger might work but I cannot guarantee).

I'm using a 32GB SD card to write the data to. A csv with three columns and 1,048,576 rows only takes up about 30MB - so that's not the problem. When you pull the csv off the SD card and open it, only the first 1,048,576 rows are displayed (and from what I can tell this is all I'll be able to get with the code I have now - hence wanting to go back and write to different columns).

I had hoped to keep everything in one file to make my life easier, but maybe it'll be better if I try to create a new file every time one fills up. It'll be more work for me post-recording, but if that works then it works. I just wanted to make sure there were no clever solutions floating around that I hadn't thought of.

Thanks again,

  // include the library code:
      #include <LiquidCrystal.h> // Needed for LCD Screen
      #include <SPI.h> // Needed for SD card writing
      #include <SD.h> // Needed for SD card writing

  // For Display
    //const int rs = 12, en = 11, d4 = 5, d5 = 4, d6 = 3, d7 = 2; // For Stock LCD Display
    const int rs = 8, en = 9, d4 = 4, d5 = 5, d6 = 6, d7 = 7; // For LCD Display Shield
    LiquidCrystal lcd(rs, en, d4, d5, d6, d7); // For All LCD Display

  // Variable Declaration
    const int GSS = A3; // Grove Sound Sensor
    int sensorValue = 0; // Declare Sensor Value for Groove sensor
    int THV = 775; // The threshold value to trigger if statement
    int i = 0; // Global Counter
    unsigned long currentTime = 0; // Timer
    unsigned long previousTime = 0; // Timer
    unsigned long Sdiff = 0; // Define short timer difference variable

  // Declare File for SD card
    File RawData;



//--- VOID SETUP ---
  void setup() {
    // put your setup code here, to run once:
      Serial.begin(9600); // Set bod rate

        // initialize SD card
          SD.begin(4); // Define SD card starting pin
          RawData = SD.open("RawData.csv", FILE_WRITE);
        
          // if the file opened okay, write to it:
          if (RawData) {
             lcd.clear(); // Clear LCD Display
             lcd.setCursor(0,0) ; //sets cursor to first line
             lcd.print("SD Card"); //Print SD Card Message
             lcd.setCursor(0,1) ; //sets cursor to second line
             lcd.print("Detected"); //Print SD Card Message
            
            // Initialize file headings
            RawData.println("Movement Counter, version: 1.05-Beta");
            RawData.close(); // close the file:
            RawData = SD.open("RawData.csv", FILE_WRITE);
            RawData.println("Global Time (ms since start), Time since last reading (ms), Sound threshold (unitless)");
            RawData.close(); // close the file:
            
            } 
            else {
            // if the file didn't open, print an error:
             lcd.clear(); // Clear LCD Display
             lcd.setCursor(0,0) ; //sets cursor to first line
             lcd.print("SD Card Error"); // Print SD Card Message
            }

  } // END VOID SETUP


//--- VOID LOOP ---
  
  void loop() {
    // Module to control microphone sensor reading
    
    sensorValue = analogRead(GSS); //use A3 to read the electrical signal
          
      if(sensorValue > THV) {
            
          i++; // incriment counter - serves no real purpose other than to confirm on the screen that the program is updating
      
          currentTime = millis(); // Log the current time
          Sdiff = currentTime - previousTime; // Calculate time between readings

          // Display
          lcd.clear(); // Clear LCD Display
           
          lcd.setCursor(0,0) ; //sets cursor to first line
          lcd.print("S:"); //  Print Microphone output label
          lcd.setCursor(3,0) ; //sets cursor to first line, 4th column
          lcd.print(sensorValue); //  Print Microphone output
  
          lcd.setCursor(9,0) ; //sets cursor to first line, 10th column
          lcd.print("V:"); //  Print THV label
          lcd.setCursor(12,0); //sets cursor to first line, 4th column
          lcd.print(THV); //  Print difference in time values
  
          lcd.setCursor(0,1) ; //sets cursor to second line
          lcd.print("t:"); //  Print miliseconds label
          lcd.setCursor(3,1); //sets cursor to second line, 4th column
          lcd.print(Sdiff); //  Print difference in time values

          lcd.setCursor(9,1) ; //sets cursor to second line, 10th column
          lcd.print("C:"); //  Print counter label
          lcd.setCursor(12,1); //sets cursor to second line, 13th column
          lcd.print(i); //  Print counter label

          // Write to SD Card
          RawData = SD.open("RawData.csv", FILE_WRITE);
          RawData.print(currentTime); // Write: Global time, time difference, sound value
          RawData.print(",");
          RawData.print(Sdiff); // Write: Global time, time difference, sound value
          RawData.print(",");
          RawData.println(sensorValue); // Write: Global time, time difference, sound value
          RawData.close(); // close the file:

        previousTime = currentTime; // Log the current time as the previous time

        // Time Management 
        delay(80); // Force minimum time between readings to ensure distinct signals (in miliseconds)

        } // END IF STATEMENT 
    
      // Reset Counters to prevent display overrun 
          if(i >= 9999) {i = 0; lcd.setCursor(12,0); lcd.print("    "); } // Reset counter after sufficient iterations

      // Program termination if onboard clock reaches 49 days (4233600000 ms) 
          if(currentTime >= 4233600000) {
                lcd.clear(); // Clear LCD Display
                lcd.setCursor(0,0) ; //sets cursor to first line
                lcd.print("Max Time"); //  Print Microphone output
                lcd.setCursor(0,1) ; //sets cursor to first line
                lcd.print("Reached"); //  Print Microphone output
                exit(0); // Terminate Program
                };

  } // END SCRIPT

ahubble:
If I understand you correctly, once I stop writing on any particular given line, then the rest of that row is written as 'blank'?

Wrong.

OK imagine this: let's say you located the row you want to replace (overwrite); if the new one is shorter than the old one, some stray characters from the old row will remain before the new line mark; if the new one is longer than the old one, you lose this "new-line mark" between rows (merging those indeed) and potentially overwritting part of the next one.

Simply put: editing a text file in an Arduino is like editing a document... without the backspace key (or deleting characters at all) and stuck in overwrite mode (rather than the usual vertical cursor of the insert mode). To accomplish text editor capabilities, the system would either had to have a lot of RAM, or create a temporary file with a copy of the original but with all the changes.

This inflexible behavior is not even fault of the Arduino or the library, it is an inherent nature of digital storage (you can see it as a gigantic array of bytes, and as such you should know the headache involed in deleting/inserting data when is not at the last position).
This means that even a computer does re-write the whole file (or most of it) in order to save any change that adds or removes information; it just happens that a computer (PC) is way faster and has a lot of RAM that this process appears seemless.

ahubble:
So in order to go back and start over in a new column, i'd really be editing the previous entries?

Again, only if the new part has a different length from the old one.

ahubble:
When you pull the csv off the SD card and open it, only the first 1,048,576 rows are displayed

Define "open it", because I don't think it's a library problem since once I managed to play back a wav file of that size without issues.
What I didn't tested, is writting large files. The largest I ever write on an Arduino is around 10 MB, but somebody made a 2-hour audio recording (8-bit mono sampled at 16 KHz) that resulted in a 110 MB file, still no problem.

ahubble:
I had hoped to keep everything in one file to I just wanted to make sure there were no clever solutions floating around that I hadn't thought of.

Yeah, but I'm baffled for the fact you're having trouble beyond 30 MB; when an audio recording that worths (in size) almost four times that, has no problem at all. :confused: ::slight_smile:

I believe the 1,048,576 row limit is in spreadsheet applications like Microsoft Excel or LibreOffice Calc:

Excel specifications and limits

Frequently asked questions - Calc

It's not a limitation of the SD library or with the file itself. Those spreadsheet applications will just not display any data after the first 220 rows. Other applications that can process CSV files may or may not have the same limitation.

christop:
It's not a limitation of the SD library or with the file itself. Those spreadsheet applications will just not display any data after the first 220 rows. Other applications that can process CSV files may or may not have the same limitation.

If that's the case, then it makes A LOT of sense now.

I have never imported data of the size you mention but seem to remember if the number of rows in an Excel worksheet is exceeded by the data from a csv file Excel will prompt for a column address to fill in with the remainder. If you have tried this already and did not receive a prompt perhaps it may be worthwhile doing a little research to see if my memory is right or wrong.