Using raw SD card

A brief intro before the programming question:
The standard SD library uses more than half of the Arduino's RAM. The only "reward" for this is the ability to use a filesystem and create "real files", which I don't find very useful. I would rather keep the RAM free, save raw data without any structure, and when the time comes to retrieve the collected data, read it back and send it to a computer over Serial. It will be the computer's job to transform it into something legible.

As far as I know, there is a problem with SD card writing: you have to write whole 512-byte blocks; you cannot write single bytes as EEPROMs usually allow. So if you want to use the whole capacity of the card, you need a 512-byte buffer. I therefore expected 600 bytes of RAM to be more than enough to drive the SD card, 512 bytes of which would be the buffer where I store collected data before sending it to the card all at once.
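A minimal sketch of that buffering idea, to make the RAM budget concrete. The `BlockLogger` class and the `BlockWriter` callback are hypothetical names used only for illustration; on real hardware the callback would wrap whatever raw block-write call the card driver provides.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

const size_t kBlockSize = 512;  // SD cards are written in 512-byte blocks

// Hypothetical low-level writer: sends one full block to the card and
// returns true on success. Not part of any real library.
typedef bool (*BlockWriter)(uint32_t blockNumber, const uint8_t* data);

class BlockLogger {
 public:
  explicit BlockLogger(BlockWriter writer, uint32_t firstBlock = 0)
      : writer_(writer), nextBlock_(firstBlock), used_(0) {
    assert(writer != NULL);
  }

  // Append a record of any length; each time the buffer fills up it is
  // written out as one block and filling starts again from the beginning.
  bool append(const void* record, size_t len) {
    const uint8_t* src = static_cast<const uint8_t*>(record);
    while (len > 0) {
      size_t room = kBlockSize - used_;
      size_t n = len < room ? len : room;
      memcpy(buf_ + used_, src, n);
      used_ += n;
      src += n;
      len -= n;
      if (used_ == kBlockSize && !flush()) return false;
    }
    return true;
  }

  // Write out a partially filled block, padding the unused tail with 0xFF.
  bool flush() {
    if (used_ == 0) return true;
    memset(buf_ + used_, 0xFF, kBlockSize - used_);
    if (!writer_(nextBlock_, buf_)) return false;
    ++nextBlock_;
    used_ = 0;
    return true;
  }

  uint32_t nextBlock() const { return nextBlock_; }
  size_t pending() const { return used_; }

 private:
  BlockWriter writer_;
  uint32_t nextBlock_;
  size_t used_;
  uint8_t buf_[kBlockSize];  // the only large RAM cost: one block
};
```

The object costs 512 bytes plus a few bytes of bookkeeping, which fits the 600-byte estimate as long as the driver underneath does not allocate a buffer of its own.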

I started to dissect the SD.h library and quickly found that it is built on top of the Sd2Card.h library. Just adding either of those two libraries to a blank sketch consumes nearly 800 B of RAM, even though Sd2Card contains no large buffer. Moreover, it uses 5k of program space without any function ever being called.

Next, Sd2Card #includes two header files, SdInfo.h and Sd2PinMap.h. Adding either of those (or both) to a blank sketch yields the same result: 5k of program space and 800 B of RAM consumed. I tried commenting out most of SdInfo.h, including the defines. Omitting the commented-out code, SdInfo.h looks like this:

/** GO_IDLE_STATE - init card in spi mode if CS low */
uint8_t const CMD0 = 0X00;
/** SEND_IF_COND - verify SD Memory Card interface operating condition.*/
uint8_t const CMD8 = 0X08;
/** SEND_CSD - read the Card Specific Data (CSD register) */
uint8_t const CMD9 = 0X09;
/** SEND_CID - read the card identification information (CID register) */
uint8_t const CMD10 = 0X0A;
/** SEND_STATUS - read the card status register */
uint8_t const CMD13 = 0X0D;
/** READ_BLOCK - read a single data block from the card */
uint8_t const CMD17 = 0X11;
/** WRITE_BLOCK - write a single data block to the card */
uint8_t const CMD24 = 0X18;
/** WRITE_MULTIPLE_BLOCK - write blocks of data until a STOP_TRANSMISSION */
uint8_t const CMD25 = 0X19;
/** ERASE_WR_BLK_START - sets the address of the first block to be erased */
uint8_t const CMD32 = 0X20;
/** ERASE_WR_BLK_END - sets the address of the last block of the continuous
    range to be erased*/
uint8_t const CMD33 = 0X21;
/** ERASE - erase all previously selected blocks */
uint8_t const CMD38 = 0X26;
/** APP_CMD - escape for application specific command */
uint8_t const CMD55 = 0X37;
/** READ_OCR - read the OCR register of a card */
uint8_t const CMD58 = 0X3A;
/** SET_WR_BLK_ERASE_COUNT - Set the number of write blocks to be
     pre-erased before writing */
uint8_t const ACMD23 = 0X17;
/** SD_SEND_OP_COMD - Sends host capacity support information and
    activates the card's initialization process */
uint8_t const ACMD41 = 0X29;
//------------------------------------------------------------------------------
/** status for card in the ready state */
uint8_t const R1_READY_STATE = 0X00;
/** status for card in the idle state */
uint8_t const R1_IDLE_STATE = 0X01;
/** status bit for illegal command */
uint8_t const R1_ILLEGAL_COMMAND = 0X04;
/** start data token for read or write single block*/
uint8_t const DATA_START_BLOCK = 0XFE;
/** stop token for write multiple blocks*/
uint8_t const STOP_TRAN_TOKEN = 0XFD;
/** start data token for write multiple blocks*/
uint8_t const WRITE_MULTIPLE_TOKEN = 0XFC;
/** mask for data response tokens after a write block operation */
uint8_t const DATA_RES_MASK = 0X1F;
/** write data accepted token */
uint8_t const DATA_RES_ACCEPTED = 0X05;

This Arduino code

#include <SdInfo.h>

void setup() {
  // put your setup code here, to run once:

}

void loop() {
  // put your main code here, to run repeatedly:
}

compiles successfully but still needs 5k of program space and 800 B of RAM. Why does this happen? Does Arduino include something because it thinks I may need it? Or is it some sort of bug? I tried restarting it repeatedly but still got the same result.

P.S. I just tried deleting all the "advanced libraries", leaving only Sd2Card.h & .cpp, Sd2PinMap.h and SdInfo.h. Now it compiles using 498 bytes of program space and 11 bytes of RAM (48 bytes of program space and 2 bytes of RAM more than a sketch without the include). Why?? If it is some kind of "clever feature", how can I disable linking unwanted libraries?

Thanks

I've also hit this problem and have determined it's a 'deficiency' in the compiler (or linker, maybe) in its ability to remove dead code - that is, code that is included, compiled but then never referenced.

There are some optimisations available where you can specify whether to compile for speed or size - I tried these ages ago but the results were inconclusive (even the -Os [optimize for size] sometimes gave me a larger compiled size).

Perhaps with the release of 1.6.8 it might be time to revisit some of these options to discover whether it's become any better at dead code removal.

Aside: if you include half a dozen libraries in an otherwise dead program (ie no setup code, no loop code) it should be just a few bytes long - but it isn't. That's the result of sub-optimal dead code removal.

Sorry I can't provide a solution but others might be able to - and I'll retest those optimisations over the next few days too.

Try a new version of SdFat.

This program will write block zero of an SD card. Use SdFat-beta with no mods for best results.

Sketch uses 1,922 bytes (5%) of program storage space. Maximum is 32,256 bytes.
Global variables use 24 bytes (1%) of dynamic memory, leaving 2,024 bytes for local variables. Maximum is 2,048 bytes.

#include <SdFat.h>
Sd2Card sd;
const uint8_t CS_PIN = 10;
void setup() {
  uint8_t buf[512];  // will be on stack

  // init card
  sd.begin(CS_PIN);

  // write block zero with junk on stack.
  sd.writeBlock(0, buf);
}

void loop() {
}

Edit: I added a call to sd.readBlock(address, buf) and it added about 300 bytes.

Sketch uses 2,208 bytes (6%) of program storage space. Maximum is 32,256 bytes.
Global variables use 24 bytes (1%) of dynamic memory, leaving 2,024 bytes for local variables. Maximum is 2,048 bytes.

I wonder why this post was moved from the programming section. I encountered this while trying to use the SD card, but I believe it is a general programming problem. It is disturbing that including a simple library (such as the reduced one in my original post) adds so much unwanted, unused code that is not mentioned in the library. I would like to know why the Arduino is doing this. Is it somehow also including other files from the included directory? After all, what is included in the plain

void setup (){}
void loop (){}

code?

Since this was moved here, I won't open a new topic but will ask now:
All the libraries I have seen don't use CRC and send a dummy CRC instead by default. Of the three SD cards I have tried, two check CRC (and return an error) while one does not. Since I believe some check of data integrity is valuable, I would like to use CRC every time. Is it possible to force the SD card to check CRC?
The next question is about computing the CRC. I think both SdFat and Sd2Card compute the CRC in a naive way: they first compute the CRC and then send the data followed by the CRC. That looks rather clumsy to me. If I use hardware SPI at maximum speed, I have 16 free CPU clock cycles while each byte is being sent by the hardware. Fetching the next byte of data from SRAM and managing the for loop should take just a few cycles, leaving about 10 CPU clock cycles to update the CRC with the byte currently being sent. Computing the CRC this way, and so ensuring data integrity, would be "for free" because the libraries are just waiting for SPI to finish sending the byte. Or is there a wrong assumption somewhere?
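To make the idea concrete, here is a minimal sketch of what such an interleaved loop could look like. The `SpiPort` hooks are hypothetical stand-ins (on an AVR they would roughly correspond to writing SPDR and polling SPIF), and nothing here is taken from SdFat or Sd2Card. The CRC is the 16-bit CCITT variant (polynomial 0x1021, initial value 0) that SD cards use for data blocks; a shift-and-XOR update is shown next to a plain bit-by-bit reference. No cycle counts are claimed: whether the update really fits into the SPI wait would have to be measured on the target.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Reference: bit-at-a-time CRC-16, polynomial 0x1021, initial value 0.
uint16_t crc16UpdateBitwise(uint16_t crc, uint8_t data) {
  crc ^= (uint16_t)((uint16_t)data << 8);
  for (uint8_t i = 0; i < 8; ++i) {
    crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                         : (uint16_t)(crc << 1);
  }
  return crc;
}

// Same CRC without a loop or lookup table: a handful of shifts and XORs.
uint16_t crc16Update(uint16_t crc, uint8_t data) {
  crc = (uint16_t)((crc >> 8) | (crc << 8));
  crc ^= data;
  crc ^= (uint8_t)(crc & 0xFF) >> 4;
  crc ^= (uint16_t)(crc << 12);
  crc ^= (uint16_t)((crc & 0xFF) << 5);
  return crc;
}

// Hypothetical SPI hooks, for illustration only.
struct SpiPort {
  void (*start)(uint8_t b);  // begin shifting one byte out
  void (*wait)();            // block until that byte has been sent
};

// Send a buffer and update the CRC while each byte is on the wire,
// then send the CRC itself, high byte first. Returns the CRC.
uint16_t sendBlockWithCrc(const SpiPort& spi, const uint8_t* buf, size_t n) {
  assert(spi.start != NULL && spi.wait != NULL);
  uint16_t crc = 0;
  for (size_t i = 0; i < n; ++i) {
    spi.start(buf[i]);
    crc = crc16Update(crc, buf[i]);  // runs while the hardware shifts
    spi.wait();
  }
  spi.start((uint8_t)(crc >> 8));
  spi.wait();
  spi.start((uint8_t)(crc & 0xFF));
  spi.wait();
  return crc;
}
```

Timing the two-pass version against this one on a real ATmega would settle whether the interleaving actually buys anything.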

Computing the CRC this way, and so ensuring data integrity, would be "for free" because the libraries are just waiting for SPI to finish sending the byte. Or is there a wrong assumption somewhere?

You won't gain much by mixing SPI and CRC calculation. The lookup tables are in flash and access is slow.

The base SdFat library runs on many boards and Operating systems. You can't mix CRC with DMA. I also allow Software SPI and mixing CRC would be a mess. I like to support the standard SPI library for a board where you can't mix CRC.

Block-at-a-time raw I/O will never be fast. SD cards have very large internal blocks and emulate 512-byte blocks for backward compatibility. This means there will be large delays while data is moved and large internal blocks are written.

If you want performance, give up AVR and use a modern board with SDIO and hardware CRC.

STM32 boards with an SDIO SD socket are about $13 on ebay.

If you want super performance, use a NUCLEO-F746ZG with 320KB SRAM and 1 MB flash. It costs $23 and has built-in Ethernet and SDIO pins. I use it with the ChibiOS/RT RTOS. It can do 4-bit wide SDIO to an SD card at 50 MHz using DMA. This is a bus speed of 25 MB/sec and CRC is done on the fly. Uno/Mega SPI is 1 MB/sec but the best transfer rate is less due to no DMA.

The NUCLEO-F746ZG is a Cortex-M7 board with 64-bit wide internal buses, has pipeline architecture, and L1 caches. It can do a context switch in 0.2 µs.

Finally, giving up a file system has little speed advantage. Just use SdFat to allocate a large contiguous file and use raw I/O. Then the file is easy to read on any device.

@fat16lib
I found in the Simplified SD Specifications that SDSC cards can be read and written in blocks of 1 to 512 bytes. Do you know how that works? I guess writing a 1-byte block still reads a larger block, updates the single byte, erases the whole block and writes it back. Writing single bytes this way would ruin the card very quickly...

I don't understand this:

fat16lib:
SD cards have very large internal blocks and emulate 512-byte blocks for backward compatibility. This means there will be large delays while data is moved and large internal blocks are written.

Does that mean that physical blocks are usually larger than 512 bytes and each block write rewrites a larger block of the SD card (exhausting its limited writes even faster)? Or can the card just write a much larger section at once, but doesn't have to?

fat16lib:
If you want performance, give up AVR and use a modern board with SDIO and hardware CRC.

Well, you skip the CRC check for performance - I thought you saw it as too slow to be worth it. Since I think it is possible to implement CRC with minimal impact on performance on an ATmega, I wondered why no one else had tried it. But I admit it would be more context-specific, so much more work would be needed to use it in a general library.

read and written in blocks 1 to 512 bytes large

It is only possible on some small older cards and is very slow. All version 2.0 cards require 512 byte read/write.

Does that mean that physical blocks are usually larger than 512 bytes and each block write rewrites a larger block of the SD card (exhausting its limited writes even faster)? Or can the card just write a much larger section at once, but doesn't have to?

There are single and multiple block read and write commands. If you use the single block write command the card will likely do extra rewrites and data moves. This can result in very long write latency.

Don't worry too much about the "limited writes" for modern cards. Cards do wear leveling so a 16GB card is good for about 100 TB of writes. Even with rewrites it is unlikely you will destroy it with a slow AVR Arduino.
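A rough back-of-the-envelope check of that claim, as a sketch. Only the 100 TB endurance figure comes from the post above; the logging rate (one 512-byte block per minute) and the worst-case amplification factor (every 512-byte write forcing a rewrite of a 4 MiB internal block) are assumptions picked purely for illustration, since real cards do not publish these numbers.

```cpp
#include <assert.h>

const double kMinutesPerYear = 365.0 * 24.0 * 60.0;

// Years until the card's total write endurance is used up.
// amplification = flash bytes actually rewritten per byte the host writes
// (an assumed figure, not something the card reports).
double yearsUntilWorn(double enduranceBytes, double bytesPerWrite,
                      double writesPerMinute, double amplification) {
  double flashBytesPerYear =
      bytesPerWrite * amplification * writesPerMinute * kMinutesPerYear;
  assert(flashBytesPerYear > 0.0);
  return enduranceBytes / flashBytesPerYear;
}

// Best case: no extra rewrites at all.
double bestCaseYears() { return yearsUntilWorn(100e12, 512.0, 1.0, 1.0); }

// Pessimistic case: each 512-byte write costs a 4 MiB internal rewrite,
// i.e. an assumed amplification of 4 MiB / 512 B = 8192.
double worstCaseYears() { return yearsUntilWorn(100e12, 512.0, 1.0, 8192.0); }
```

Under these assumptions the best case is hundreds of thousands of years and even the pessimistic case is on the order of decades, which is consistent with the advice not to worry.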

Well, you skip the CRC check for performance - I thought you saw it as too slow to be worth it. Since I think it is possible to implement CRC with minimal impact on performance on an ATmega, I wondered why no one else had tried it. But I admit it would be more context-specific, so much more work would be needed to use it in a general library.

I have doubts about the gain of mixing CRC and SPI. Often two tight loops are as fast as mixing code in a single loop on AVR. I suggest you write some code and test it on ATMega instead of guessing.

Note that you will need CRC7 and 16-bit CRC-CCITT versions of SPI send and receive. And much of the SPI protocol is not even protected.
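For the command side, a minimal sketch of CRC7 and of building a 6-byte SPI command frame. The helper names `crc7` and `buildCommand` are made up for this example. CMD59 (CRC_ON_OFF, argument bit 0 = 1 to enable checking) comes from the SD specification, not from this thread; it is the command that asks a card in SPI mode to start verifying CRCs.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/** CRC_ON_OFF - turn CRC checking on (arg bit 0 = 1) or off in SPI mode */
uint8_t const CMD59 = 0X3B;

// CRC7 over a byte sequence, polynomial x^7 + x^3 + 1, initial value 0.
uint8_t crc7(const uint8_t* data, size_t n) {
  uint8_t crc = 0;
  for (size_t i = 0; i < n; ++i) {
    uint8_t d = data[i];
    for (uint8_t bit = 0; bit < 8; ++bit) {
      crc <<= 1;
      if ((d ^ crc) & 0x80) crc ^= 0x09;
      d <<= 1;
    }
  }
  return crc & 0x7F;
}

// Fill a 6-byte command frame: start bits + command index, 32-bit
// argument (big-endian), then CRC7 shifted left with the end bit set.
void buildCommand(uint8_t cmd, uint32_t arg, uint8_t frame[6]) {
  assert(cmd < 64);
  frame[0] = (uint8_t)(0x40 | cmd);
  frame[1] = (uint8_t)(arg >> 24);
  frame[2] = (uint8_t)(arg >> 16);
  frame[3] = (uint8_t)(arg >> 8);
  frame[4] = (uint8_t)arg;
  frame[5] = (uint8_t)((crc7(frame, 5) << 1) | 0x01);
}
```

With this, `buildCommand(0, 0, frame)` ends in the familiar 0x95 and `buildCommand(8, 0x1AA, frame)` ends in 0x87, the two hard-coded CRC bytes that libraries usually send during initialization. Sending `buildCommand(CMD59, 1, frame)` after initialization is how CRC checking would be requested; the data blocks then need the 16-bit CRC as well.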

Most people use print to write to an SD and performance is not an issue. Formatting text reduces write speed to around 20-30 KB/sec.

I try to get people to enable CRC in critical apps but almost no one does. The reason is that either SPI is reliable or data errors get caught as SD command errors or invalid file system structures.

Here is the cost of CRC with a high quality SanDisk Extreme 32GB microSD.

This result is from the SdFat bench example.

No CRC:

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
312.46,31844,1416,1632

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
503.75,2144,988,1010

Smaller slow CRC-CCITT function:

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
125.50,78340,3684,4072

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
153.12,6632,3312,3337

Faster CRC with table lookup:

write speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
203.12,77236,2176,2514

read speed and latency
speed,max,min,avg
KB/Sec,usec,usec,usec
280.46,3756,1796,1819

The flash cost for the small CRC is about 300 bytes and about 800 bytes for the table lookup version.

Bottom line - SDIO is the reliable fast hardware answer. Use a modern CPU, not a 20 year old AVR.

It looks like something strange is happening here. From various sources I got the impression that an SD card is just a HUGE EEPROM with a chip for communicating with the host and writing to the EEPROM - as far as I know, flash is just a subgroup of EEPROM with no clear boundary. I think I understand how small EEPROMs (like the AVR's on-chip one or small I2C ones) work: it is just an array of linearly organized cells. If you want to write a byte, you send its address and new content and it's done. Most external EEPROMs have a page organization - you may write several bytes at once, saving time. Moreover, some only have pages larger than 1 B, so writing 1 byte requires erasing and rewriting a larger area. I simply expected an SD card to work the same way, except that the minimal writable unit is large (from the MCU's point of view). Isn't that so?

You mention built-in wear leveling. Does that mean that endlessly writing to the first page will ruin the whole card? Any card, or just some high-tech ones?

I don't understand your benchmarks. The time to write to an SD card is made up of two parts: the time to send the data to the card and the time to write it once it has all been accepted. The AVR's maximum SPI speed is half the clock speed, which is much less than the maximum of any SD card, so this time should be constant. I thought the time to write a page was nearly constant too. How is it possible that the maximum write time is more than 20 times longer than the minimum?

I don't want to bother you with noob questions. Do you have a source where I could get answers to such questions? I don't think I could get them from the official specifications - or should I read them more carefully?

I have plenty of time for reading but not so much for experimenting, so I am reading and thinking. Hopefully I will get some time this weekend, but I still want to proceed with caution. It would be sad to test how long it takes to wear out one byte of an SD card only to discover that the whole card has been destroyed because of wear leveling.

I got the impression that an SD card is just a HUGE EEPROM with a chip for communicating with the host and writing to the EEPROM

No, an SD card is a very complex device, often with a 32-bit RISC processor. There are some number of RAM buffers.

There is a virtual to physical map of flash memory. Flash blocks are protected by powerful ECC codes and when a block starts to go bad a new block is mapped.

There are access counts so if a block contains static data and is not written, the data will be moved and the block will be erased and written with new data. All of this means a write operation is allowed to take up to 250 ms on SD/SDHC cards and as long as 500 ms on new SDXC cards.

Specified write performance of SD cards assumes huge contiguous writes of many MB. Cards are designed for devices like phones which can allocate many MB of buffer.

I can't help you read about all of this since each company uses proprietary controllers, flash chips, and algorithms.

I have access to some proprietary info but can't give access to others so Google and read.

fat16lib:
No, an SD card is a very complex device, often with a 32-bit RISC processor. There are some number of RAM buffers.
...

And all of that for just 3 or 4 dollars? OMG. OK, thank you very much for your time. No chance of me understanding how SD works, then :(

Just a few (hopefully) last questions:
I want to count how active my ants are and save 10-100 bytes of data every minute. If I find an SD card which allows partial writes, is it wise to save such small amounts of data (conserving RAM and minimizing the impact of power loss)? Or will each write rewrite a large part of the card, wearing it out much faster (the card will be old and possibly heavily used)?
If the card is so clever, should I trust it? I planned to include some error-correcting mechanisms, but I guess they are already implemented in a much cleverer way. Can I force the card to check data integrity? Is there any point in reading back freshly written data to check it was programmed correctly, or does the card do that on its own?
Does the card benefit from formatting in any way, for example for wear leveling? If not, I still plan to use the card as raw data storage because that looks easier to me.

If I find an SD card which allows partial writes, is it wise to save such small amounts of data (conserving RAM and minimizing the impact of power loss)? Or will each write rewrite a large part of the card, wearing it out much faster (the card will be old and possibly heavily used)?

You would be a fool to try this. I have several V 1.0 cards that were manufactured in 2004 and 2005. They are 16 MB cards. The default file system is FAT12. I don't know if these cards allow partial block reads/writes.

All well known brands from 2006 on are V 2.0 and don't allow partial block read/write.

Modern cards handle wear automatically and sleep at low power when idle.

You're wasting your time looking for an ancient little SD card.

If you want to save flash and RAM look at Petit FatFS.

I posted a version for Arduino on GitHub.

You can write and read without lots of buffer. Write works best if the SD is the only SPI device.

My example uses this much memory.

Sketch uses 4,872 bytes (15%) of program storage space. Maximum is 32,256 bytes.
Global variables use 134 bytes (6%) of dynamic memory, leaving 1,914 bytes for local variables. Maximum is 2,048 bytes.