Unformatted write to SD

I was planning on writing a trimmed down SD lib that just writes to the device without using a file system.

I would probably limit it to one SD class, and reads would not be needed.

The main aim is to get the RAM usage in the lib down to a minimum to save it for data.

The sketch would be a fast dedicated ADC sample and logger.

There would be less software overhead on the write as well, although the difference would probably not matter given the SD card's write speed limitations.

Less code as well but this is not critical.

Has anyone seen this done already?

PS: since someone's going to ask anyway, to recover the data on Linux:

dd if=/dev/sdd1 of=/tmp/adc.data bs=512

You must write an entire block at a time. You must send all 512 bytes of data and two dummy or CRC bytes in one SPI transfer.
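
In SPI terms, a single-block write is one uninterrupted sequence like the following. This is a rough sketch only: sdCommand(), spiSend() and spiReceive() are assumed one-byte helpers, not calls from any particular library, and CRC checking is assumed off, as is usual in SPI mode.

// CMD24 (WRITE_BLOCK) followed by the complete data packet, in one SPI transfer.
uint8_t sdWriteSingleBlock(uint32_t block, const uint8_t* data) {
  if (sdCommand(24, block)) return 1;   // CMD24: single block write
  spiSend(0xFE);                        // data start token
  for (uint16_t i = 0; i < 512; i++) spiSend(data[i]);
  spiSend(0xFF);                        // two dummy CRC bytes -
  spiSend(0xFF);                        // CRC is not checked in SPI mode
  if ((spiReceive() & 0x1F) != 0x05) return 1;  // data response: 0b00101 = accepted
  while (spiReceive() == 0x00) {}       // card holds DO low while busy programming
  return 0;
}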

You can't implement a fast logger this way since the SD may present a busy delay of up to 100 ms when you send a single block write command.

Only streaming block write commands are fast. See AnalogIsrLogger20120810.zip in the Google Code archive. It can do 100,000 samples per second.

You can't log faster with an external ADC since you can't share the SPI bus between the ADC and the SD. The SD requires the block write to be done as a single SPI transfer.

I have a very fast software SPI routine, but it only runs at 2 MHz, so about 40,000 samples per second is the limit for the sketch I posted at the above site for fast logging with an external ADC.

Writing to the SD is not as much the problem as acquiring the data.

You must use multi-block streaming write commands. I have posted many sketches and there is a RawWrite example in my latest version of SdFat.
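
The streaming sequence looks roughly like this (same assumed helpers as the sketch above, with sdAppCommand() standing in for the CMD55 + ACMD pair; see the RawWrite example for real code):

// ACMD23 pre-erase, then CMD25 with one data packet per block. The card is
// told up front how many blocks are coming, so the controller can plan ahead.
uint8_t sdStreamWrite(uint32_t firstBlock, uint32_t count, const uint8_t* buf) {
  sdAppCommand(23, count);                  // ACMD23: pre-erase count blocks
  if (sdCommand(25, firstBlock)) return 1;  // CMD25: WRITE_MULTIPLE_BLOCK
  for (uint32_t b = 0; b < count; b++) {
    spiSend(0xFC);                          // multi-block data start token
    for (uint16_t i = 0; i < 512; i++) spiSend(buf[i]);
    spiSend(0xFF); spiSend(0xFF);           // dummy CRC
    if ((spiReceive() & 0x1F) != 0x05) return 1;  // data accepted?
    while (spiReceive() == 0x00) {}         // per-block busy - short on a good card
  }
  spiSend(0xFD);                            // stop transmission token
  while (spiReceive() == 0x00) {}           // final busy while programming completes
  return 0;
}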

Data is worthless for most measurements unless it is taken at precise time intervals.

The quality of a logger is often determined by logging a pure sine wave and then doing a FFT. You won't get good results if there is time jitter or too much noise.

Effective number of bits (ENOB) is derived from an FFT analysis of the ADC output when the ADC is stimulated with a full-scale sine-wave input signal. The root-sum-of-squares (RSS) value of all noise and distortion terms is computed, and the ratio of the signal to the noise-and-distortion is defined as SINAD, or S/(N+D).
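
The conversion from SINAD to ENOB given in that tutorial (linked below) is:

$$\mathrm{ENOB} = \frac{\mathrm{SINAD} - 1.76\,\mathrm{dB}}{6.02}$$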

http://www.analog.com/static/imported-files/tutorials/MT-003.pdf

I suggest more thought on acquisition.

Thanks for your comments. I did not realise all you had provided initially. I'm working through it now, but I don't have any SD hardware to test on yet.

fat16lib:

You can't implement a fast logger this way since the SD may present a busy delay of up to 100 ms when you send a single block write command.

Is that from the spec or is it an experimental finding?

I don't see why the SD would cause such a delay if all the required blocks have been pre-erased.

Is there some other internal firmware functionality that could be coming into play?

It's from the spec and experiment. I was shocked years ago when I wrote my first SD library.

Actually if the maximum write latency is only 100 ms you are lucky. Here is the ugly spec:

4.6.2.2 Write
For a Standard Capacity SD Memory Card, the times after which a timeout condition for write operations occurs are (card independent) either 100 times longer than the typical program times for these operations given below or 250 ms (the lower of the two). The R2W_FACTOR field in the CSD is used to calculate the typical block program time obtained by multiplying the read access time by this factor. It applies to all write commands (e.g. SET(CLR)_WRITE_PROTECT, PROGRAM_CSD and the block write commands).

High Capacity SD Memory Card and Extended Capacity SD Memory Card indicate R2W_FACTOR as a fixed value. In case of High Capacity SD Memory Card, maximum length of busy is defined as 250ms for all write operations.
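
Read literally, the timeout rule works out as below. This is just a sketch of my reading of that paragraph; readAccessUs would come from the TAAC/NSAC fields of the CSD.

// Write timeout per the quoted spec text: 100 x the typical block program
// time, capped at 250 ms, where the typical time is the read access time
// scaled by the R2W_FACTOR field from the CSD.
uint32_t writeTimeoutMs(uint32_t readAccessUs, uint8_t r2wFactor) {
  uint32_t typicalUs = readAccessUs * (uint32_t)r2wFactor;
  uint32_t timeoutMs = (100UL * typicalUs) / 1000UL;
  return timeoutMs < 250 ? timeoutMs : 250;   // "the lower of the two"
}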

So even if the card has been erased, the spec allows a card to have occasional long busy periods while programming flash.

I have very little information about what the card's controller is doing during this time. Most of the details about the card controller are a manufacturer's trade secret. I suspect it is due to a wear-leveling operation: the card won't program the block even though it has been erased. Wear-leveling happens over very large areas, like 128 KB, which requires a huge copy. In streaming mode you tell the card what blocks will be written so the controller can plan ahead.

I have experimented with about 40 cards and find that many standard (2GB or less) cards maintain low write latency if you use multi-block streaming mode and space writes at even intervals.

Many better SanDisk cards perform well in this mode. Here is a benchmark at 500 blocks per second for a 2GB Extreme III card:

Start raw write of 5120000 bytes at
256000 bytes per second
Please wait 20 seconds
Done
Elapsed time: 20000 millis
Max write time: 828 micros
Overruns: 0

The max time for a 512 byte write was 828 usec.

Single block write mode and SPI are available for compatibility with the spec but may have very poor performance.

SD cards are designed to be cheap and assume that devices like video cameras have a large amount of buffering, so occasional long write latency is OK. Class 10 card performance is based on the average rate for writing many MB of data.

The new SDXC cards can be busy for up to 500 ms.

Probably the hardest-hit block will be block 0, with the FAT and partition table.

The overwhelming assumption will be that these devices are used with the installed FAT fs. The firmware will almost certainly need to map this block to different locations from time to time based on unknown and different design criteria.

Each time a FAT-based file is written, the FAT gets updated. In the hope that the firmware is actually counting FAT updates (probably buffering them as well), I would guess that by avoiding block 0 and (by implication) not using a FAT-based fs, it may be possible to avoid these busy delays.

I'm still waiting for my Teensy and its SD so I'll just ask what you think at this stage.

Have you tried writing to SD in raw mode, avoiding any fs and skipping block 0?

Thanks for the tip about staying within the 2GB limit.

Writing with no FS will not help. Avoiding some block will not help.

You can't solve the problem by speculation. Remember, "One fact is worth more than a 1000 speculations".

The SD standard has lots of alignment requirements for how to format the SD so that performance will be optimal. If you use the SD Association's formatter or my formatter, the SdFat SdFormatter.ino example, file structures will be properly aligned with erase groups.

You can't guess what the best policy will be. I have spent days trying and every card is different and the behavior varies with card use.

Only two things seem to matter, and the big one is the SD card controller. Even with a good controller you must use multi-block writes, and selecting write with pre-erase seems to help.

I added the ability to quickly create a large contiguous file to SdFat. Doing raw writes to these files is just as good as having no FS, and access is easier on other computers. You have more flexibility than using something like dd; multiple regions on one SD become a pain with dd.
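
In outline the pattern looks like this (a condensed sketch; check the RawWrite example for the exact calls, which can differ between SdFat versions):

#include <SdFat.h>
SdFat sd;
SdBaseFile file;
uint8_t buf[512];

void setup() {
  uint32_t bgnBlock, endBlock;
  if (!sd.begin()) return;
  // Pre-allocate a contiguous 5 MB file, then stream raw blocks into it.
  if (!file.createContiguous(sd.vwd(), "ADC.BIN", 5120000UL)) return;
  file.contiguousRange(&bgnBlock, &endBlock);
  sd.card()->writeStart(bgnBlock, 10000);  // 10000 blocks = 5 MB, streaming mode
  for (uint16_t i = 0; i < 10000; i++) {
    // fill buf with data here, then:
    sd.card()->writeData(buf);
  }
  sd.card()->writeStop();
  file.close();
}
void loop() {}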

Unfortunately cards with really good performance are no longer being manufactured. Cards that look the same have different controllers.

My best card was manufactured in 2007 and is a 2GB SanDisk Extreme III. This is version 8.0 of this model card.

Standard SD cards (2GB or less) are being phased out.

Some SDHC cards perform fairly well with Arduino but again cards of the same model vary depending on the card version.

I have had good luck with some 4GB SanDisk Extreme cards.

Here are two examples, a 2GB card and a 4GB card. Notice that block groups on the 2GB card are much smaller than on the 4GB card: 32 blocks vs 128 blocks. Also, alignment of the FS partition on the 4GB card leaves a big unused space before the partition; the 2GB card has a smaller space before the FAT partition.

2GB Extreme III card:

Manufacturer ID: 0X3
OEM ID: SD
Product: SD02G
Version: 8.0
Serial number: 395023392
Manufacturing date: 11/2007

cardSize: 3970048 (512 byte blocks)
flashEraseSize: 32 blocks
eraseSingleBlock: true

SD Partition Table
part,boot,type,start,length
1,0X0,0X6,249,3969799

4GB Extreme HD Video card:

Manufacturer ID: 0X3
OEM ID: SD
Product: SD04G
Version: 8.0
Serial number: 3027274498
Manufacturing date: 4/2011

cardSize: 7744512 (512 byte blocks)
flashEraseSize: 128 blocks
eraseSingleBlock: true

SD Partition Table
part,boot,type,start,length
1,0X0,0XB,8192,7736320

You can't solve the problem by speculation. Remember, "One fact is worth more than a 1000 speculations".

I totally agree, but when dealing with black-box situations like undocumented and commercially secret firmware, sometimes that's the only option left. Though I would categorise it as hypothesising and hypothesis testing rather than speculation.

I thank you for your comments. You have clearly done a lot of research on this problem. I'm just trying to find an angle that you have not thought of yet that may help.

It's not easy to second-guess what the various firmware designers have chosen to do and, as I commented above, this will vary. Your results confirm that.

It seems that the potential 100ms busy time is a bit of a killer for what we are both trying to do. I'm trying to guess what could be happening during that time and whether it is possible to avoid triggering it. The SD standard presumably allows this delay to allow for internal housekeeping/load spreading/encryption etc.

This will almost certainly be hardware specific but understanding the problem will surely help avoid it.

You can't guess what the best policy will be. I have spent days trying and every card is different and the behavior varies with card use.

Which corroborates my suggestion that there is some load spreading algo intervening here.

Am I right in thinking that the "groups" probably relate to individual physical flash chips within the device?

Notice that block groups on the 2GB card are much smaller than on the 4GB card: 32 blocks vs 128 blocks.

Does a contiguous write that starts and ends in the same (pre-erased) group perform any better than one that spans two groups? Maybe there is internal buffering and the physical write op only happens at group level, not at the now (physically) theoretical 512-byte SD block size. There may be a firmware equivalent of sync() that flushes after a certain period or a period of inactivity on the SPI.

Avoiding some block will not help.

I only suggested avoiding block 0 since it is very likely to get special treatment on a device designed to run with a FAT-based fs. From what you describe, I would now suggest avoiding the entire group containing that block. This may coincide with what you say about correct file alignment on format. (I presume you are saying that the first file should start at the next group boundary, rather than in the block following the FAT.)

Thanks for sharing the fruits of your investigations. This is crucial stuff for getting good logging performance.

It seems that the potential 100ms busy time is a bit of a killer for what we are both trying to do.

No, I am perfectly happy with the result I posted above. Here it is again:

Start raw write of 5120000 bytes at
256000 bytes per second
Please wait 20 seconds
Done
Elapsed time: 20000 millis
Max write time: 828 micros
Overruns: 0

This means I can write at up to 256 KB/sec and the time to write a block is no greater than 828 usec. There is no busy delay, so about 42% of the CPU time is required (828/2000 usec per block). This program simulates a data logger by writing a block every 2,000 usec in the multi-block mode I described before.

It is very difficult to write 512 bytes from RAM to the SPI bus much faster.

For typical logging applications, most of the CPU time is used acquiring data. Writing at 100 KB/sec requires less than 20% of the CPU so 80% is available to acquire data.

The above result is for raw writes to a large contiguous file. The fact that the SD is formatted with a file system has no effect.

If you use the same SD for logging with single block writes to a file you get this result for 100 byte writes:

Type is FAT16
File size 5MB
Buffer size 100 bytes
Starting write test. Please wait up to a minute
Write 199.21 KB/sec
Maximum latency: 86384 usec, Minimum Latency: 84 usec, Avg Latency: 496 usec

There was at least one busy delay of 86.4 ms with this SD. The minimum latency, 84 usec, occurs when the write is just a copy to the SdFat block buffer and no write to the SD occurs.

This is one of the best SD cards around for Arduino use and it has almost a 100 ms delay.

The rate was under 200 KB/sec and required 100% CPU.

So what do you expect to achieve?

Why do you want to use single block writes? The above test proves that streaming multi-block writes work with good cards.

Wear leveling algorithms are not always a totally black box. Look at this: http://www.stec-inc.com/downloads/AN-0702_STEC_SMALL_CARDS_WEAR_LEVELING_LIFETIME_CALCULATOR.pdf

Tell me how it helps if you can't access the use counts and mappings.

Since every manufacturer has different internal structures and algorithms it is even harder.

I don't think you're quite following me. Part of my aim was to remove the RAM needed by the full SdFat lib that I really don't need. I'm not saying that there is no call for what you have done. I think it's excellent work and what most people on Arduino probably want.

I'm looking at trimming the fat to keep the most RAM for data. I have 8 channels of 16-bit data. (This will probably run on a Teensy to get the 8 channels.) This is dedicated hardware; I have no reason not to dedicate an SD card to it, and thus dumping a continuous unformatted stream is no problem. I can handle the rest on Linux later.

I'm also likely to be creating >5MB of data over time; this is a data logger. Hence my interest in how groups relate to what STEC are calling management blocks. If I'm interpreting this correctly, attempting to define a group >128MB will probably get refused. Also to get the largest contiguous block you will need to know what else is on the fs (ie have it freshly formatted) at which point some of the interest in using an fs is lost.

I think to get the best from any given card will require some specific information about it and adapting the writing cycle to fit. It looks like your libs and examples will provide a lot of useful info off the card.

I agree that your multi-block write within an fs-structured file is probably no different from a raw write of the same size. Any fs-related stuff will happen before or after.

You say writing at 100 kB/s uses <20% CPU, but what is the smallest continuous stream you can output? One block of 512 bytes in just under 1 ms. How does that relate to your jitter?

Doesn't that limit you to 1kS/s unless you are willing to accept some substantial jitter?

Part of my aim was to remove the RAM needed by the full SdFat lib

The 512 byte block cache RAM in SdFat can be used for logging with raw writes. I use it in my fast loggers. There is a call that flushes the block cache and returns the address of the cache. Very little other RAM is globally allocated.
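
Something like this, if memory serves (treat the call name as an assumption and check the logger sources):

// Flush the block cache and reuse its 512 bytes as the logging buffer.
uint8_t* buf = (uint8_t*)sd.vol()->cacheClear();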

Also to get the largest contiguous block you will need to know what else is on the fs (ie have it freshly formatted) at which point some of the interest in using an fs is lost.

It's easier to use contiguous files than a raw device. That's why the POSIX real-time file extensions were developed for RTOSs used in embedded systems.

SdFat allows up to a 4GB contiguous file to be created. It finds the first place the file fits. If you are willing to use an SD as a raw device, you will suffer more pain than formatting the SD.

I think to get the best from any given card will require some specific information about it and adapting the writing cycle to fit.

Not likely. Better to spend some money on an industrial SD designed for embedded systems.

Doesn't that limit you to 1kS/s unless you are willing to accept some substantial jitter?

The jitter for the 100,000 sample per second logger is less than one CPU cycle, which is 62.5 ns.

I trigger the ADC on a timer1 compare event. I read the completed conversion in an ISR and buffer it.

The buffers are written to SD in the background. At least 82 data points are taken during the write of an SD block.
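
Condensed to a minimal sketch for an ATmega328 at 16 MHz (my register choices, shown only to illustrate the triggering scheme; the real logger uses multiple buffers and does the SD writes in loop()):

#include <avr/io.h>
#include <avr/interrupt.h>

volatile uint16_t buf[256];
volatile uint8_t head = 0;

// Conversion complete. The sample instant was fixed by the timer compare in
// hardware, so ISR latency does not add jitter.
ISR(ADC_vect) {
  buf[head++] = ADC;       // uint8_t index wraps at 256
  TIFR1 = _BV(OCF1B);      // clear the flag so the next compare re-triggers
}

void setup() {
  // Timer1 CTC, TOP = OCR1A: 16 MHz / 8 / 20 = 100 kHz compare rate.
  TCCR1A = 0;
  TCCR1B = _BV(WGM12) | _BV(CS11);
  OCR1A = 19;
  OCR1B = 19;              // compare match B is the ADC auto-trigger source
  ADMUX = _BV(REFS0);      // AVcc reference, channel 0
  ADCSRB = _BV(ADTS2) | _BV(ADTS0);  // ADTS = 0b101: Timer1 compare match B
  // Enable, auto-trigger, interrupt; clk/8 = 2 MHz ADC clock (fast, ~8-bit ENOB).
  ADCSRA = _BV(ADEN) | _BV(ADATE) | _BV(ADIE) | _BV(ADPS1) | _BV(ADPS0);
  sei();
}
void loop() {}  // a real logger drains buf to the SD here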

I ran the following sketch to check memory use.

#include <SdFat.h>
#include <SdFatUtil.h>
SdFat sd;
SdFile file;

void setup() {
  if (!sd.begin()) return;
  // log free RAM after SdFat has allocated everything it needs
  file.open("SIZE_TST.TXT", O_RDWR | O_CREAT | O_AT_END);
  file.println(FreeRam());
  file.close();
}
void loop() {}

The file contains the value 1369, so total used RAM is 679 bytes (the ATmega328 has 2048 bytes of RAM). Since the 512 byte buffer can be used for logging, total RAM for the Arduino core and other SdFat use is 167 bytes.

and other SdFat use is 167 bytes.

Thanks, from eyeballing the code I thought it would be more than that.

SdFat allows up to a 4GB contiguous file to be created.

Yes, but have you tested >5MB to see whether you are still getting no busy time? Specifically what happens when you go beyond the "management block" size of 128MB or whatever? Do you know how/why you were able to avoid hitting a busy delay?

The buffers are written to SD in the background. At least 82 data points are taken during the write of an SD block.

Right, but that prevents using ADC noise reduction mode, which is required to get (nominal) 10-bit accuracy from the Atmel chip. This scheme seems fine for your 8-bit sampling, but you have to choose between higher ADC resolution and low jitter.

Unless I'm missing something, you can't sleep the rest of the chip to gain full accuracy from the ADC if you're running SPI to the SD.

A full-spec ADC conversion takes 13.5 cycles of the ADC clock, and to get full spec from the onboard ADC that clock needs to be <200 kHz, i.e. 125 kHz on the Arduino (16 MHz / 128): minimum full-precision conversion ≈ 108 us.

Hmm, perhaps sleeping long enough for the SPI unit to finish transferring the last byte it was sent before triggering the ADC would avoid breaking the transfer?
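
Something like this, perhaps (an untested sketch of the idea; it assumes the ADC is already enabled, a SPI transfer really is in flight when it is called, and that the ADC interrupt is the only wake-up source during the conversion):

#include <avr/io.h>
#include <avr/interrupt.h>
#include <avr/sleep.h>

EMPTY_INTERRUPT(ADC_vect);  // wake-up only; the result is read after sleep

uint16_t quietAdcRead() {
  while (!(SPSR & _BV(SPIF))) {}  // let the SPI byte in flight finish first
  set_sleep_mode(SLEEP_MODE_ADC); // entering this mode auto-starts a conversion
  sleep_enable();
  sleep_mode();                   // CPU (and its noise) halts until the ADC IRQ
  sleep_disable();
  return ADC;                     // ~108 us later at a 125 kHz ADC clock
}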

fat16lib:

Doesn't that limit you to 1kS/s unless you are willing to accept some substantial jitter?

The jitter for the 100,000 sample per second logger is less than one CPU cycle, which is 62.5 ns.

FYI - with 62.5 ns jitter and a 50 kHz input frequency your SNR cannot be better than about 34 dB.

34 dB? He's only sampling 8-bit (or 10-bit in conditions that will only give 8-bit accuracy). Though interleaving ADC and SD writes at this speed is pretty good going for an Arduino. Kudos.

I need 10-bit accuracy and am looking for speed in order to oversample and do some filtering, since a 10-bit sample is never 10 bits accurate.

I could reduce random noise by averaging, but if I have some cyclic interference to filter out, jitter becomes a worry.

The ENOB and other statistical evaluations depend upon sampling a smoothly varying signal on one channel. This will not reflect the accuracy of measurements made by muxing adjacent channels, where successive values could potentially swing from one extreme to the other.

How do you calculate the S/N effect of jitter? Do you have a reference?

Specifically what happens when you go beyond the "management block" size of 128MB or whatever? Do you know how/why you were able to avoid hitting a busy delay?

The busy delay has nothing to do with "management blocks". The big delay happens in single block mode because the controller does not plan ahead. For the 12th time: I use multi-block streaming mode.

Yes, but have you tested >5MB to see whether you are still getting no busy time?

Yes, I designed SdFat for audio recording and other high speed logging. I have logged for hours in real apps.

Limor Fried asked me to make a version for beginning users. The Arduino company decided it was too complex and wrapped it with their SD.h API. That's why you think I designed SdFat for beginning users.

You can use just the files Sd2Card.h, Sd2Card.cpp and SdInfo.h as a library. This sketch takes 2120 bytes of flash and will write block zero of an SD.

#include <Sd2Card.h>
Sd2Card card;
uint8_t buf[512];           // 512 bytes of data for one block
void setup() {
  card.init();              // init the card in SPI mode
  card.writeBlock(0, buf);  // raw write of block zero - no file system
}
void loop() {}

ADC noise reduction mode, which is required to get (nominal) 10-bit accuracy from the Atmel chip. This scheme seems fine for your 8-bit sampling, but you have to choose between higher ADC resolution and low jitter.

Forget the datasheet, it is a general guide. Look at the AVR evaluation tests. In the papers that I pointed to, the ADC is triggered by the CPU clock and noise reduction is done using the DIDR.

The result is 7.4 ENOB for the 2 MHz ADC clock used at 100,000 samples per second. For 10-bit sampling, 33 ksps gives an ENOB of about 9.3 with a 500 kHz ADC clock. The max ENOB is 9.5 for the AVR ADC in any test.

Here is a paper on SNR due to sampling jitter http://www.analog.com/static/imported-files/tutorials/MT-200.pdf.
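
The key result there is the aperture-jitter limit:

$$\mathrm{SNR} = 20\log_{10}\!\left(\frac{1}{2\pi f\,t_j}\right)$$

which for $f = 50\,\mathrm{kHz}$ and $t_j = 62.5\,\mathrm{ns}$ gives the roughly 34 dB quoted above.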

Clearly the jitter in the AVR timer compare event is less than a CPU cycle. I said it was less than a CPU cycle since I don't know the exact number.

ardnut, if you are oversampling, why write all the data to the SD? What kind of signal are you recording? How can you possibly use the AVR ADC for a fast multi-channel signal?

Forget the datasheet, it is a general guide. Look at the AVR evaluation tests. In the papers that I pointed to, the ADC is triggered by the CPU clock and noise reduction is done using the DIDR.

As an engineer, "forget the datasheet" is not part of my way of working. It is not a general guide, it's the bible. Evaluation tests like the audio site you cribbed the graph from are valuable sources of information. They do not negate the datasheet. In fact they do not even contradict it, since they are not giving the same information.

I get the impression you are confusing a statistical measurement, ENOB, with the spec for the accuracy of one ADC conversion. The two results are compatible and not contradictory. They are different things.

I am logging physical quantities, not audio. You probably have different criteria for what you are doing. For you, jitter is probably more important than ADC accuracy. In my case it's somewhat the opposite. I require high absolute accuracy; the exact microsecond timing of each datum is less important (other than the question of possibly filtering cyclic interference I mentioned earlier which is more likely 50Hz than 50kHz).

I do not need the root-sum error over a few thousand readings, I need to know the accuracy of one conversion. ENOB, S/N etc. are useful supplements that help in comparing the S/N of the real signal vs the S/N of the ADC, and in deciding what measures I need to take to get within spec.

and noise reduction is done using the DIDR.

DIDR is just one measure to reduce internal noise. That is in no way the same thing as using ADC noise reduction mode; both are required to attain the specified accuracy. (See the spec sheet for details :wink:)

Are you able to comment on whether sleeping the CPU would break the SPI streaming, or is the protocol robust enough to stand a circa 100 us hiatus?