Topic: Unformatted write to SD

ardnut (Full Member, Posts: 179):

I was planning on writing a trimmed down SD lib that just writes to the device without using a file system.

I would probably limit it to one SD class and read would not be needed.

The main aim is to get the RAM usage in the lib down to a minimum to save it for data.

The sketch would be a fast dedicated ADC sample and logger.

There would also be less software overhead on the write, although the difference probably would not matter given the SD card's write speed limitations.

Less code as well but this is not critical.

Has anyone seen this done already?


PS: since someone's going to ask anyway, to recover the data on Linux:

dd if=/dev/sdd1 of=/tmp/adc.data bs=512


« Last Edit: August 14, 2012, 08:43:37 am by ardnut »

fat16lib (Edison Member, Posts: 1484):

You must write an entire block at a time.  You must send all 512 bytes of data and two dummy or CRC bytes in one SPI transfer.

You can't implement a fast logger this way since the SD may present a busy delay of up to 100 ms when you send a single block write command.

Only streaming block write commands are fast.  See AnalogIsrLogger20120810.zip http://code.google.com/p/beta-lib/downloads/list.  It can do 100,000 samples per second.

You can't log faster with an external ADC since you can't use the SPI bus for both the ADC and the SD.  The SD requires the block write to be done as a single SPI transfer.

I have a very fast Software SPI routine but it only runs at 2MHz so about 40,000 samples per second is the limit for the sketch I posted at the above site for fast logging with an external ADC.
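
To make the "all 512 bytes plus two dummy or CRC bytes" requirement concrete, here is a minimal host-side sketch of the data packet for a single block write in SPI mode. The 0xFE start token is the standard single-block data token; the helper name buildWritePacket, the zero padding and the 0xFF dummy CRC bytes are assumptions made for illustration and are not taken from any library.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlockSize = 512;            // SD block size in bytes
constexpr std::uint8_t kStartBlockToken = 0xFE;    // data start token, single block write
constexpr std::size_t kPacketSize = 1 + kBlockSize + 2;  // token + data + 2 CRC bytes

// Hypothetical helper: lay out one complete write packet in memory.
// Short payloads are zero padded because the card always expects 512 bytes.
std::array<std::uint8_t, kPacketSize> buildWritePacket(const std::uint8_t* data,
                                                       std::size_t len) {
  assert(len <= kBlockSize);
  std::array<std::uint8_t, kPacketSize> packet{};
  packet[0] = kStartBlockToken;
  for (std::size_t i = 0; i < kBlockSize; ++i) {
    packet[1 + i] = (i < len) ? data[i] : 0;
  }
  // Dummy CRC bytes: SPI mode ignores the data CRC unless it has been enabled.
  packet[1 + kBlockSize] = 0xFF;
  packet[2 + kBlockSize] = 0xFF;
  return packet;
}
```

On real hardware the packet would be clocked out byte by byte rather than staged in a 515 byte array; the array is only there to show the layout.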

Edison Member (Posts: 1010):

http://arduino.cc/forum/index.php?topic=98898.0

fat16lib (Edison Member, Posts: 1484):

Writing to the SD is not as much the problem as acquiring the data.  

You must use multi-block streaming write commands.  I have posted many sketches and there is a RawWrite example in my latest version of SdFat.

Data is worthless for most measurements unless it is taken at precise time intervals.

The quality of a logger is often determined by logging a pure sine wave and then doing an FFT.  You won't get good results if there is time jitter or too much noise.

Quote
Effective number of bits (ENOB) is derived from an FFT analysis of the ADC output when the ADC is stimulated with a full-scale sine-wave input signal. The root-sum-of-squares (RSS) value of all noise and distortion terms is computed, and the ratio of the signal to the noise-and-distortion is defined as SINAD, or S/(N+D).
http://www.analog.com/static/imported-files/tutorials/MT-003.pdf

http://www.analog.com/library/analogDialogue/archives/40-02/adc_noise.html
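
As a rough sketch of how the quoted definition turns into a number: SINAD is the signal to noise-and-distortion ratio in dB, and the usual conversion is ENOB = (SINAD - 1.76 dB) / 6.02. The function names below are my own, and the RMS inputs are assumed to have already been extracted from the FFT.

```cpp
#include <cassert>
#include <cmath>

// SINAD in dB from the RMS signal amplitude and the root-sum-of-squares
// of all noise and distortion terms (both taken from the FFT).
double sinadDb(double signalRms, double noiseAndDistortionRms) {
  assert(signalRms > 0.0 && noiseAndDistortionRms > 0.0);
  return 20.0 * std::log10(signalRms / noiseAndDistortionRms);
}

// Effective number of bits for a full-scale sine wave input.
double enob(double sinad_dB) {
  return (sinad_dB - 1.76) / 6.02;
}
```

For example, a SINAD of 61.96 dB corresponds to 10 effective bits.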

I suggest more thought on acquisition.
« Last Edit: August 14, 2012, 05:26:22 pm by fat16lib »

ardnut (Full Member, Posts: 179):

Thanks for your comments. I did not realise how much you had provided initially. I'm working through it now, but don't have any SD hardware to test on yet.

fat16lib:
Quote
You can't implement a fast logger this way since the SD may present a busy delay of up to 100 ms when you send a single block write command.

Is that from the spec or is it an experimental finding?

I don't see why the SD would cause such a delay if all the required blocks have been pre-erased.

Is there some other internal firmware functionality that could be coming into play?

fat16lib (Edison Member, Posts: 1484):

It's from the spec and experiment.  I was shocked years ago when I wrote my first SD library.

Actually if the maximum write latency is only 100 ms you are lucky.  Here is the ugly spec:
Quote
4.6.2.2 Write
For a Standard Capacity SD Memory Card, the times after which a timeout condition for write operations occurs are (card independent) either 100 times longer than the typical program times for these operations given below or 250 ms (the lower of the two). The R2W_FACTOR field in the CSD is used to calculate the typical block program time obtained by multiplying the read access time by this factor. It applies to all write commands (e.g. SET(CLR)_WRITE_PROTECT, PROGRAM_CSD and the block write commands).

High Capacity SD Memory Card and Extended Capacity SD Memory Card indicate R2W_FACTOR as a fixed value.  In case of High Capacity SD Memory Card, maximum length of busy is defined as 250ms for all write operations.

So even if the card has been erased, the spec allows a card to have long busy periods occasionally while programming flash.
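
The standard-capacity rule in the quote can be sketched as a small calculation. This is only an illustration: the function name is mine, the read access time is assumed to be already known in milliseconds, and r2wMultiplier is assumed to be the already decoded multiplier rather than the raw R2W_FACTOR field from the CSD.

```cpp
#include <algorithm>
#include <cassert>

// Write timeout for a Standard Capacity card, per the quoted rule:
// the lower of (100 x typical program time) and 250 ms, where the
// typical program time is the read access time times the R2W multiplier.
double writeTimeoutMs(double readAccessMs, unsigned r2wMultiplier) {
  assert(readAccessMs >= 0.0 && r2wMultiplier >= 1);
  const double typicalProgramMs = readAccessMs * r2wMultiplier;
  return std::min(100.0 * typicalProgramMs, 250.0);
}
```

With made-up inputs of 0.1 ms read access and a multiplier of 4 this gives 40 ms; any slower combination quickly hits the 250 ms cap.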

I have very little information about what the card's controller is doing during this time.  Most of the details about the card controller are a manufacturer's trade secret.  I suspect it is due to a wear-leveling operation.  The card won't program the block even though it has been erased.  Wear-leveling happens for very large areas like 128 KB, which requires a huge copy.  In streaming mode you tell the card what blocks will be written so the controller can plan ahead.

I have experimented with about 40 cards and find that many standard (2GB or less) cards maintain low write latency if you use multi-block streaming mode and space writes at even intervals.  

Many better SanDisk cards perform well in this mode.  Here is a benchmark at 500 blocks per second for a 2GB Extreme III card:
Quote
Start raw write of 5120000 bytes at
256000 bytes per second
Please wait 20 seconds
Done
Elapsed time: 20000 millis
Max write time: 828 micros
Overruns: 0
The max time for a 512 byte write was 828 usec.
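
The figures in that benchmark follow from simple arithmetic, sketched below. The struct and function names are made up for this example; only the numbers come from the benchmark output.

```cpp
#include <cassert>
#include <cstdint>

struct WritePlan {
  std::uint32_t blocksPerSecond;      // 512 byte blocks written each second
  std::uint32_t blockIntervalMicros;  // spacing between block writes
  std::uint32_t durationSeconds;      // total run time
};

// Derive the block rate, block spacing and run time for a paced raw write.
WritePlan planRawWrite(std::uint32_t totalBytes, std::uint32_t bytesPerSecond) {
  constexpr std::uint32_t kBlockSize = 512;
  assert(bytesPerSecond >= kBlockSize && bytesPerSecond % kBlockSize == 0);
  WritePlan plan;
  plan.blocksPerSecond = bytesPerSecond / kBlockSize;
  plan.blockIntervalMicros = 1000000u / plan.blocksPerSecond;
  plan.durationSeconds = totalBytes / bytesPerSecond;
  return plan;
}
```

For 5120000 bytes at 256000 bytes per second this gives 500 blocks per second, one block every 2000 usec, and a 20 second run.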

Block mode write and SPI are available for compatibility with the spec but may have very poor performance.

SD cards are designed to be cheap and assume that devices like video cameras have a large amount of buffering so occasional long write latency is OK.  Class 10 card performance is based on the average rate for writing many MB of data.

The new SDXC cards can be busy for up to 500 ms.
« Last Edit: August 15, 2012, 07:20:05 pm by fat16lib »

ardnut (Full Member, Posts: 179):

Probably the hardest hit block will be block 0 with the FAT and partition table.

The overwhelming assumption will be that these devices are used with the installed FAT fs. The firmware will almost certainly need to map this block to different locations from time to time based on unknown and different design criteria.

Each time a FAT based file is written the FAT will get updated. In the hope that the firmware is actually counting FAT updates (probably buffering them as well), I would guess that by avoiding block 0 and (by implication) not using a FAT based fs, it may be possible to avoid these busy delays.

I'm still waiting for my Teensy and its SD so I'll just ask what you think at this stage.

Have you tried writing to SD in raw mode, avoiding any fs and skipping block 0?

Thanks for the tip about staying within the 2GB limit.

fat16lib (Edison Member, Posts: 1484):

Writing with no FS will not help.  Avoiding some block will not help. 

You can't solve the problem by speculation.  Remember, "One fact is worth more than a 1000 speculations".

The SD standard has lots of alignment requirements for how to format the SD so that performance will be optimal.  If you use the SD Association's formatter or my formatter, the SdFat SdFormatter.ino example, file structures will be properly aligned with erase groups.

You can't guess what the best policy will be.  I have spent days trying and every card is different and the behavior varies with card use.

Only two things seem to matter and the big thing is the SD card controller.  If you have a good controller, you must use multi-block write and selecting write with pre-erase seems to help.

I added the ability to quickly create a large contiguous file to SdFat.  Doing raw writes to these files is just as good as having no FS and access is easier on other computers.  You have more flexibility than using something like dd.  Multiple regions on one SD become a pain with dd.

Unfortunately cards with really good performance are no longer being manufactured.  Cards that look the same have different controllers.

My best card was manufactured in 2007 and is a 2GB SanDisk Extreme III.  This is version 8.0 of this model card.

Standard SD cards (cards with 2GB or less) are being phased out.

Some SDHC cards perform fairly well with Arduino but again cards of the same model vary depending on the card version.

I have had good luck with some 4GB SanDisk Extreme cards.

Here are two examples, a 2GB card and a 4GB card.  Notice that block groups on the 2GB card are much smaller than the 4GB card.  32 blocks vs 128 blocks.  Also alignment of the FS partition on the 4GB card has a big unused space before the partition.  The 2GB card has a smaller space before the FAT partition.

2GB Extreme III card:
Quote
Manufacturer ID: 0X3
OEM ID: SD
Product: SD02G
Version: 8.0
Serial number: 395023392
Manufacturing date: 11/2007

cardSize: 3970048 (512 byte blocks)
flashEraseSize: 32 blocks
eraseSingleBlock: true

SD Partition Table
part,boot,type,start,length
1,0X0,0X6,249,3969799
4GB Extreme HD Video card:
Quote
Manufacturer ID: 0X3
OEM ID: SD
Product: SD04G
Version: 8.0
Serial number: 3027274498
Manufacturing date: 4/2011

cardSize: 7744512 (512 byte blocks)
flashEraseSize: 128 blocks
eraseSingleBlock: true

SD Partition Table
part,boot,type,start,length
1,0X0,0XB,8192,7736320
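
For anyone who wants to reproduce the "part,boot,type,start,length" line from a raw dump of block 0, here is a small sketch. It relies on the standard MBR layout (four 16 byte entries starting at offset 446, little-endian start and length fields); the struct and function names are invented for this example, and isAligned is just a modulo check against the flashEraseSize value.

```cpp
#include <cassert>
#include <cstdint>

struct PartitionEntry {
  std::uint8_t boot;     // boot indicator
  std::uint8_t type;     // partition type, e.g. 0x06 (FAT16) or 0x0B (FAT32)
  std::uint32_t start;   // first block of the partition
  std::uint32_t length;  // partition size in 512 byte blocks
};

// Read a 32-bit little-endian value.
static std::uint32_t readLe32(const std::uint8_t* p) {
  return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
         (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Decode partition entry 0..3 from a 512 byte copy of block 0.
PartitionEntry readPartition(const std::uint8_t* mbr, unsigned index) {
  assert(index < 4);
  const std::uint8_t* e = mbr + 446 + 16 * index;
  return PartitionEntry{e[0], e[4], readLe32(e + 8), readLe32(e + 12)};
}

// True if a block number falls on an erase group boundary.
bool isAligned(std::uint32_t block, std::uint32_t eraseSizeBlocks) {
  assert(eraseSizeBlocks > 0);
  return block % eraseSizeBlocks == 0;
}
```

With the numbers above, the 4GB card's partition start of 8192 is a multiple of its 128 block erase size. The 2GB card's partition start of 249 is not a multiple of 32, but that alone does not say where the file structures inside the partition land.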

ardnut (Full Member, Posts: 179):

Quote
You can't solve the problem by speculation.  Remember, "One fact is worth more than a 1000 speculations".

I totally agree but when dealing with black-box situations like undocumented and commercially secret firmware sometimes that's the only option left. Though I would categorise it as hypothesising and hypothesis testing rather than speculation.

I thank you for your comments. You have clearly done a lot of research on this problem. I'm just trying to find an angle that you have not thought of yet that may help.

It's not easy to second-guess what the various firmware designers have chosen to do and, as I commented above, this will vary. Your results confirm that.

It seems that the potential 100ms busy time is a bit of a killer for what we are both trying to do. I'm trying to guess what could be happening during that time and whether it is possible to avoid triggering it. The SD standard presumably allows this delay to allow for internal housekeeping/load spreading/encryption etc.

This will almost certainly be hardware specific but understanding the problem will surely help avoid it.

Quote
You can't guess what the best policy will be.  I have spent days trying and every card is different and the behavior varies with card use
Which corroborates my suggestion that there is some load spreading algo intervening here.

Am I right in thinking that the "groups" probably relate to individual physical flash chips within the device?

Quote
Notice that block groups on the 2GB card are much smaller than the 4GB card.  32 blocks vs 128 blocks.

Does a contiguous write that starts and ends in the same (pre-erased) group perform any better than one that runs into two groups? Maybe there is internal buffering and the physical write op only happens at group level, not at the now (physically) theoretical 512-byte SD block size. There may be a firmware equivalent of sync() that flushes after a certain period or period of inactivity on the SPI.

Quote
Avoiding some block will not help. 
I only suggested avoiding block 0 since it is very likely to get special treatment on a device designed to run with a FAT based fs. From what you describe, I would now suggest avoiding the entire group containing that block. This may coincide with what you say about correct file alignment on format. (I presume you are saying that the first file should start at the next group boundary, rather than in the block following the FAT.)

Thanks for sharing the fruits of your investigations. This is crucial stuff for getting good logging performance.

fat16lib (Edison Member, Posts: 1484):

Quote
It seems that the potential 100ms busy time is a bit of a killer for what we are both trying to do.
No, I am perfectly happy with the result I posted above.  Here it is again:
Quote
Start raw write of 5120000 bytes at
256000 bytes per second
Please wait 20 seconds
Done
Elapsed time: 20000 millis
Max write time: 828 micros
Overruns: 0
This means I can write at up to 256 KB/sec and the time to write a block is no greater than 828 usec.  There is no busy delay so about 42% of the CPU time is required.  This program simulates a data logger by writing a block every 2,000 usec in the multi-block mode I described before.
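
The CPU figure is just the worst-case write time divided by the spacing between writes. A trivial sketch, with a function name of my own choosing:

```cpp
#include <cassert>

// Fraction of CPU time spent writing when one block is written per interval.
double cpuFraction(double writeMicros, double intervalMicros) {
  assert(writeMicros >= 0.0 && intervalMicros > 0.0);
  return writeMicros / intervalMicros;
}
```

An 828 usec write every 2,000 usec works out to 0.414, which is where the figure of about 42% comes from.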

It is very difficult to write 512 bytes from RAM to the SPI bus much faster.

For typical logging applications, most of the CPU time is used acquiring data.  Writing at 100 KB/sec requires less than 20% of the CPU so 80% is available to acquire data.

The above result is for raw writes to a large contiguous file.  The fact the SD is formatted with a file system has no effect.

If you use the same SD for logging with single block writes to a file you get this result for 100 byte writes:
Quote
Type is FAT16
File size 5MB
Buffer size 100 bytes
Starting write test.  Please wait up to a minute
Write 199.21 KB/sec
Maximum latency: 86384 usec, Minimum Latency: 84 usec, Avg Latency: 496 usec
There was at least one busy delay of 86.4 ms with this SD.  The minimum latency, 84 usec, occurs when the write is just a copy to the SdFat block buffer and no write to the SD occurs.

This is one of the best SD cards around for Arduino use and it has almost a 100 ms delay.

The rate was under 200 KB/sec and required 100% CPU.

So what do you expect to achieve? 

Why do you want to use single block writes?  The above test proves that streaming multi-block writes work with good cards.

fat16lib (Edison Member, Posts: 1484):

Wear leveling algorithms are not always a totally black box.  Look at this: http://www.stec-inc.com/downloads/AN-0702_STEC_SMALL_CARDS_WEAR_LEVELING_LIFETIME_CALCULATOR.pdf

Tell me how it helps if you can't access the use counts and mappings.

Since every manufacturer has different internal structures and algorithms it is even harder.
« Last Edit: August 17, 2012, 06:05:52 pm by fat16lib »

ardnut (Full Member, Posts: 179):

I don't think you're quite following me. Part of my aim was to remove the RAM needed by the full SDfat lib that I really don't need. I'm not saying that there is no call for what you have done. I think it's excellent work and what most people on Ard probably want.

I'm looking at trimming the fat to keep the most RAM for data. I have 8 channels of 16b data. (This will probably run on Teensy to get the 8 channels). This is dedicated hardware, I have no reason not to dedicate an SD card for it and thus dumping a continuous stream unformatted is no problem. I can handle the rest on Linux later.

I'm also likely to be creating >5MB of data over time; this is a data logger. Hence my interest in how groups relate to what STEC are calling management blocks. If I'm interpreting this correctly, attempting to define a group > 128MB will probably get refused. Also, to get the largest contiguous block you will need to know what else is on the fs (i.e. have it freshly formatted), at which point some of the interest in using an fs is lost.

I think getting the best from any given card will require some specific information about it and adapting the writing cycle to fit. It looks like your libs and examples will provide a lot of useful info off the card.

I agree that your multi-block write within an fs structured file is probably no different from a raw write of the same size. Any fs related stuff will happen before or after.

You say writing at 100kB/s uses <20% but what is the smallest continuous stream you can output? One block of 512 bytes in just under 1ms. How does that relate to your jitter?

Doesn't that limit you to 1kS/s unless you are willing to accept some substantial jitter?

fat16lib (Edison Member, Posts: 1484):

Quote
Part of my aim was to remove the RAM needed by the full SDfat lib
The 512 byte block cache RAM in SdFat can be used for logging with raw writes.  I use it in my fast loggers.  There is a call that flushes the block cache and returns the address of the cache.  Very little other RAM is globally allocated.
Quote
Also to get the largest contiguous block you will need to know what else is on the fs (ie have it freshly formatted) at which point some of the interest in using an fs is lost.
It's easier to use contiguous files than a raw device.  That's why the POSIX real-time file extensions were developed for RTOSs used in embedded systems.

SdFat allows up to a 4GB contiguous file to be created.  It finds the first fit place.  If you are willing to use an SD as a raw device, you will suffer more pain than formatting the SD.
Quote
I think to get the best from any given card will require some specific information about it and adapting the writing cycle to fit.
Not likely.  Better to spend some money on an industrial SD designed for embedded systems.

Quote
Doesn't that limit you to 1kS/s unless you are willing to accept some substantial jitter?
The jitter for the 100,000 sample per second logger is less than one CPU cycle, which is 62.5 ns.

I trigger the ADC on a timer1 compare event.  I read the completed conversion in an ISR and buffer it.

The buffers are written to SD in the background.  At least 82 data points are taken during the write of an SD block.
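
The scheme described here can be sketched on a host machine as a ping-pong (double) buffer: the ISR fills one 512 byte buffer while the background loop writes the other. This is only an illustration with invented names, not the code from the logger; on a real AVR the shared state would also need volatile qualifiers and care about interrupt-safe access. The helper at the top shows where "at least 82 data points" comes from if the 828 usec worst-case block write from earlier in the thread is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Whole samples that arrive while one block write is in progress.
std::uint32_t samplesDuringWrite(std::uint32_t writeMicros,
                                 std::uint32_t samplesPerSecond) {
  return static_cast<std::uint32_t>(
      static_cast<std::uint64_t>(writeMicros) * samplesPerSecond / 1000000u);
}

class PingPongBuffer {
 public:
  static constexpr std::size_t kBlockSize = 512;

  // Called from the (simulated) ADC ISR. Returns false on overrun,
  // i.e. when both buffers are full and the sample has to be dropped.
  bool push(std::uint8_t sample) {
    if (fillCount_ == kBlockSize) return false;
    buffers_[fill_][fillCount_++] = sample;
    if (fillCount_ == kBlockSize && !readyPending_) swapBuffers();
    return true;
  }

  // Called from the background loop: the full block waiting to be written
  // to the SD, or nullptr if there is none yet.
  const std::uint8_t* takeReady() const {
    return readyPending_ ? buffers_[fill_ ^ 1u].data() : nullptr;
  }

  // Called after the block returned by takeReady() has been written.
  void release() {
    assert(readyPending_);
    readyPending_ = false;
    // If the other buffer filled up in the meantime, hand it over now.
    if (fillCount_ == kBlockSize) swapBuffers();
  }

 private:
  void swapBuffers() {
    fill_ ^= 1u;
    fillCount_ = 0;
    readyPending_ = true;
  }

  std::array<std::array<std::uint8_t, kBlockSize>, 2> buffers_{};
  std::size_t fill_ = 0;       // index of the buffer being filled
  std::size_t fillCount_ = 0;  // samples stored in that buffer
  bool readyPending_ = false;  // true while the other buffer awaits writing
};
```

With 8-bit samples at 100,000 per second, a 512 byte buffer takes 5.12 ms to fill, so a sub-millisecond block write leaves plenty of margin before an overrun.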

« Last Edit: August 17, 2012, 06:11:45 pm by fat16lib »

fat16lib (Edison Member, Posts: 1484):

I ran the following sketch to check memory use.
Code:
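#include <SdFat.h>
#include <SdFatUtil.h>  // provides FreeRam()
SdFat sd;
SdFile file;

void setup() {
  // Give up if the card can't be initialized.
  if (!sd.begin()) return;
  // Append the current free RAM, in bytes, to a text file on the card.
  if (!file.open("SIZE_TST.TXT", O_RDWR | O_CREAT | O_AT_END)) return;
  file.println(FreeRam());
  file.close();
}
void loop() {}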

The file contains the value 1369, the free RAM in bytes.  Out of 2048 bytes of SRAM, total used RAM is therefore 679 bytes.  Since the 512 byte buffer can be used for logging, total RAM for the Arduino core and other SdFat use is 167 bytes.

ardnut (Full Member, Posts: 179):

Quote
and other SdFat use is 167 bytes.
Thanks, from eyeballing the code I thought it would be more than that.

Quote
SdFat allows up to a 4GB contiguous file to be created.

Yes, but have you tested >5MB to see whether you are still getting no busy time?  Specifically what happens when you go beyond the "management block" size of 128MB or whatever? Do you know how/why you were able to avoid hitting a busy delay?

Quote
The buffers are written to SD in the background.  At least 82 data points are taken during the write of an SD block.

Right, but that prevents using ADC noise reduction mode, which is required to get (nominal) 10b accuracy from the Atmel chip.  This scheme seems fine for your 8-bit sampling but you have to choose between higher ADC resolution and jitter.

Unless I'm missing something you can't sleep the rest of the chip to gain full accuracy off the ADC if you're running SPI to the SD.

A full spec ADC read takes 13.5 cycles of the ADC clock. To get full spec from the onboard ADC that clock needs to be <200kHz, i.e. 128kHz on Ard., so the minimum full precision conversion time = 106us.
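
The conversion time above is cycles divided by ADC clock, sketched here with made-up function names. The 13.5 cycle and 128 kHz figures are the ones used in this post, not something I have measured.

```cpp
#include <cassert>

// Time for one conversion, in microseconds.
double conversionMicros(double adcClockCycles, double adcClockHz) {
  assert(adcClockCycles > 0.0 && adcClockHz > 0.0);
  return adcClockCycles / adcClockHz * 1e6;
}

// Highest back-to-back sample rate that conversion time allows.
double maxSampleRateHz(double adcClockCycles, double adcClockHz) {
  return 1e6 / conversionMicros(adcClockCycles, adcClockHz);
}
```

With 13.5 cycles at 128 kHz this gives about 105.5 us per conversion, so under 9.5 kS/s. If the ADC clock is actually a 16 MHz system clock divided by 128 (an assumption about the board), it would be 125 kHz and the same formula gives 108 us.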
« Last Edit: August 18, 2012, 06:08:09 am by ardnut »