This post is a progress report with more detail on my design decisions.
I am not sure what to make of Senso's post. I am not trying to build a 200 MHz scope using progmem and an 8-bit flash ADC.
It is true that if you ignore the basics of low-noise design with an ADC like the MCP3201, you may reduce the ENOB (effective number of bits) to around 5.5. See this design note from Microchip: http://ww1.microchip.com/downloads/en/DeviceDoc/adn007.pdf
I am not assuming an ADC is the source of data and am not dealing with analog design.
I am using MCP320X and MCP330X ADCs as a data source during development. I went to Digi-Key and searched for an ADC that sampled at 100 ksps or greater, serial SPI, DIP or SOIC, and reasonable price. The result was MCP300X, MCP320X, and MCP330X. I didn't limit the search to Microchip.
The MCP300X is a 10-bit ADC rated at 200 ksps max. Since I must read the ADC with bit-bang SPI in an ISR, I can't reach 200 ksps anyway, so I decided to go with the 12- and 13-bit 100 ksps ADCs.
I have highly optimized inline bit-bang read functions for the MCP3201, MCP3202, MCP3204/8, and the MCP3301. These functions allow any digital pins to be used, three pins for the MCP3201 and four pins for the rest. The pin numbers must be constants so the compiler can optimize to simple CBI, SBI, SBIC, and SBIS I/O instructions.
The ISR is key to reaching 40,000 samples per second. Currently the ISR takes about 16 microseconds per sample: 3.5 usec for the ISR prologue/epilogue, 10 usec to read the ADC, and 2.5 usec for other stuff.
At 40,000 samples per second the sample period is 25 microseconds, so that leaves about nine microseconds per sample to write to the SD. I store 255 samples in a 512-byte block, so the total CPU time available to write a block is roughly 2300 microseconds (255 x 9).
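Here is a rough Python sketch of that timing budget, just to make the arithmetic explicit. The numbers are the ones quoted above; the function and key names are made up for illustration and are not part of the library.

```python
# Per-sample timing budget, using the figures quoted in this post.
SAMPLE_RATE_HZ = 40_000
ISR_PARTS_US = {"prologue/epilogue": 3.5, "ADC read": 10.0, "other": 2.5}
SAMPLES_PER_BLOCK = 255  # samples stored in one 512-byte SD block


def budget(sample_rate_hz, isr_parts_us, samples_per_block):
    """Return the time left over for SD writes, per sample and per block."""
    period_us = 1_000_000 / sample_rate_hz   # time between samples
    isr_us = sum(isr_parts_us.values())      # time spent inside the ISR
    spare_us = period_us - isr_us            # time left for everything else
    return {
        "period_us": period_us,
        "isr_us": isr_us,
        "spare_us": spare_us,
        "block_period_us": period_us * samples_per_block,
        "block_spare_us": spare_us * samples_per_block,
    }


b = budget(SAMPLE_RATE_HZ, ISR_PARTS_US, SAMPLES_PER_BLOCK)
for name, value in b.items():
    print(f"{name:>16}: {value:8.1f}")
```

The block_spare_us figure is the CPU time available to write one block while the ISR keeps sampling. A block fills every block_period_us, so the SD write has to fit inside the spare part of that window.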
You could slow the sample rate and read more data per sample. For example, with an MCP3204 you could sample at 10,000 samples per second with four 16-bit values per sample.
Choice of SD card is important. SD controllers are very different in how they handle flash. Cheap ones don't multitask. The SD spec allows a max write latency of 250 ms, so cheap cards just pause for 100-200 ms occasionally to erase 128 KB, or 256 blocks. These just won't work in this application.
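To show why those pauses are fatal, here is a small Python sketch of how much buffer you would need to keep sampling through a card stall. The 2048-byte SRAM figure is my assumption of an ATmega328-class board, and the function name is just for illustration.

```python
# How many bytes of sample data arrive while the SD card is stalled?
SAMPLE_RATE_HZ = 40_000
BYTES_PER_SAMPLE = 2
SRAM_BYTES = 2048  # assumption: ATmega328-class board with 2 KB of SRAM


def buffer_bytes_needed(sample_rate_hz, bytes_per_sample, latency_ms):
    """Bytes produced during a stall of latency_ms, rounded up."""
    total = sample_rate_hz * bytes_per_sample * latency_ms
    return -(-total // 1000)  # integer ceiling division


for latency_ms in (1, 100, 200, 250):
    need = buffer_bytes_needed(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, latency_ms)
    verdict = "fits" if need <= SRAM_BYTES else "does not fit"
    print(f"{latency_ms:4d} ms stall -> {need:6d} bytes ({verdict} in {SRAM_BYTES})")
```

A 100 ms pause means 8000 bytes of data with nowhere to go, several times the whole SRAM under that assumption, while a card that always finishes a block in about 1 ms needs well under one block of buffering.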
I have written an SD verify program to screen SD cards. I have found that SanDisk 4GB Ultra II cards have a consistent write time of less than 1000 usec per block. It looks like you could record a 4GB file, the max size for FAT files, without dropping a data point. That is over 13 hours at 40,000 16-bit samples per second.
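As a sanity check on the recording time, here is a quick Python sketch. It assumes the 255-samples-per-512-byte-block layout mentioned earlier, and it tries both readings of "4GB" (decimal and binary) since the exact usable file size depends on the card and formatting.

```python
# Recording time for a file of a given size at a given sample rate.
BLOCK_BYTES = 512
SAMPLES_PER_BLOCK = 255  # 255 16-bit samples per 512-byte block


def recording_hours(file_bytes, sample_rate_hz):
    """Hours of data that fit in file_bytes, counting whole blocks only."""
    blocks = file_bytes // BLOCK_BYTES
    samples = blocks * SAMPLES_PER_BLOCK
    return samples / sample_rate_hz / 3600


for label, size in (("4e9 bytes", 4_000_000_000), ("4 GiB", 4 * 2**30)):
    print(f"{label:>10}: {recording_hours(size, 40_000):.1f} hours")
```

Either way the result comes out between 13 and 15 hours, so "over 13 hours" holds.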
I think I will use Python for my PC/Mac example programs.
I hope to release a beta of this stuff soon.
I really appreciate everyone's input. It is very helpful.