The way I understand it, while the card is flashing its memory it can't read incoming data. How do I know when this is happening? I'm missing something here. Does the SD function deal with this issue?
There are many layers of buffering in the SD library and in the SD card. There is no way you can tell when the flash in the card is written. This is why I don't do file system writes when recording data like audio.
I wrote the first version of SdFat over six years ago and added features for fast data recording. The Arduino SD.h library is just a wrapper for an old version of SdFat so it works the same way.
If you do file system writes, the following happens. If the current write starts a new cluster, a search of the FAT (File Allocation Table) is done to find a free cluster. This search reads 512 byte blocks from the FAT and when a free cluster is found, the block is updated and written back to the FAT. This can take a long time if the SD is fragmented.
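To make the cost of that search concrete, here is a minimal sketch. It is not SdFat code. It assumes a FAT16 layout (16-bit entries, so 256 entries per 512 byte block, with 0 meaning "free") and works on an in-memory copy of the FAT. The names findFreeCluster and FatSearchResult are made up for the illustration, and unlike a real library it does not wrap around to the start of the FAT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this illustration only.
struct FatSearchResult {
    bool found;              // true if a free cluster was located
    std::size_t cluster;     // index of the free cluster (valid if found)
    std::size_t blocksRead;  // number of 512 byte FAT blocks touched
};

// Assumption: FAT16, so each FAT entry is 16 bits and a block holds 256.
constexpr std::size_t kEntriesPerBlock = 512 / sizeof(std::uint16_t);

// Scans forward from 'start' for an entry equal to 0 (free cluster) and
// counts how many distinct 512 byte FAT blocks had to be examined.
FatSearchResult findFreeCluster(const std::vector<std::uint16_t>& fat,
                                std::size_t start) {
    assert(start >= 2);  // clusters 0 and 1 are reserved in FAT
    FatSearchResult r{false, 0, 0};
    std::size_t lastBlock = static_cast<std::size_t>(-1);
    for (std::size_t c = start; c < fat.size(); ++c) {
        std::size_t block = c / kEntriesPerBlock;
        if (block != lastBlock) {  // crossing into a new FAT block = one more read
            ++r.blocksRead;
            lastBlock = block;
        }
        if (fat[c] == 0) {
            r.found = true;
            r.cluster = c;
            return r;
        }
    }
    return r;
}
```

The further away the next free cluster is, the more FAT blocks get read before your data write can even start, and each of those reads is a separate SD transfer.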
The data you are writing is then copied to an internal 512 byte block buffer in the SD library. When the buffer is full, the block is written to the SD.
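The block buffering step can be sketched roughly like this. It is an illustration only, not the library's implementation: the BlockBuffer class and its Sink callback are hypothetical names, and the sink stands in for whatever actually sends a full block to the card.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <utility>

// Hypothetical illustration: collects bytes into a 512 byte buffer and
// hands each full block to a caller-supplied sink.
class BlockBuffer {
public:
    static constexpr std::size_t kBlockSize = 512;
    using Sink = std::function<void(const std::uint8_t*)>;

    explicit BlockBuffer(Sink sink) : sink_(std::move(sink)) { assert(sink_); }

    // Copies 'len' bytes in; calls the sink every time the buffer fills.
    void write(const std::uint8_t* data, std::size_t len) {
        while (len > 0) {
            std::size_t room = kBlockSize - used_;
            std::size_t n = len < room ? len : room;
            std::memcpy(buf_ + used_, data, n);
            used_ += n;
            data += n;
            len -= n;
            if (used_ == kBlockSize) {
                sink_(buf_);  // this is where the slow SD block write would happen
                used_ = 0;
            }
        }
    }

    // Bytes copied in but not yet handed to the sink.
    std::size_t pending() const { return used_; }

private:
    Sink sink_;
    std::uint8_t buf_[kBlockSize] = {};
    std::size_t used_ = 0;
};
```

Note that most calls to write() return quickly because they only copy into RAM. Only the call that happens to fill the buffer pays for the block write, which is one reason the delay looks random from the caller's side.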
The SD has a very complex RISC controller that buffers data and emulates 512 byte flash pages. The actual flash pages are usually some multiple of 16 KiB. Unfortunately, the single block writes used for file system writes cause an entire flash page (16 KiB or more) to be programmed, so there is lots of data buffering and copying in the SD. This can take a long time.
The controller also maintains a pool of erased flash. Flash erase groups are usually larger than 128 KiB so this can also cause random delays.
Finally, the controller does flash wear leveling, so it may decide to move and remap internal flash regions, which will take a long time.
SD cards have a streaming mode that provides more predictable access. You can inform the SD card controller that you intend to write a large contiguous block of flash and it will optimize access to this block. You transfer successive 512 byte blocks and the SD controller buffers these in RAM until a full flash page is accumulated and then programs the flash.
This is the method used by the AnalogBinLogger example. I create a large contiguous file and do raw multi-block writes to the file using 512 byte block buffers. When I finish logging data, I truncate the file to the correct size.
You might want to look at the RawWrite example. It shows how to create and write to a contiguous file with multi-block writes but does not demonstrate the buffering technique used in the AnalogBinLogger example.
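The general idea behind that kind of buffering can be sketched as a small queue of 512 byte blocks: the sampling side fills blocks, and the writing side drains full ones whenever the card is ready. This is my own minimal sketch and not the AnalogBinLogger source. BlockQueue and its methods are hypothetical names, 16-bit samples are an assumption, and on a real Arduino the shared counters would need volatile access or interrupt protection, which is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a queue of 512 byte blocks shared between a
// sample producer and an SD writer. Not interrupt-safe as written.
template <std::size_t NumBlocks>
class BlockQueue {
public:
    // Assumption: 16-bit samples, so 256 samples fill one 512 byte block.
    static constexpr std::size_t kSamplesPerBlock = 256;

    // Producer side (would be called from the sampling interrupt).
    // Returns false if every block is full, meaning the sample is lost.
    bool push(std::uint16_t sample) {
        if (fullCount_ == NumBlocks) return false;  // overrun
        blocks_[head_][fill_++] = sample;
        if (fill_ == kSamplesPerBlock) {  // block complete, move to the next
            fill_ = 0;
            head_ = (head_ + 1) % NumBlocks;
            ++fullCount_;
        }
        return true;
    }

    // Consumer side (would be called from the main loop).
    // Returns the oldest full block, or nullptr if none is ready.
    const std::uint16_t* front() const {
        return fullCount_ ? blocks_[tail_] : nullptr;
    }

    // Releases the block returned by front() once it has been written.
    void pop() {
        assert(fullCount_ > 0);
        tail_ = (tail_ + 1) % NumBlocks;
        --fullCount_;
    }

    std::size_t fullBlocks() const { return fullCount_; }

private:
    std::uint16_t blocks_[NumBlocks][kSamplesPerBlock] = {};
    std::size_t head_ = 0, tail_ = 0, fill_ = 0, fullCount_ = 0;
};
```

The number of blocks sets how long a write stall you can ride out: while the card is busy the producer keeps filling spare blocks, and samples are only lost once all of them are full.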
If you have sufficient buffer memory you may get file writes to work. I think you are using a Mega 2560 or a 2561. The SD spec allows write latency of up to 250 ms for 512 byte writes. Unfortunately this is 2000 times the 125 μs you were looking for. Many modern cards do better than the 250 ms maximum.
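To put a number on "sufficient buffer", here is the arithmetic as a small sketch. The 250 ms and 125 μs figures are the ones above. The 2 bytes per sample is my assumption (adjust it for your data), and the 8 KiB figure is the SRAM size of the ATmega2560. The function names are made up for the illustration.

```cpp
#include <cstdint>

// Number of samples that arrive while a single write is stalled (rounded up).
constexpr std::uint32_t samplesDuringStall(std::uint32_t stallUs,
                                           std::uint32_t samplePeriodUs) {
    return (stallUs + samplePeriodUs - 1) / samplePeriodUs;
}

// RAM needed to hold the samples that arrive during the stall.
constexpr std::uint32_t bufferBytesNeeded(std::uint32_t stallUs,
                                          std::uint32_t samplePeriodUs,
                                          std::uint32_t bytesPerSample) {
    return samplesDuringStall(stallUs, samplePeriodUs) * bytesPerSample;
}

constexpr std::uint32_t kWorstCaseStallUs = 250000;  // 250 ms spec limit
constexpr std::uint32_t kSamplePeriodUs = 125;       // 125 us between samples
constexpr std::uint32_t kBytesPerSample = 2;         // assumption: 16-bit samples
constexpr std::uint32_t kMegaSramBytes = 8192;       // ATmega2560 SRAM, 8 KiB

static_assert(samplesDuringStall(kWorstCaseStallUs, kSamplePeriodUs) == 2000,
              "250 ms is 2000 sample periods");
static_assert(bufferBytesNeeded(kWorstCaseStallUs, kSamplePeriodUs,
                                kBytesPerSample) == 4000,
              "2000 samples of 2 bytes each");
static_assert(bufferBytesNeeded(kWorstCaseStallUs, kSamplePeriodUs,
                                kBytesPerSample) < kMegaSramBytes,
              "fits in SRAM, but only just");
```

Under those assumptions the worst case needs about 4000 bytes, roughly half the RAM on a Mega 2560, and the stack and the library's own block buffer have to fit in what is left. With 8-bit samples the requirement drops to 2000 bytes.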