Playing WAV from SD card - how can I read directly from the SDFat buffer?

After some fiddling around, I have managed to play stereo WAV files from an SD card on an Arduino Micro (ATmega32U4) with no additional hardware besides the SD card adapter. The data rate is 22050 samples/s, and the audio output uses fast PWM on the 16-bit Timer1 and the 10-bit Timer4. Audio quality is good enough for normal use. However, to get a continuous data stream to the PWM output, I had to set up two 512-byte buffer arrays in SRAM in addition to the ~1000 bytes of buffer that the SDFat library already uses.
This leaves only about 500 bytes for all other variables. That is a severe limitation for the other functions I want to implement, e.g. defining playlists. It also means that this method doesn't work on an ATmega328, which has only 2048 bytes of SRAM. Trying to reduce the buffer size always leads to audible interruptions of the data stream.

Has anybody managed to read directly from the SDFat buffer, without using the file.read() function, which is far too slow for single-byte transfers at the required 88 kByte/s? Any help is greatly appreciated.

Jordi22:
However, to get a continuous data stream to the PWM output, I had to set up two 512-byte buffer arrays in SRAM in addition to the ~1000 bytes of buffer that the SDFat library already uses.
This leaves only about 500 bytes for all other variables. That is a severe limitation for the other functions I want to implement, e.g. defining playlists. It also means that this method doesn't work on an ATmega328, which has only 2048 bytes of SRAM. Trying to reduce the buffer size always leads to audible interruptions of the data stream.

Well... now you are noticing the drawback of most AVR microcontrollers: small RAM size. That's something you have to either deal with, or switch to an ATmega2560 (aka Arduino Mega).

Jordi22:
Has anybody managed to read directly from the SDFat buffer, without using the file.read() function

No, but you can by modifying the library.

However, direct access to that buffer is not recommended: it is actually what is called a "disk cache", which may also contain important filesystem data that must not be altered incorrectly.

To modify it for double buffering, it may be better to rewrite the whole filesystem driver more or less from scratch. The low-level access part is already done, even in the bundled SD library, in the class named Sd2Card; what you would have to reimplement is the part that actually takes the 512 bytes of RAM: the volume (partition) and filesystem driver.
At the very least, you need to know how a partition table (the "Master Boot Record" kind) and a FAT filesystem work.

Why all this? Because unless you (as a regular user) actually wrote those files knowing their "physical" location on the card, sticking to the filesystem rules is mandatory. It's not as simple as just reading the next block; there's a lot going on when a filesystem is involved. The library actually does a good job of turning all that juggling into an easy task (that's the "information hiding" principle of object-oriented programming).

Jordi22:
which is far too slow for single-byte transfers at the required 88 kByte/s?

Have you tried the multi-byte version of read()? It's supposed to be more efficient than calling the single-byte version repeatedly:

// This example assumes you're using a double buffer

int bytesRead = file.read(buffer[whichBuffer], sizeof(buffer[0]));
// read() returns the number of bytes obtained, or -1 on error, so a signed type is used here.
// Saving that count is a good way to detect in time that the end of the file has been reached.

Also, are you filling the buffers in the main program and playing the samples in an interrupt?

"small RAM size. That's something you have to either deal with, or switch to an ATmega2560 " with 8K of SRAM

Or a 1284P, with 16K of SRAM.

CrossRoads:
Or a 1284P, with 16K of SRAM.

Yeah, pretty much. I mean... anything with 4 KB of RAM or more.

@Lucario448: thanks for your thoughts on my problem. You have convinced me that I should not interfere with the inner workings of the SDFat library. As it was hard enough for me to learn to use the library correctly, I don't feel able to write file management code on my own. So I will try to live with the SRAM limitations; getting the most out of the limited resources of a given MCU is part of the challenge.

  • Yes, I use the file.read(512) instruction in the main loop to fill two alternating buffers of 512 bytes each. The ISR reads 4 bytes (16-bit L/R audio) 22050 times per second and sets the OCRs of Timer1/4 accordingly.

I tried smaller buffer sizes, which always led to audible interruptions of the audio, because then the time to empty the buffers was shorter than the time needed to refill them from the SD card. The ISR then reads invalid data, which can be heard as noise and clicks at irregular intervals.
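
For readers following along, the scheme described above can be modelled on a PC as a minimal sketch. All names here (DoubleBuffer, freeBuffer, markFilled, consumeFrame) are hypothetical and not part of SDFat; on the AVR, consumeFrame() would be the body of the timer ISR, the refill would be a file.read() call in loop(), and the shared fields would need to be volatile and accessed atomically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side model of two alternating 512-byte buffers (hypothetical names).
struct DoubleBuffer {
    static constexpr std::size_t kSize = 512;  // one SD block per buffer

    std::uint8_t data[2][kSize] = {};
    bool ready[2] = {false, false};  // true while a buffer holds unplayed samples
    int playing = 0;                 // index of the buffer being played
    std::size_t pos = 0;             // read offset inside that buffer
    unsigned underruns = 0;          // times the consumer found no valid data

    // Producer side (main loop): which buffer may be refilled now, or -1 if none.
    // The buffer due to be played next is offered first.
    int freeBuffer() const {
        if (!ready[playing]) return playing;
        if (!ready[playing ^ 1]) return playing ^ 1;
        return -1;
    }

    // Producer side: call after a full block has been copied into data[index].
    void markFilled(int index) {
        assert(index == 0 || index == 1);
        ready[index] = true;
    }

    // Consumer side (ISR on the real hardware): fetch one 4-byte stereo frame.
    // Returns false on an underrun, i.e. the click/noise case described above.
    bool consumeFrame(std::uint8_t frame[4]) {
        if (!ready[playing]) {
            ++underruns;
            return false;
        }
        for (int i = 0; i < 4; ++i) frame[i] = data[playing][pos + i];
        pos += 4;
        if (pos == kSize) {          // buffer exhausted: hand it back and swap
            ready[playing] = false;
            playing ^= 1;
            pos = 0;
        }
        return true;
    }
};
```

Counting underruns instead of silently playing stale data makes it easy to see whether a given buffer size keeps up, which is the experiment described in the post.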

Jordi22:
You have convinced me that I should not interfere with the inner workings of the SDFat library.

But that doesn't mean you can't. If you are really sure it isn't worth the hassle, then leave it as it is.

Jordi22:

  • Yes, I use the file.read(512) instruction in the main loop to fill two alternating buffers of 512 bytes each.

If that's actually the multi-byte version, then where did you get the idea that the read speed is just 88 KB/s?

If the SPI port is operating at "full speed", the MCU is running at 16 MHz, and the library does not have to seek to another filesystem-defined "cluster" (a group of data blocks, usually 8, i.e. 4096 bytes) within the file, then reading 512 bytes should take roughly 300 microseconds, plus maybe another 200 due to interrupts firing, plus yet another 200 for copying the data from the library's cache to your buffer. Meanwhile, an entire buffer is "consumed" in 5805 microseconds (according to your example).
That seems feasible to me: 5805 - 700 = 5105, so let's say the MCU has about 5 milliseconds to spare between loading data and playing samples. When we factor in filesystem overhead, the free time is more like 3 milliseconds, but that is still some margin, so it is still doable.
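
The arithmetic above can be checked with a few lines of plain C++. This is only a sketch: drainTimeUs and slackUs are hypothetical helper names, and the 700 microsecond refill figure is the rough estimate from this post, not a measurement.

```cpp
#include <cassert>
#include <cstdint>

// Time in microseconds needed to play out bufferBytes at bytesPerSecond,
// rounded to the nearest microsecond.
std::uint32_t drainTimeUs(std::uint32_t bufferBytes, std::uint32_t bytesPerSecond) {
    assert(bytesPerSecond > 0);
    const std::uint64_t scaled = static_cast<std::uint64_t>(bufferBytes) * 1000000u;
    return static_cast<std::uint32_t>((scaled + bytesPerSecond / 2) / bytesPerSecond);
}

// Spare time per buffer once the refill is done.
// A negative result means the buffer empties before it can be refilled.
std::int32_t slackUs(std::uint32_t bufferBytes, std::uint32_t bytesPerSecond,
                     std::uint32_t refillUs) {
    return static_cast<std::int32_t>(drainTimeUs(bufferBytes, bytesPerSecond)) -
           static_cast<std::int32_t>(refillUs);
}
```

With 22050 frames per second at 4 bytes each (88200 bytes/s), drainTimeUs(512, 88200) gives 5805 and slackUs(512, 88200, 700) gives 5105, matching the numbers above.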

Jordi22:
The ISR reads 4 bytes (16-bit L/R audio) 22050 times per second and sets the OCRs of Timer1/4 accordingly.

Wait, how is that? It doesn't seem to be possible on an Arduino Uno or Nano (Timer1 is already used for timing and not for PWM, and channel A of Timer2 is mapped to an SPI pin, pin 11). Is that actually a Mega or what?

Jordi22:
I tried smaller buffer sizes, which always led to audible interruptions of the audio, because then the time to empty the buffers was shorter than the time needed to refill them from the SD card. The ISR then reads invalid data, which can be heard as noise and clicks at irregular intervals.

Again: hardware limitations that can only be solved with either higher CPU clock speeds (widens the time gap between playing and loading) or DMA capability (shortens loading times in bulk transfers).

This time gap between playing and loading narrows with smaller buffers or higher data rates (or both combined), so much that there's a certain point of "overlapping"; causing the symptoms you mentioned before.

Thanks again for considering my question. Your deductions are confirmed by my experiences.

The raw physical speed of the SPI connection to the SD card isn't the actual bottleneck (at an 8 MHz clock the theoretical limit would be 1000 kbytes/second).

It seems that the transfer between the library's buffer and the two buffer arrays only works without substantial delay when one entire 512-byte block can be transferred with each file.read() call. It is a kind of synchronisation of the data transfer inside the MCU with the one between SD card and MCU.

Up to now, I could only make it work on the ATmega32U4, which has 2560 bytes of RAM. I'm sure this would work even better on an ATmega2560; I'll give it a try sometime soon.

Regards

Jordi22:
The raw physical speed of the SPI connection to the SD card isn't the actual bottleneck (at an 8 MHz clock the theoretical limit would be 1000 kbytes/second).

But, as you said, it's "theoretical", since all the I/O is interrupt-driven and thus requires the CPU for every single transferred byte.

If you take a look at the signals with an oscilloscope or a logic analyzer, even in bulk transfers you'll always see a pause every 8 clock pulses; this "dead time" is the CPU retrieving the received byte and preparing the next one to transmit.
However, with DMA capability this dead time would be much shorter, probably between two SPI clock cycles and less than one, depending on the operating speeds of the peripheral and the overall system.
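
To put a number on the effect of that dead time, here is a small sketch. The function name spiBytesPerSecond is hypothetical, and the gap length has to come from your own scope or logic analyzer capture; the value of 8 clocks used in the note below is purely illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Effective SPI throughput in bytes per second when every byte costs 8 SCK
// clocks plus an idle gap, expressed here in SCK clock periods.
// gapClocksPerByte = 0 gives the theoretical limit.
std::uint32_t spiBytesPerSecond(std::uint32_t sckHz, std::uint32_t gapClocksPerByte) {
    assert(sckHz > 0);
    return sckHz / (8u + gapClocksPerByte);
}
```

At 8 MHz with no gap this gives the 1,000,000 bytes/s mentioned above. If the gap were as long as the byte itself (8 clocks), the rate would halve to 500,000 bytes/s, which is still well above the 88,200 bytes/s the audio stream needs.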

Jordi22:
It seems that the transfer between the library's buffer and the two buffer arrays only works without substantial delay when one entire 512-byte block can be transferred with each file.read() call. It is a kind of synchronisation of the data transfer inside the MCU with the one between SD card and MCU.

Yes, and you could redirect the cache to your buffers. The problem arises when it has to load filesystem data, although that does not happen too often, since in the end it goes back to loading actual file data.

The process goes like this: if another block is required and it is within the same filesystem-defined "cluster", then it's just a matter of loading the next block/sector.
However, when it's beyond a "cluster" boundary (usually every 4096th byte), you cannot simply assume that the next cluster also belongs to the same file (fragmentation might have occurred). So, in order to advance to the actual next cluster (the next few data blocks that really belong to the working file), the library first has to load (into the designated cache/buffer) the part of the file allocation table (hence the name of this filesystem) that tells it where the next cluster is, save that position, and finally load the first data block/sector of that cluster.

As you can see, this is the slowest part of reading a file: it involves loading two blocks and a bit of processing.
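
The cluster lookup described above can be sketched as follows, assuming a FAT16 volume whose whole table is already in memory. That is a simplification for illustration only: on the AVR the library holds a single 512-byte sector of the table at a time, which is exactly the extra block load mentioned above. The names clusterChain and fatSectorIndex are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// FAT16: entries 0 and 1 are reserved, values >= 0xFFF8 mark the last cluster.
constexpr std::uint16_t kEndOfChain = 0xFFF8;

// Which 512-byte sector of the table holds the entry for a cluster
// (FAT16 entries are 2 bytes, so 256 entries fit in one sector).
std::uint32_t fatSectorIndex(std::uint16_t cluster) {
    return cluster / 256u;
}

// Follows the table from firstCluster and returns the file's clusters in order.
// The size check on the result guards against a corrupted, looping chain.
std::vector<std::uint16_t> clusterChain(const std::vector<std::uint16_t>& fat,
                                        std::uint16_t firstCluster) {
    assert(fat.size() >= 2);  // reserved entries must be present
    std::vector<std::uint16_t> chain;
    std::uint16_t c = firstCluster;
    while (c >= 2 && c < kEndOfChain && c < fat.size() && chain.size() < fat.size()) {
        chain.push_back(c);
        c = fat[c];  // each entry names the next cluster of the same file
    }
    return chain;
}
```

For a fragmented file starting at cluster 2 whose table entries read 2 -> 5 -> 9 -> end, clusterChain returns the clusters 2, 5 and 9 in that order, so the data blocks are not contiguous on the card even though the file reads as one stream.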

Now I'm thinking: redirecting the cache is not that hard; the toughest part is modifying the library to make it work with double buffers and to somehow "trigger" the loading of the next block/cluster. Also, this way you may have to keep track of the file size manually: the remaining slack of the last block is not part of the file's content and should be treated as "random garbage data".
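
A minimal sketch of that bookkeeping, assuming you know the file size in bytes and count blocks from the start of the file (validBytesInBlock is a hypothetical helper, not a library function):

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes in the given block that really belong to the file.
// Anything beyond that count in the block is slack and must not be played.
std::uint32_t validBytesInBlock(std::uint32_t fileSize, std::uint32_t blockIndex,
                                std::uint32_t blockSize = 512) {
    assert(blockSize > 0);
    const std::uint64_t start = static_cast<std::uint64_t>(blockIndex) * blockSize;
    if (start >= fileSize) return 0;  // block lies entirely past the end
    const std::uint64_t remaining = fileSize - start;
    return remaining < blockSize ? static_cast<std::uint32_t>(remaining) : blockSize;
}
```

For a 1300-byte file, blocks 0 and 1 are full (512 bytes each), block 2 holds only 276 valid bytes, and block 3 holds none.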