The index file would hold 32-bit file positions, one for the beginning of each word in the text, stored in sorted order of the words. The first value would be the seek position of the first word in sorted order. Multiple occurrences of the same word would each get their own index entry.
I've done this with limited sets of serial input text buffered in RAM. I sort the indexes in RAM, not the text, and the sort is done as fast as the data arrives plus a few microseconds. I can do that on SD too, just not as fast, and without buffering the text to be sorted: I only keep the links being sorted in RAM.
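A minimal sketch of the "sort the indexes, not the text" part, in plain C++ so it can be tried on a PC. Assumptions that are mine, not part of the design above: the text sits in a std::string as a stand-in for the file, words are delimited by whitespace, and the names isDelim, wordAt and buildIndex are made up for illustration. It also uses std::stable_sort after collecting all the offsets rather than sorting each link as it arrives.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical delimiter test: assumes words are separated by space, tab,
// CR or LF. The real file's delimiter may be something else.
static bool isDelim(char c) {
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

// Return the word that starts at byte offset pos (runs to the next delimiter).
static std::string_view wordAt(const std::string& text, uint32_t pos) {
    assert(pos < text.size());
    uint32_t end = pos;
    while (end < text.size() && !isDelim(text[end])) ++end;
    return std::string_view(text).substr(pos, end - pos);
}

// Build the index: one 32-bit offset per word occurrence, ordered by the
// word each offset points at. Only the offsets move; the text stays put.
static std::vector<uint32_t> buildIndex(const std::string& text) {
    std::vector<uint32_t> index;
    bool inWord = false;
    for (uint32_t i = 0; i < text.size(); ++i) {
        bool d = isDelim(text[i]);
        if (!d && !inWord) index.push_back(i);  // first byte of a new word
        inWord = !d;
    }
    // stable_sort keeps repeated words in file order.
    std::stable_sort(index.begin(), index.end(),
        [&](uint32_t a, uint32_t b) { return wordAt(text, a) < wordAt(text, b); });
    return index;
}
```

On SD the comparison would have to seek and read each word from the card instead of looking into a RAM buffer, which is where the speed goes.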
The idea is to read part of the text file, make sorted links to that batch in RAM, then write those links out as a temp index file. The next batch starts reading where the last one quit, and its links are merged with the previous set: a 1-pass interleave (merge) sort outputs a new temp index. Repeat until done, then give the final output index a non-temp name.
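The 1-pass interleave step could look roughly like this. It is only a sketch under my own assumptions: the two std::vector arguments stand in for the previous temp index file and the new batch's links in RAM, the returned vector stands in for the new temp index file, words are whitespace-delimited, and wordFrom and mergeIndexes are made-up names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: the word starting at offset pos, up to the next
// whitespace delimiter (or the end of the text).
static std::string_view wordFrom(const std::string& text, uint32_t pos) {
    std::size_t end = text.find_first_of(" \t\r\n", pos);
    std::size_t len = (end == std::string::npos) ? std::string::npos : end - pos;
    return std::string_view(text).substr(pos, len);
}

// Interleave two already-sorted runs of offsets into one sorted run.
// Each input is read front to back exactly once.
static std::vector<uint32_t> mergeIndexes(const std::string& text,
                                          const std::vector<uint32_t>& oldIdx,
                                          const std::vector<uint32_t>& newIdx) {
    std::vector<uint32_t> out;
    out.reserve(oldIdx.size() + newIdx.size());
    std::size_t i = 0, j = 0;
    while (i < oldIdx.size() && j < newIdx.size()) {
        // On a tie take the old entry first, so equal words stay in file order.
        if (wordFrom(text, newIdx[j]) < wordFrom(text, oldIdx[i]))
            out.push_back(newIdx[j++]);
        else
            out.push_back(oldIdx[i++]);
    }
    while (i < oldIdx.size()) out.push_back(oldIdx[i++]);  // leftover old links
    while (j < newIdx.size()) out.push_back(newIdx[j++]);  // leftover new links
    return out;
}
```

Because both inputs are consumed strictly in order, the old index can be streamed from the card and the output streamed to the new temp file, so only the current batch's links need to sit in RAM.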
It would read the delimited serial text file and make a sorted index file that points into the original.