
Topic: Searching a 'large' file on SD?

ghlawrence2000

Aug 21, 2013, 06:56 pm Last Edit: Aug 21, 2013, 07:06 pm by ghlawrence2000 Reason: 1
Hello all!!

I was wondering whether anyone has previously had a requirement similar to mine?

I have a file (roughly 27 MB) which contains approx. 260,000 lines of variable-length, colon-separated fields.

It has a defined structure as follows :-

Int(6):Char(6):Char(60):Char(4):Int(2):Float(3.1):Int(2):Float(3.1):Int(7):Int(7):Char(1):Char(2):Char(20):Char(60):Char(3):Char(11):Char(1):Int(3):Int(3):Int(3)

The widths shown are maximums; the actual fields vary in length.

A snippet of the mentioned file here :-

Code:
258316:SO0003:Ysgubor-wen Ho:SO00:51:43.2:3:26.4:203500:300500:W:RH:Rho Cyn Taf:Rhondda,Cynon,Taff:X:01-MAR-1993:I:170:0:0
258317:SN6895:Ysgubor-y-coed:SN68:52:32.5:3:56.3:295500:268500:W:CE:Cered:Ceredigion:X:01-MAR-1993:I:135:0:0
258318:SO0873:Ysgwd-ffordd:SO06:52:21.1:3:20.6:273500:308500:W:PW:Powys:Powys:X:01-MAR-1998:U:136:147:0
258319:SJ1930:Ysgwennant:SJ02:52:51.9:3:11.8:330500:319500:W:PW:Powys:Powys:X:01-MAR-1993:I:125:0:0
258320:SO0537:Ysgwydd Hwch:SO02:52:1.6:3:22.6:237500:305500:W:PW:Powys:Powys:H:21-MAY-2007:U:160:0:0
258321:SO1200:Ysgwydd-gwyn-isaf Fm:SO00:51:41.8:3:16:200500:312500:W:CF:Caer:Caerphilly:FM:01-MAR-1993:I:171:0:0
258322:SO3113:Ysgyrd Fach:SO20:51:48.9:2:59.6:213500:331500:W:MM:Monm:Monmouthshire:H:01-MAR-1993:I:161:0:0
258323:SO3317:Ysgyryd Fawr:SO20:51:51.1:2:57.9:217500:333500:W:MM:Monm:Monmouthshire:H:01-MAR-1993:I:161:0:0
258324:SS5598:Yspitty:SS48:51:40:4:5.4:198500:255500:W:CT:Carm:Carmarthenshire:O:01-MAR-1993:I:159:0:0
258325:SN4826:Yspitty Ifan:SN42:51:55:4:12.2:226500:248500:W:CT:Carm:Carmarthenshire:X:01-MAR-1993:I:146:0:0
258326:SM7923:Ystafelloedd:SM62:51:52:5:12.2:223500:179500:W:PB:Pemb:Pembrokeshire:X:01-MAR-1993:I:157:0:0
258327:SN7608:Ystalyfera:SN60:51:45.7:3:47.4:208500:276500:W:NP:Nth Pt Talb:Neath Port Talbot:O:01-MAR-1993:I:160:0:0
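
For anyone wanting to experiment with these lines, pulling one field out of a record might look something like the following. This is only a minimal sketch in plain C: get_field is a made-up helper name, and it assumes no field ever contains a colon of its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the n-th (1-based) colon-separated field of
 * `line` into `out`. Returns the number of characters copied, or -1 if the
 * line has fewer than n fields. The copy is truncated to fit `out_size`. */
int get_field(const char *line, int n, char *out, size_t out_size)
{
    const char *start = line;
    size_t len;
    int i;

    if (out_size == 0)
        return -1;

    for (i = 1; i < n; i++) {
        start = strchr(start, ':');
        if (start == NULL)
            return -1;              /* ran out of fields */
        start++;                    /* step past the ':' */
    }

    len = strcspn(start, ":\r\n");  /* field ends at ':' or end of line */
    if (len >= out_size)
        len = out_size - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return (int)len;
}
```

Calling get_field(line, 3, name, sizeof name) on the last line above would give the place name, and field 14 the full county name.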


I need to search field 3 as quickly as possible, possibly narrowing the results using fields 14 and/or 13.

Clearly, beginning at the beginning and searching to the end would be extremely time consuming, especially if the search were to yield nothing.

To complicate matters further, the file contains characters which do not 'play well' with toupper() and tolower().
For example :-

Code:
30:NC3249:A' Chèir Ghorm:NC24:58:24.1:4:52:949500:232500:W:HL:Highld:Highland:X:23-JUN-2008:U:9:0:0
31:NG2605:A' Chill:NG20:57:3.5:6:30.7:805500:126500:W:HL:Highld:Highland:O:01-MAR-1993:I:39:0:0
32:NC2105:A' Chìoch:NC20:58:.2:5:1.2:905500:221500:W:HL:Highld:Highland:X:01-MAR-1993:I:15:0:0
33:NC5729:A' Chioch:NC42:58:13.9:4:25.6:929500:257500:W:HL:Highld:Highland:X:01-FEB-1998:I:16:0:0
34:NG8144:A' Chioch:NG84:57:26.3:5:38.5:844500:181500:W:HL:Highld:Highland:H:01-FEB-1998:I:24:0:0
35:NH0509:A' Chioch:NH00:57:8.1:5:12.9:809500:205500:W:HL:Highld:Highland:X:01-AUG-1994:I:33:0:0
36:NH1115:A' Chìoch:NH00:57:11.5:5:7.2:815500:211500:W:HL:Highld:Highland:H:01-MAR-1993:I:34:0:0
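
One way round the toupper()/tolower() problem might be to fold every byte to a plain lower-case letter before comparing. The sketch below ASSUMES the file is ISO-8859-1 (one byte per character), which has not been confirmed; if it turns out to be UTF-8, the table would have to work on two-byte sequences instead. fold_char and folded_compare are made-up names, and any character not in the table is passed through unchanged.

```c
#include <ctype.h>

/* Fold one byte to a lower-case, unaccented letter.
 * ASSUMPTION: the data is ISO-8859-1 encoded. */
static unsigned char fold_char(unsigned char c)
{
    if (c < 0x80)
        return (unsigned char)tolower(c);    /* plain ASCII */
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
        c += 0x20;                           /* accented upper -> lower */
    if (c >= 0xE0 && c <= 0xE5) return 'a';
    if (c == 0xE7)              return 'c';
    if (c >= 0xE8 && c <= 0xEB) return 'e';
    if (c >= 0xEC && c <= 0xEF) return 'i';
    if (c == 0xF1)              return 'n';
    if ((c >= 0xF2 && c <= 0xF6) || c == 0xF8) return 'o';
    if (c >= 0xF9 && c <= 0xFC) return 'u';
    if (c == 0xFD || c == 0xFF) return 'y';
    return c;                                /* anything else: unchanged */
}

/* Like strcmp(), but ignoring case and the accents handled above. */
int folded_compare(const char *a, const char *b)
{
    for (;;) {
        unsigned char ca = fold_char((unsigned char)*a++);
        unsigned char cb = fold_char((unsigned char)*b++);
        if (ca != cb)
            return (int)ca - (int)cb;
        if (ca == '\0')
            return 0;
    }
}
```

With this, the accented and unaccented spellings of "A' Chioch" compare as equal, and the seven lines above come out in the same order the file already has them, which suggests (but does not prove) that the file's sort order on field 3 ignores accents too.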


The file is sorted numerically on field 1 (i.e. 1 - 258422). Field 3 is also in alphabetical order, so field 2, which depends on field 3, is effectively in random order, as are all the other fields.

Some sort of caseless 'closest match' style search is what I need.

There is no possibility I can break down the file into 'A'  'B'  'C' on field 3, which was my first idea....

I have already spent a significant amount of time on this problem myself, and basically achieved sweet Fanny Adams! Any and all help would most graciously be received and appreciated!!

Any ideas please?

This is one 'small' problem in a MUCH larger overall project I have brewing, further details to be announced once more progress has been made!  ;) :D

Regards and thanks,

Graham

AWOL

Speed safety cameras?
Would it be simpler to reorganise the data and have separate index files, based on place-name/lat-long/ whatever?
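
As a rough illustration of the index-file idea, the sketch below makes one pass over the data and writes the starting byte offset of every line to a second file, so that line k can later be reached by reading entry k from the index and seeking straight to it. It is written against standard C stdio on the assumption that the index would be built once on a PC; build_offset_index and the 32-bit host-byte-order entry format are made up for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Write the starting byte offset of every line in `data` to `index`,
 * one 32-bit value per line in host byte order (an assumed format).
 * `data` should be opened in binary mode so offsets match fseek().
 * Returns the number of lines indexed, or -1 on a write error. */
long build_offset_index(FILE *data, FILE *index)
{
    long count = 0;
    uint32_t offset = 0;        /* offset of the byte just read */
    int at_line_start = 1;
    int c;

    while ((c = fgetc(data)) != EOF) {
        if (at_line_start) {
            if (fwrite(&offset, sizeof offset, 1, index) != 1)
                return -1;
            count++;
            at_line_start = 0;
        }
        if (c == '\n')
            at_line_start = 1;
        offset++;
    }
    return count;
}
```

For roughly 260,000 lines that comes to about 1 MB of index. A second index sorted on a different field could be built the same way by sorting the offsets before writing them out.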

tylernt

#2
Aug 23, 2013, 09:07 pm Last Edit: Aug 23, 2013, 09:42 pm by tylernt Reason: 1
EDIT: It's a "binary" search. This is regular C code for searching an array, but could easily be adapted to work with a file on an SD card on an Arduino:

http://www.c.happycodings.com/Sorting_Searching/code3.html
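
To give an idea of what the file version might look like (this is not the code at the link above): with variable-length lines you can't jump to "line N", but you can bisect on byte offsets and resynchronise to the next line start after each seek. A minimal sketch in standard C stdio, ASSUMING the file really is sorted on the chosen field in the order that `cmp` defines; find_line and its helpers are made-up names. On the Arduino, the SD library's File seek(), position() and read() could presumably play the role of fseek(), ftell() and fgetc().

```c
#include <stdio.h>
#include <string.h>

/* Read the first line that starts at or after byte `pos` into `buf`.
 * Returns that line's start offset, or -1 if there is no such line. */
static long line_at_or_after(FILE *f, long pos, char *buf, size_t size)
{
    long start;
    int c;

    if (fseek(f, pos > 0 ? pos - 1 : 0, SEEK_SET) != 0)
        return -1;
    if (pos > 0) {
        /* Skip through the next '\n'. If byte pos-1 is itself a
         * newline, this leaves us exactly at pos. */
        while ((c = fgetc(f)) != EOF && c != '\n')
            ;
    }
    start = ftell(f);
    if (fgets(buf, (int)size, f) == NULL)
        return -1;
    buf[strcspn(buf, "\r\n")] = '\0';
    return start;
}

/* Copy the n-th (1-based) colon-separated field of `line` into `key`.
 * Returns 1 on success, 0 if the line has too few fields. */
static int copy_field(const char *line, int n, char *key, size_t size)
{
    size_t len;

    while (--n > 0) {
        line = strchr(line, ':');
        if (line == NULL)
            return 0;
        line++;
    }
    len = strcspn(line, ":");
    if (len >= size)
        len = size - 1;
    memcpy(key, line, len);
    key[len] = '\0';
    return 1;
}

/* Binary-search `f` for the first line whose field `n` equals `target`.
 * On success the line is left in `out` and its offset is returned;
 * otherwise -1 is returned and `out` holds whatever was read last. */
long find_line(FILE *f, int n, const char *target,
               int (*cmp)(const char *, const char *),
               char *out, size_t out_size)
{
    char key[64];               /* field 3 is at most 60 characters */
    long lo = 0, hi, start;

    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    hi = ftell(f);

    while (lo < hi) {
        long mid = lo + (hi - lo) / 2;
        if (line_at_or_after(f, mid, out, out_size) >= 0 &&
            copy_field(out, n, key, sizeof key) &&
            cmp(key, target) < 0)
            lo = mid + 1;       /* line sorts before target: look later */
        else
            hi = mid;           /* line >= target, or none left: look earlier */
    }

    start = line_at_or_after(f, lo, out, out_size);
    if (start >= 0 && copy_field(out, n, key, sizeof key) &&
        cmp(key, target) == 0)
        return start;
    return -1;
}
```

A 27 MB file needs only around 25 seek-and-read probes this way. Because the first matching line is returned, duplicate names can be handled by reading on sequentially from the returned offset and checking fields 13 and 14. Passing a case-insensitive comparison instead of strcmp() would make the search caseless, provided the file's sort order agrees with it.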
