Has anyone previously had a requirement similar to what I am trying to do?
I have a file (roughly 27MB) which contains approximately 260,000 lines of variable-length, colon-separated fields.
It has a defined structure as follows :-
As previously mentioned these are maximums...
A snippet of the mentioned file here :-
258316:SO0003:Ysgubor-wen Ho:SO00:51:43.2:3:26.4:203500:300500:W:RH:Rho Cyn Taf:Rhondda,Cynon,Taff:X:01-MAR-1993:I:170:0:0
258317:SN6895:Ysgubor-y-coed:SN68:52:32.5:3:56.3:295500:268500:W:CE:Cered:Ceredigion:X:01-MAR-1993:I:135:0:0
258318:SO0873:Ysgwd-ffordd:SO06:52:21.1:3:20.6:273500:308500:W:PW:Powys:Powys:X:01-MAR-1998:U:136:147:0
258319:SJ1930:Ysgwennant:SJ02:52:51.9:3:11.8:330500:319500:W:PW:Powys:Powys:X:01-MAR-1993:I:125:0:0
258320:SO0537:Ysgwydd Hwch:SO02:52:1.6:3:22.6:237500:305500:W:PW:Powys:Powys:H:21-MAY-2007:U:160:0:0
258321:SO1200:Ysgwydd-gwyn-isaf Fm:SO00:51:41.8:3:16:200500:312500:W:CF:Caer:Caerphilly:FM:01-MAR-1993:I:171:0:0
258322:SO3113:Ysgyrd Fach:SO20:51:48.9:2:59.6:213500:331500:W:MM:Monm:Monmouthshire:H:01-MAR-1993:I:161:0:0
258323:SO3317:Ysgyryd Fawr:SO20:51:51.1:2:57.9:217500:333500:W:MM:Monm:Monmouthshire:H:01-MAR-1993:I:161:0:0
258324:SS5598:Yspitty:SS48:51:40:4:5.4:198500:255500:W:CT:Carm:Carmarthenshire:O:01-MAR-1993:I:159:0:0
258325:SN4826:Yspitty Ifan:SN42:51:55:4:12.2:226500:248500:W:CT:Carm:Carmarthenshire:X:01-MAR-1993:I:146:0:0
258326:SM7923:Ystafelloedd:SM62:51:52:5:12.2:223500:179500:W:PB:Pemb:Pembrokeshire:X:01-MAR-1993:I:157:0:0
258327:SN7608:Ystalyfera:SN60:51:45.7:3:47.4:208500:276500:W:NP:Nth Pt Talb:Neath Port Talbot:O:01-MAR-1993:I:160:0:0
I need to search field 3 as quickly as possible, possibly narrowing the results using fields 14 and/or 13.
Clearly, starting at the beginning and searching through to the end would be extremely time-consuming, especially if the search were to yield nothing.
To complicate matters further, the file contains characters which do not 'play well' with toupper() and tolower(). For example :-
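To make the requirement concrete, here is a minimal sketch of the search as a plain start-to-end scan, i.e. the slow baseline. It assumes Python is acceptable and that each record sits on its own line. The names `parse_record` and `search` are made up for illustration, and treating fields 13 and 14 as an abbreviated and a full county name is only inferred from the snippet.

```python
# Field numbers in the post are 1-based; Python list indexes are 0-based.
NAME = 2          # field 3
COUNTY_ABBR = 12  # field 13 (assumed: abbreviated county)
COUNTY_FULL = 13  # field 14 (assumed: full county name)


def parse_record(line):
    """Split one colon-separated line into a list of fields."""
    return line.rstrip("\n").split(":")


def search(lines, name, county=None):
    """Linear scan: exact match on field 3, optional match on field 13 or 14."""
    hits = []
    for line in lines:
        fields = parse_record(line)
        if fields[NAME] != name:
            continue
        if county is not None and county not in (fields[COUNTY_ABBR], fields[COUNTY_FULL]):
            continue
        hits.append(fields)
    return hits


# Records copied from the file snippets in this post.
SAMPLE = [
    "33:NC5729:A' Chioch:NC42:58:13.9:4:25.6:929500:257500:W:HL:Highld:Highland:X:01-FEB-1998:I:16:0:0",
    "34:NG8144:A' Chioch:NG84:57:26.3:5:38.5:844500:181500:W:HL:Highld:Highland:H:01-FEB-1998:I:24:0:0",
    "258327:SN7608:Ystalyfera:SN60:51:45.7:3:47.4:208500:276500:W:NP:Nth Pt Talb:Neath Port Talbot:O:01-MAR-1993:I:160:0:0",
]

for record in search(SAMPLE, "A' Chioch", county="Highland"):
    print(record[0], record[NAME], record[COUNTY_FULL])
```

This version is case-sensitive and touches every line, so it only pins down what "search field 3, then narrow by fields 13/14" means; it does nothing for speed.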
30:NC3249:A' Chèir Ghorm:NC24:58:24.1:4:52:949500:232500:W:HL:Highld:Highland:X:23-JUN-2008:U:9:0:0
31:NG2605:A' Chill:NG20:57:3.5:6:30.7:805500:126500:W:HL:Highld:Highland:O:01-MAR-1993:I:39:0:0
32:NC2105:A' Chìoch:NC20:58:.2:5:1.2:905500:221500:W:HL:Highld:Highland:X:01-MAR-1993:I:15:0:0
33:NC5729:A' Chioch:NC42:58:13.9:4:25.6:929500:257500:W:HL:Highld:Highland:X:01-FEB-1998:I:16:0:0
34:NG8144:A' Chioch:NG84:57:26.3:5:38.5:844500:181500:W:HL:Highld:Highland:H:01-FEB-1998:I:24:0:0
35:NH0509:A' Chioch:NH00:57:8.1:5:12.9:809500:205500:W:HL:Highld:Highland:X:01-AUG-1994:I:33:0:0
36:NH1115:A' Chìoch:NH00:57:11.5:5:7.2:815500:211500:W:HL:Highld:Highland:H:01-MAR-1993:I:34:0:0
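One possible way round the accented characters is to compare a "folded" copy of each name instead of upper- or lower-casing it byte by byte. The sketch below assumes Python and assumes the file has already been decoded to text correctly (its encoding is not stated here). The helper name `fold` is made up for illustration.

```python
import unicodedata


def fold(text):
    """Return a caseless, accent-free form of text for comparison only."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more thorough lower() intended for caseless matching.
    return stripped.casefold()


print(fold("A' Chìoch") == fold("a' chioch"))
```

The original spelling is kept in the record itself; the folded form is used only as a search key, so "A' Chìoch" and "A' Chioch" compare equal.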
The file is sorted numerically on field 1 (i.e. 1 to 258422). Field 3 is in alphabetical order, so field 2 appears in effectively random order, as do all the other fields.
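Because field 3 is already in alphabetical order, a binary search looks like a natural fit. The sketch below assumes Python and enough memory to hold the index. It re-sorts on a folded key rather than trusting the file's own order, since it is not known how the file orders accents and capitals. The names `fold`, `build_index` and `prefix_search` are made up for illustration.

```python
import bisect
import unicodedata


def fold(text):
    """Caseless, accent-free comparison key (assumes correctly decoded text)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def build_index(lines):
    """Build once: a list of (folded field 3, original line), sorted by key."""
    index = [(fold(line.split(":")[2]), line.rstrip("\n")) for line in lines]
    index.sort()
    return index


def prefix_search(index, prefix):
    """Return every line whose folded field 3 starts with the folded prefix."""
    key = fold(prefix)
    # (key,) sorts before any (key, line), so this finds the first candidate.
    start = bisect.bisect_left(index, (key,))
    hits = []
    for folded, line in index[start:]:
        if not folded.startswith(key):
            break  # matches are contiguous, so the first miss ends the run
        hits.append(line)
    return hits


# Records copied from the file snippet in this post.
SAMPLE = [
    "30:NC3249:A' Chèir Ghorm:NC24:58:24.1:4:52:949500:232500:W:HL:Highld:Highland:X:23-JUN-2008:U:9:0:0",
    "31:NG2605:A' Chill:NG20:57:3.5:6:30.7:805500:126500:W:HL:Highld:Highland:O:01-MAR-1993:I:39:0:0",
    "32:NC2105:A' Chìoch:NC20:58:.2:5:1.2:905500:221500:W:HL:Highld:Highland:X:01-MAR-1993:I:15:0:0",
    "33:NC5729:A' Chioch:NC42:58:13.9:4:25.6:929500:257500:W:HL:Highld:Highland:X:01-FEB-1998:I:16:0:0",
]

INDEX = build_index(SAMPLE)
for line in prefix_search(INDEX, "a' chi"):
    print(line.split(":")[2])
```

Building the index reads the whole file once; after that each lookup is a binary search plus a short walk over the matches, and a search that finds nothing returns almost immediately.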
Some sort of caseless 'closest match' style search is what I need.
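As a rough illustration of what "closest match" could mean, the sketch below uses `difflib.get_close_matches` from the Python standard library on folded names. It assumes Python, and the names `fold` and `closest_names` are made up. Run against all 260,000 names this would probably be too slow, so it is more realistic on a candidate list that has already been narrowed down.

```python
import difflib
import unicodedata


def fold(text):
    """Caseless, accent-free comparison key (assumes correctly decoded text)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def closest_names(query, names, n=3, cutoff=0.6):
    """Return the original spellings of the names most similar to query."""
    by_folded = {}
    for name in names:
        by_folded.setdefault(fold(name), []).append(name)
    matches = difflib.get_close_matches(fold(query), list(by_folded), n=n, cutoff=cutoff)
    return [original for m in matches for original in by_folded[m]]


# Names taken from field 3 of the file snippet in this post.
NAMES = ["Ysgwennant", "Ysgwydd Hwch", "Yspitty", "Yspitty Ifan", "Ystalyfera"]
print(closest_names("ystalafera", NAMES, n=1))
```

The `cutoff` value (0.6 is the library default) decides how loose a match may be; raising it returns fewer, closer candidates.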
My first idea was to break the file down into 'A', 'B', 'C' and so on by field 3, but there is no possibility I can do that.... :fearful: :astonished: :roll_eyes: :cold_sweat:
I have already spent a significant amount of time on this problem myself, and basically achieved sweet Fanny Adams! Any and all help would be most gratefully received and appreciated!!
Any ideas please?
This is one 'small' problem in a MUCH larger overall project I have brewing, further details to be announced once more progress has been made! ;) :D
Regards and thanks,