Strip out text blocks from text file

rabbie

New member
Joined
Aug 10, 2008
Messages
2
Programming Experience
Beginner
Hello,

I have a large text file (about 2 GB) that contains many records. In each record, fields are separated by tabs. After the 45th tab there is a text block (call it Message) that I need to strip out and write to a separate text file; the 48th tab, followed by a CRLF, ends the record. For example, if I have five records in this text file (abc.txt), with record IDs 01, 02, 03, 04 and 05, the output should be:

abc_withoutTextBlock.txt -- only the text block (Message field) is removed from each record; all other fields are kept.
01.txt -- 01 record's Message field
02.txt -- 02 record's Message field
03.txt -- 03 record's Message field
04.txt -- 04 record's Message field
05.txt -- 05 record's Message field


I cannot use StreamReader.ReadLine to get each record because the Message field in each record contains tabs and CRLFs.

How should I do this? Any code sample or direction would be much appreciated.

Thanks in advance,
Rabbie
 
Pray tell, if the message field contains tabs and CRLF, and this is a text file where the fields are delimited by tabs and the records are delimited by CRLF, how on earth do you propose to tell where records and fields start and end?
 
Oh, each record starts with a standard ID:

First one is 12345 (tab) 01 (tab) ... (tab) ... (tab) text block (tab) ... (tab) CRLF
Second one is 12345 (tab) 02 (tab) ... (tab) ...(tab) text block (tab) ... (tab) CRLF
Third one is 12345 (tab) 03 (tab) ... (tab) ...(tab) text block (tab) ... (tab) CRLF
Fourth one is 12345 (tab) 04 (tab) ... (tab) ...(tab) text block (tab) ... (tab) CRLF
Fifth one is 12345 (tab) 05 (tab) ... (tab) ...(tab) text block (tab) ... (tab) CRLF

Thanks,
Rabbie
 
Maybe you misunderstand. You have also stated that your "text block" contains CRLF and TAB characters. So your file actually looks like:

First one is 12345 (tab) 01 (tab) ... (tab) ... (tab) text (tab) text (tab) blah CRLF
text (tab) more text (tab) CRLF
last bit (tab) of text block
(tab) ... (tab) CRLF
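
To make the problem concrete, here is a small Python sketch of what a line-by-line reader sees. The record is a made-up, shortened stand-in for the layout above (`fieldA` and `fieldB` are placeholders, not real field names), and `tab_counts` is a hypothetical helper.

```python
def tab_counts(text):
    """Split on CRLF the way a line reader would and count tabs per piece."""
    return [line.count("\t") for line in text.split("\r\n")]


# One logical record whose message field spans three physical lines.
record = (
    "12345\t01\tfieldA\t"        # leading fields
    "text\ttext\tblah\r\n"       # message, part 1
    "text\tmore text\t\r\n"      # message, part 2
    "last bit\tof text block"    # message, part 3
    "\tfieldB\t\r\n"             # trailing field and record terminator
)

for count, line in zip(tab_counts(record), record.split("\r\n")):
    print(count, repr(line))
```

The single record comes back as several fragments, none of which has the expected number of tabs, so neither the line count nor the tab count per line identifies a record.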


So answer my question: how do you propose to parse this as ONE record of 48 tabs and NOT THREE records of 47, 2 and 3 tabs respectively?
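
One possible way out, sketched in Python, relies entirely on assumptions taken from the posts above and not confirmed for the real data: every record starts with the standard ID 12345 followed by a tab, every record ends with a tab plus CRLF, only the Message field can contain tabs or CRLFs, and the Message never contains a tab and CRLF followed by "12345" and a tab. Under those assumptions, the first 45 tabs are counted from the left and the last 3 tabs from the right, and whatever lies between is the Message. The names `iter_records`, `split_record` and `strip_messages` are hypothetical, and the file encoding is a guess.

```python
from pathlib import Path


def iter_records(stream, record_id="12345", chunk_size=1 << 20):
    """Yield records (without the final CRLF) from a text stream read in chunks.

    Assumption: a record boundary is tab + CRLF followed by record_id + tab.
    """
    boundary = "\t\r\n" + record_id + "\t"
    buf = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        while True:
            pos = buf.find(boundary)
            if pos < 0:
                break
            yield buf[:pos + 1]      # keep the trailing tab, drop the CRLF
            buf = buf[pos + 3:]      # next record starts at record_id
    if buf.endswith("\r\n"):
        buf = buf[:-2]
    if buf:
        yield buf                    # last record in the file


def split_record(record, head_tabs=45, tail_tabs=3):
    """Return (record_without_message, message).

    The message is the text between the head_tabs-th tab from the left
    and the tail_tabs-th tab from the right.
    """
    start = -1
    for _ in range(head_tabs):
        start = record.index("\t", start + 1)
    end = len(record)
    for _ in range(tail_tabs):
        end = record.rindex("\t", 0, end)
    if end <= start:
        raise ValueError("record has fewer tabs than expected")
    return record[:start + 1] + record[end:], record[start + 1:end]


def strip_messages(in_path, out_dir, encoding="utf-8"):
    """Write <name>_withoutTextBlock.txt plus one <record id>.txt per record."""
    out_dir = Path(out_dir)
    stripped_path = out_dir / (Path(in_path).stem + "_withoutTextBlock.txt")
    # newline="" keeps CRLF untouched on both reading and writing.
    with open(in_path, newline="", encoding=encoding) as src, \
         open(stripped_path, "w", newline="", encoding=encoding) as dst:
        for record in iter_records(src):
            stripped, message = split_record(record)
            rec_id = record.split("\t", 2)[1]   # second field, e.g. "01"
            dst.write(stripped + "\r\n")
            with open(out_dir / (rec_id + ".txt"), "w", newline="",
                      encoding=encoding) as msg:
                msg.write(message)
```

With these assumptions, `strip_messages("abc.txt", ".")` would produce abc_withoutTextBlock.txt and 01.txt, 02.txt and so on. If any other field can contain a tab, or a Message can contain the boundary sequence, the format is genuinely ambiguous and this sketch will split records in the wrong place.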
 