Question Parsing Logic

Bandolier

I am working on a design for a bit of code to parse a single, very large, CSV file into two different files.

The data has a reference code and a phone number, like so:

101,5551112220
101,5551112225
101,5551112226
101,5551112227
102,5551112232
102,5551112233
103,5551112234
103,5551112235
103,5551112236
104,5551112240
101,5551112251
102,5551112252

What I need to do is break this data down by Reference Code and the Left(9) of the Phone number.

I suppose I could feed it into Access and build tables, but I think the requirements are more than a query will handle, since each record has to be compared to other records' values.

The requirements are:

- Split by the left(9) of the phone number, take the lead number in that range and put that record into File #1.
- Put the next X records in that range in File #2 IF they have the same Reference Code.
- If the next X records in that range do not have the same Reference code as the first one, place them ALL into File #1. Meaning once a single record doesn't match in that range, everything that follows in the same Phone number range is placed into File #1.
- If the record is the ONLY one in that range, place it into File #1 and move on to the next range. This should be accounted for in the above loops but wanted to call it out specifically.

The output would be two files, one containing all the leading records and what I call the "problem" records; and the second file containing all the other records that are in the same range and have the same Ref Code as the Leading record.

I have some ideas but I wanted to bounce this off the community at large to see if anyone has done something similar before I get to coding, as I am just working on paper currently.

Paper is easier to throw than my monitor. ;)

Thanks in advance.
 
Mind expanding your example to show what you'd want in File 1 and what you'd want in File 2 from the set of numbers you've provided?
 
Insert the Snippet 'Read a Delimited Text File' (fundamentals, file system) and go from there. Use one StreamWriter for each output file.
 
re: Question Parsing Logic

Mind expanding your example to show what you'd want in File 1 and what you'd want in File 2 from the set of numbers you've provided?

Sure. With some expanded input, here are examples of the input and the output I'm hoping to achieve.

Input File:
101,5551112220
101,5551112225
101,5551112226
101,5551112227
102,5551112232
102,5551112233
103,5551112234
103,5551112235
103,5551112236
104,5551112240
101,5551112251
102,5551112252
105,5551112260
105,5551112261
105,5551112262
105,5551112263
105,5551112264
105,5551112265
105,5551112266
105,5551112267
105,5551112268
105,5551112269
106,5551112270

Output File 1:
101,5551112220
102,5551112232
103,5551112234
103,5551112235
103,5551112236
104,5551112240
101,5551112251
102,5551112252
105,5551112260
106,5551112270

Output File 2:
101,5551112225
101,5551112226
101,5551112227
102,5551112233
105,5551112261
105,5551112262
105,5551112263
105,5551112264
105,5551112265
105,5551112266
105,5551112267
105,5551112268
105,5551112269

My thought was to set the left(9) of the Phone Number and the Reference as variables and then loop through each record to compare. Subsequent records which matched both the left(9) and Reference would be written to File #2.

A record with a different Reference but the same left(9) would be written to File #1.

When a new left(9) is encountered, rinse and repeat.

My concern is that there are about 200,000 records over 100 files to go through, so optimizing the routine will be key.
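
On the volume concern, one possible shape for the outer loop is sketched below: every input file is streamed one line at a time into the same two output files, so memory use stays flat no matter how many files there are. This is only a sketch under assumptions: `SplitFiles` and the `goesToFile2` callback are hypothetical names (the callback stands in for whatever comparison logic decides where a line goes), and `File.ReadLines` needs .NET 4.0 or later.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module StreamingSketch
    ' Streams every input file, line by line, into two shared output files.
    ' goesToFile2 is a hypothetical callback that picks the destination of each line.
    Sub SplitFiles(inputPaths As IEnumerable(Of String), _
                   file1Path As String, _
                   file2Path As String, _
                   goesToFile2 As Func(Of String, Boolean))
        Using writer1 As New StreamWriter(file1Path), _
              writer2 As New StreamWriter(file2Path)
            For Each inputPath As String In inputPaths
                ' File.ReadLines is lazy: only the current line is held in memory
                For Each line As String In File.ReadLines(inputPath)
                    If line.Length = 0 Then Continue For ' skip blank lines
                    If goesToFile2(line) Then
                        writer2.WriteLine(line)
                    Else
                        writer1.WriteLine(line)
                    End If
                Next
            Next
        End Using
    End Sub
End Module
```

A call could look like `SplitFiles(Directory.GetFiles("C:\Temp\Input", "*.csv"), "C:\Temp\File1.txt", "C:\Temp\File2.txt", AddressOf MyRule)`, where the folder and `MyRule` are placeholders. Any state the callback keeps carries over from one file to the next, so reset it between files if a phone number range must not span two files.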

Thanks for the replies!
 
Using the snippet JohnH referenced in his post.

VB.NET:
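Imports Microsoft.VisualBasic.FileIO	' needed for TextFieldParser; goes at the top of the code file

		' Keys (Reference Code + left(9) of the phone number) that have already been written to File 1
		Dim usedValues As New List(Of String)
		Dim filename As String = "C:\Temp\TheDocument.txt"

		Using swFile1 As New IO.StreamWriter("C:\Temp\File1.txt"), _
		  swFile2 As New IO.StreamWriter("C:\Temp\File2.txt"), _
		  parser As New TextFieldParser(filename)
			Dim fields As String()
			Dim delimiter As String = ","
			parser.SetDelimiters(delimiter)
			While Not parser.EndOfData
				' Read in the fields for the current line
				fields = parser.ReadFields()
				' Build the key; Substring assumes the phone number has at least 9 characters
				Dim match As String = fields(0) & fields(1).Substring(0, 9)

				If usedValues.Contains(match) Then
					' Key seen before: this is a follow-on record, so it goes to File 2
					swFile2.WriteLine(String.Join(",", fields))
				Else
					' First record with this key: write it to File 1 and remember the key
					swFile1.WriteLine(String.Join(",", fields))
					usedValues.Add(match)
				End If
			End While
		End Using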
 
OK, one more thing. After running this code, I noticed that my sample data has an error in the way I split it out, and eliminating the human error is EXACTLY why I wanted to script this out.

Looking at this subset:

102,5551112232
102,5551112233
103,5551112234
103,5551112235
103,5551112236

File 1 should actually end up with:

102,5551112232
103,5551112234
103,5551112235
103,5551112236

File 2:
102,5551112233

The (Left,9) are all 555111223 but because the Reference Number changes on the third row in the series, that record and all the subsequent records should end up in file 1 until the next (Left,9) is encountered.
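
For what it's worth, the rule as stated above could be sketched as a single pass that keeps three pieces of state: the current left(9), the lead record's Reference, and a flag that stays set once a mismatch has been seen. This is a rough sketch rather than tested production code. It assumes records in the same phone number range sit next to each other in the input (as they do in the sample), and `SplitRecords` and the variable names are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic

Module RangeSplitSketch
    ' Splits "Reference,Phone" lines into a File 1 list (Item1) and a File 2 list (Item2).
    ' Assumption: records sharing a left(9) range are adjacent in the input.
    Function SplitRecords(lines As IEnumerable(Of String)) As Tuple(Of List(Of String), List(Of String))
        Dim file1 As New List(Of String)
        Dim file2 As New List(Of String)
        Dim currentRange As String = Nothing   ' left(9) of the range being processed
        Dim leadReference As String = Nothing  ' Reference Code of that range's lead record
        Dim rangeBroken As Boolean = False     ' True once a Reference mismatch has been seen

        For Each line As String In lines
            If String.IsNullOrWhiteSpace(line) Then Continue For
            Dim fields As String() = line.Split(","c)
            Dim reference As String = fields(0).Trim()
            Dim phone As String = fields(1).Trim()
            Dim range As String = If(phone.Length > 9, phone.Substring(0, 9), phone)

            If Not String.Equals(range, currentRange) Then
                ' New range: the lead record always goes to File 1
                currentRange = range
                leadReference = reference
                rangeBroken = False
                file1.Add(line)
            ElseIf rangeBroken OrElse reference <> leadReference Then
                ' Mismatch: this record and the rest of the range go to File 1
                rangeBroken = True
                file1.Add(line)
            Else
                ' Same range and same Reference as the lead record
                file2.Add(line)
            End If
        Next
        Return Tuple.Create(file1, file2)
    End Function

    Sub Main()
        Dim subset As String() = {"102,5551112232", "102,5551112233", _
                                  "103,5551112234", "103,5551112235", "103,5551112236"}
        Dim result = SplitRecords(subset)
        ' File 1: 102,5551112232 | 103,5551112234 | 103,5551112235 | 103,5551112236
        Console.WriteLine("File 1: " & String.Join(" | ", result.Item1))
        ' File 2: 102,5551112233
        Console.WriteLine("File 2: " & String.Join(" | ", result.Item2))
    End Sub
End Module
```

Because nothing is looked up in a list, the work per record is constant. If the real files are not ordered by phone number, they would need sorting first or a different approach.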

Thanks again.
 
Think I've got it now. I added a second list to track just the phone number ranges that have already been seen, so if the record is an exact match for the Ref,Phone Number -or- we've seen its phone number range come up before, the record is written to File1. Everything else goes to File2, and then both lists are updated.
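
A side note on the lists: since they are only used for "have I seen this before?" checks, a `HashSet(Of String)` may scale better than a `List(Of String)`, because `List.Contains` scans the whole list on every call. The snippet below is just an illustration of that lookup pattern, not the code described above, and `MarkRangeLeads` is a made-up name.

```vbnet
Imports System
Imports System.Collections.Generic

Module SeenKeysSketch
    ' Returns True for each phone number that is the first one seen in its left(9) range.
    ' Assumes every phone number has at least 9 characters.
    Function MarkRangeLeads(phones As IEnumerable(Of String)) As List(Of Boolean)
        Dim seenRanges As New HashSet(Of String)
        Dim leads As New List(Of Boolean)
        For Each phone As String In phones
            ' HashSet.Add returns True only when the value was not already in the set,
            ' so it replaces the separate Contains check and Add call.
            leads.Add(seenRanges.Add(phone.Substring(0, 9)))
        Next
        Return leads
    End Function

    Sub Main()
        Dim phones As String() = {"5551112220", "5551112225", "5551112232"}
        ' Prints: True, False, True
        Console.WriteLine(String.Join(", ", MarkRangeLeads(phones)))
    End Sub
End Module
```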

Again, the assist is much appreciated.
 