ckelsoe
Member
I need some help figuring out the best way to parse a file. An example of the data is as follows:
VB.NET:
<iteration DateAdded="12/29/2005">
<field name="Field1"/> 'No data in this field
<field name="Field2">Some Data here</field>
...
</iteration>
<iteration DateAdded="12/29/2005">
<field name="Field1"/> 'No data in this field
<field name="Field2">Some Data here</field>
...
</iteration>
<iteration DateAdded="12/29/2005">
<field name="Field1"/> 'No data in this field
<field name="Field2">Some Data here</field>
...
</iteration>
...
Notes: 1. Some of the fields may have no data.
2. There is an undetermined number of iterations per file.
This is from a .fml file, which is some form of XML data. Other than the example above, it consists of <field name="SomeField">Some Data here</field> tags.
I could not figure out how to do this via XmlReader, etc., so I just started off parsing the file as if it were a text file. My mess of code is as follows:
VB.NET:
Sub ParseFile()
    Try
        Dim file_name As String = "d:\14839980.fml"
        Dim stream_reader As New IO.StreamReader(file_name)
        Dim sFileContent As String = stream_reader.ReadToEnd
        Dim sResults As String = "Results of Parse" & vbCrLf & "----------------------------" & vbCrLf
        ' Get the company name
        Dim i5BIS0101 As Integer = sFileContent.IndexOf("5BIS0101")
        Dim iEndOf5BIS0101 As Integer = sFileContent.IndexOf("</field>", i5BIS0101)
        Dim sCompanyName As String = sFileContent.Substring(i5BIS0101 + 10, iEndOf5BIS0101 - i5BIS0101 - 10)
        sResults = sResults & sCompanyName & vbCrLf
        ' Dig for Cert Holders
        Dim sIterationSearchString As String = "<iteration DateAdded"
        Dim iIteration As Integer = 0 'Which char pos is Iteration starting at?
        Dim sFieldNameSearchString As String = "<field name="
        Dim iStartValuePadding As Integer = 21
        Dim iStartFieldPadding As Integer = 12
        Dim iNextFieldLocation As Integer = 0
        Dim sEndofFieldString As String = "/"
        Dim iEndOfFieldLocation As Integer = 0
        Dim sFieldName As String = ""
        Dim sFieldValue As String = ""
        Dim iEndOfFile As Integer = sFileContent.IndexOf("</fml>")
        Dim iLength As Integer = sFileContent.Length - 8

        Do Until iIteration = -1
            iIteration = sFileContent.IndexOf(sIterationSearchString, iIteration)
            iNextFieldLocation = sFileContent.IndexOf(sFieldNameSearchString, iIteration + 1)
            iEndOfFieldLocation = sFileContent.IndexOf(sEndofFieldString, iNextFieldLocation)
            sFieldName = sFileContent.Substring(iNextFieldLocation + iStartFieldPadding, 10)
            sFieldValue = sFileContent.Substring(iNextFieldLocation + iStartValuePadding, 2)
            sResults = sResults & sFieldName & " - " & sFieldValue & vbCrLf
        Loop
        txtFileContents.Text = sResults

        txtFileContents.Select(0, 0)
        stream_reader.Close()
    Catch exc As System.IO.FileNotFoundException
        ' Ignore this error.
    Catch exc As Exception
        ' Report other errors.
        MsgBox(exc.Message, MsgBoxStyle.Exclamation, "Read " & _
            "Error")
    End Try
End Sub
Any pointers would be appreciated. I have to parse through about 20000 files like this.
Regards, Charles
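
For comparison, here is a rough sketch of the same extraction using an XML parser rather than string searching. It rests on assumptions that are not confirmed above: that each .fml file is well-formed XML with an <fml> root (suggested only by the "</fml>" search in the code), that it uses no XML namespace, and that the "'No data in this field" remarks are annotations rather than file content. The module name FmlSketch and the helper ReadFields are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module FmlSketch
    ' Hypothetical helper (not from the original post): returns one
    ' "date | name - value" line for every <field> inside each <iteration>.
    Function ReadFields(ByVal xmlText As String) As List(Of String)
        Dim results As New List(Of String)
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText) ' doc.Load(path) would read from a file instead

        ' "//iteration" matches every <iteration> element, however many there are.
        For Each iteration As XmlElement In doc.SelectNodes("//iteration")
            Dim dateAdded As String = iteration.GetAttribute("DateAdded")
            For Each field As XmlElement In iteration.SelectNodes("field")
                ' An empty <field name="X"/> simply has an empty InnerText.
                results.Add(dateAdded & " | " & field.GetAttribute("name") & _
                            " - " & field.InnerText)
            Next
        Next
        Return results
    End Function

    Sub Main()
        ' Assumed sample data, modelled on the example in the post.
        Dim sample As String = _
            "<fml>" & _
            "<iteration DateAdded=""12/29/2005"">" & _
            "<field name=""Field1""/>" & _
            "<field name=""Field2"">Some Data here</field>" & _
            "</iteration>" & _
            "</fml>"
        For Each line As String In ReadFields(sample)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

If the real files are not well-formed XML, LoadXml will throw an XmlException, so across 20000 files it would probably be worth catching that per file and logging the file name.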