Read XML file with child nodes

bkleen

I have an xml file that has multiple levels of child nodes. I've successfully read the data to multiple datatables in a dataset but each of the datatables seems to be independent. I want to parse out certain bits of information and consolidate it into one datatable that looks like the following:

XML CODE

XML:
<Batch>
<BatchInstanceIdentifier>BI53</BatchInstanceIdentifier>
<BatchClassIdentifier>BC3</BatchClassIdentifier>
<BatchClassName>TesseractMailRoom</BatchClassName>
<BatchClassDescription>Training</BatchClassDescription>
<BatchClassVersion>1.0.0.79</BatchClassVersion>
<BatchName>ephesoft1362277230826</BatchName>
<BatchPriority>1</BatchPriority>
<BatchStatus>ReadyForValidation</BatchStatus>
<BatchLocalPath>C:\Ephesoft\SharedFolders\ephesoft-system-folder</BatchLocalPath>
<DocumentClassificationTypes>
<DocumentClassificationType>SearchClassification</DocumentClassificationType>

</DocumentClassificationTypes>


<Documents>
<Document>
<Identifier>DOC1</Identifier>
<Type>Mail</Type>
<Confidence>10.31</Confidence>
<ConfidenceThreshold>20.0</ConfidenceThreshold>
<Valid>true</Valid>
<Reviewed>true</Reviewed>
<ErrorMessage/>
<DocumentLevelFields>
<DocumentLevelField>
<Name>Company</Name>
<Value>Von Maur</Value>
<Type>STRING</Type>
<Confidence>0.0</Confidence>
<FieldOrderNumber>1</FieldOrderNumber>
<FieldValueOptionList/>

</DocumentLevelField>


<DocumentLevelField>
<Name>Date</Name>
<Value>8/13/11</Value>
<Type>DATE</Type>
<Confidence>0.0</Confidence>
<FieldOrderNumber>2</FieldOrderNumber>
<FieldValueOptionList/>

</DocumentLevelField>


<DocumentLevelField>
<Name>Type</Name>
<Value>Invoice</Value>
<Type>STRING</Type>
<Confidence>0.0</Confidence>
<FieldOrderNumber>3</FieldOrderNumber>
<FieldValueOptionList>Invoice;Statement;Other</FieldValueOptionList>

</DocumentLevelField>


<DocumentLevelField>
<Name>Note</Name>
<Value/>
<Type>STRING</Type>
<Confidence>0.0</Confidence>
<FieldOrderNumber>4</FieldOrderNumber>
<FieldValueOptionList/>

</DocumentLevelField>



</DocumentLevelFields>


<Pages>
<Page>
<Identifier>PG0</Identifier>
<OldFileName>HA08_013-0000.tif</OldFileName>
<NewFileName>BI53_0.tif</NewFileName>
<PageLevelFields>
<PageLevelField>
<Name>Image_Compare_Classification</Name>
<Value>Mail_First_Page</Value>
<Confidence>0.0</Confidence>
<FieldOrderNumber>0</FieldOrderNumber>
<AlternateValues>
<AlternateValue>
<Name>Image_Compare_Classification</Name>
<Value>Mail_First_Page</Value>
<Confidence>0.0</Confidence>
<FieldOrderNumber>0</FieldOrderNumber>

</AlternateValue>


<AlternateValue>
<Name>Image_Compare_Classification</Name>
<Value>Mail_Last_Page</Value>
<Confidence>0.0</Confidence>
<FieldOrderNumber>0</FieldOrderNumber>

</AlternateValue>


<AlternateValue>
<Name>Image_Compare_Classification</Name>
<Value>Mail_Middle_Page</Value>
<Confidence>0.0</Confidence>
<FieldOrderNumber>0</FieldOrderNumber>

</AlternateValue>



</AlternateValues>



</PageLevelField>


<PageLevelField>
<Name>Search_Engine_Classification</Name>
<Value>Mail_First_Page</Value>
<Type/>
<Confidence>19.82</Confidence>
<FieldOrderNumber>0</FieldOrderNumber>
<AlternateValues>
<AlternateValue>
<Name>Search_Engine_Classification</Name>
<Value>Mail_First_Page</Value>
<Type/>
<Confidence>19.82</Confidence>
<FieldOrderNumber>0</FieldOrderNumber>

</AlternateValue>


<AlternateValue>
<Name>Search_Engine_Classification</Name>
<Value>Mail_Last_Page</Value>
<Type/>
<Confidence>5.184</Confidence>
<FieldOrderNumber>0</FieldOrderNumber>

</AlternateValue>


<AlternateValue>
<Name>Search_Engine_Classification</Name>
<Value>Mail_Middle_Page</Value>
<Type/>
<Confidence>4.959</Confidence>
<FieldOrderNumber>0</FieldOrderNumber>

</AlternateValue>



</AlternateValues>



</PageLevelField>



</PageLevelFields>


<HocrFileName>BI53_PG0.html</HocrFileName>
<ThumbnailFileName>BI53_PG0_displayThumb.png</ThumbnailFileName>
<ComparisonThumbnailFileName>BI53_PG0_compareThumb.tif</ComparisonThumbnailFileName>
<DisplayFileName>BI53_PG0.png</DisplayFileName>
<OCRInputFileName>BI53_PG0.png</OCRInputFileName>
<Direction>NORTH</Direction>
<IsRotated>false</IsRotated>

</Page>


<Page>...</Page>



</Pages>


<MultiPagePdfFile>BI53_documentDOC1.pdf</MultiPagePdfFile>

</Document>




DESIRED DATATABLE
BatchInstanceIdentifier | Identifier | DocumentLevelField | DocumentLevelField | DocumentLevelField | MultiPagePdfFile
BI53                    | DOC1       | Von Maur           | 8/13/11            | Invoice            | BI53_documentDOC1.pdf
                        | DOC2       | Blue Cross         | 7/15/11            | Statement          | BI53_documentDOC2.pdf
 

Attachment: BI53_batch.xml.txt (64.8 KB)

Hi,

Since you started by reading your information into a DataSet, I have followed the same approach in this example. Unfortunately, your XML file produces a DataSet with 14 separate tables, of which you only need 3 to get the consolidation you are after. By DataSet index, the tables you need are 0, 3 and 5. Have a look at this example:

VB.NET:
Public Class Form1
  Dim DS As New DataSet
 
  Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    'read the XML string into a Dataset
    DS.ReadXml(Application.StartupPath & "\XMLFile1.xml")
    'consolidate the necessary tables in the DataSet
    ConsolidateDataSet()
    'add the consolidated DataTable to a DataGridView
    dgvConsolidated.DataSource = DS.Tables("ConsolidatedDataTable")
  End Sub
 
  Private Sub ConsolidateDataSet()
    'Create a data table with the required columns
    CreateConsolidatedDataTable()
 
    'Use table no. 0 to get the batch identifier
    Dim BatchInstance As String = DS.Tables(0).Rows(0)(0).ToString
 
    'Loop through table no. 3 to get the document details and then
    'access the matching row index in table no. 5 to get the company
    'information. This assumes every document has exactly 4
    'DocumentLevelField rows in table no. 5, hence Counter * 4.
    'Then bind them all together and add the consolidated data row
    'to the data table
    For Counter As Integer = 0 To DS.Tables(3).Rows.Count - 1
      Dim startCompanyIndex As Integer = Counter * 4
      Dim myConsolidatedRow As DataRow = DS.Tables("ConsolidatedDataTable").NewRow
      With myConsolidatedRow
        .Item(0) = BatchInstance
        .Item(1) = DS.Tables(3).Rows(Counter)(0).ToString
        .Item(2) = DS.Tables(5).Rows(startCompanyIndex)(1).ToString
        .Item(3) = DateTime.Parse(DS.Tables(5).Rows(startCompanyIndex + 1)(1).ToString, Globalization.CultureInfo.InvariantCulture)
        .Item(4) = DS.Tables(5).Rows(startCompanyIndex + 2)(1).ToString
        .Item(5) = DS.Tables(3).Rows(Counter)(8).ToString
      End With
 
      'add the consolidated row to the data table
      DS.Tables("ConsolidatedDataTable").Rows.Add(myConsolidatedRow)
    Next
  End Sub
 
  Private Sub CreateConsolidatedDataTable()
    'One column per value in the desired output; Date is typed so it sorts correctly
    Dim TableColumns() As DataColumn = {New DataColumn("BatchInstance", GetType(String)),
                                        New DataColumn("Identifier", GetType(String)),
                                        New DataColumn("CompanyName", GetType(String)),
                                        New DataColumn("Date", GetType(Date)),
                                        New DataColumn("Type", GetType(String)),
                                        New DataColumn("DocName", GetType(String))}
    DS.Tables.Add("ConsolidatedDataTable")
    DS.Tables("ConsolidatedDataTable").Columns.AddRange(TableColumns)
  End Sub
End Class
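
One caveat: the indexes 0, 3 and 5 come from the order in which ReadXml inferred the tables for this particular file, so they are worth checking against your own data. As a rough sketch (the module and function names below are my own and purely illustrative), you could list each table with its index and columns, plus the parent/child relations that ReadXml normally infers for nested elements:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module DataSetInspector
  ' Hypothetical helper: one line per table (index, name, row count, column names).
  Public Function DescribeTables(ds As DataSet) As List(Of String)
    Dim lines As New List(Of String)
    For i As Integer = 0 To ds.Tables.Count - 1
      Dim t As DataTable = ds.Tables(i)
      Dim cols As String = String.Join(", ",
        t.Columns.Cast(Of DataColumn)().Select(Function(c) c.ColumnName))
      lines.Add(String.Format("{0}: {1} ({2} rows) [{3}]", i, t.TableName, t.Rows.Count, cols))
    Next
    Return lines
  End Function

  ' Hypothetical helper: one line per parent/child relation in the DataSet.
  Public Function DescribeRelations(ds As DataSet) As List(Of String)
    Dim lines As New List(Of String)
    For Each rel As DataRelation In ds.Relations
      lines.Add(String.Format("{0} -> {1}", rel.ParentTable.TableName, rel.ChildTable.TableName))
    Next
    Return lines
  End Function
End Module
```

Calling DescribeTables(DS) right after DS.ReadXml(...) and writing the lines to the Output window should show which index holds the Document rows and which holds the DocumentLevelField rows.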

Hope that helps.
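
If the hard-coded table indexes turn out to be fragile, an alternative is to skip the DataSet and read the elements by name with LINQ to XML. This is a different technique from the example above, and only a minimal sketch: it assumes the complete, well-formed file with the element names shown in your post, the module and function names are hypothetical, and the Date column is kept as a String so that an empty value does not throw.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Xml.Linq

Module BatchXmlReader
  ' Returns the text of a direct child element, or "" when it is missing or empty.
  Private Function TextOf(parent As XElement, childName As String) As String
    Dim child As XElement = parent.Element(childName)
    Return If(child Is Nothing, String.Empty, child.Value)
  End Function

  ' Returns the stored field value, or "" when the field name is not present.
  Private Function Lookup(fields As Dictionary(Of String, String), key As String) As String
    Dim value As String = Nothing
    Return If(fields.TryGetValue(key, value), value, String.Empty)
  End Function

  ' Hypothetical helper: builds one row per <Document> from the batch XML text.
  Public Function BuildConsolidatedTable(xmlText As String) As DataTable
    Dim table As New DataTable("ConsolidatedDataTable")
    For Each columnName As String In {"BatchInstance", "Identifier", "CompanyName", "Date", "Type", "DocName"}
      table.Columns.Add(columnName, GetType(String))
    Next

    Dim batch As XElement = XDocument.Parse(xmlText).Root
    Dim batchId As String = TextOf(batch, "BatchInstanceIdentifier")

    For Each doc As XElement In batch.Elements("Documents").Elements("Document")
      ' Index this document's fields by <Name>, so their position and count do not matter
      Dim fields As New Dictionary(Of String, String)
      For Each field As XElement In doc.Elements("DocumentLevelFields").Elements("DocumentLevelField")
        fields(TextOf(field, "Name")) = TextOf(field, "Value")
      Next

      table.Rows.Add(batchId,
                     TextOf(doc, "Identifier"),
                     Lookup(fields, "Company"),
                     Lookup(fields, "Date"),
                     Lookup(fields, "Type"),
                     TextOf(doc, "MultiPagePdfFile"))
    Next
    Return table
  End Function
End Module
```

You could then bind the result with dgvConsolidated.DataSource = BuildConsolidatedTable(IO.File.ReadAllText(path)), where path points at your XML file. Unlike your desired table, this fills in the batch identifier on every row.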

Cheers,

Ian
 