Reading a text file and XML file...

judge

I have a text file called template.txt and an XML file called datafile.xml.

Example content of template.txt
"There was a man called $$$placeholder1$$$"

Example content of datafile.xml
<placeholder1>Joe Bloggs</placeholder1>

Basically, what I need is a console application which, when executed, replaces the placeholders in the template.txt file with the corresponding names from the XML file.

Any help would be greatly appreciated :)

(So far I can search the template.txt file for the text value "$$$placeholder1$$$" and get its character position within the file.)
 
Here is one way to do it: Regex.Replace with a MatchEvaluator function that does the XML lookup:
VB.NET:
'Imports System.Text.RegularExpressions

Dim doc As New Xml.XmlDocument
doc.Load("datafile.xml")
Dim evalFunc = Function(m As Match) (doc.SelectSingleNode("//" & m.Groups("name").Value).InnerText)
Dim newtext As String = Regex.Replace(IO.File.ReadAllText("template.txt"), "\${3}(?<name>.+?)\${3}", evalFunc)
Although Match.Index is available, it is not that useful in this case, because Regex.Replace substitutes each matched string in place. For this reason it is also a lot faster than looping over the matches and replacing each one in a StringBuilder.
 
Excellent help, thank you. The placeholders are now replaced with the relevant names from the XML file. I know this because I added [ MsgBox(newtext) ] after the code you gave me, which displays the contents assigned to the variable "newtext".

Now all I have to do is figure out how to copy this variable to the output file. Once again, thanks for your help :)
 
Thank you very much, it is working fully now. My entire code reads like this:


Imports System.IO
Imports System.Text.RegularExpressions

Module Module1

    Sub Main()

        Dim doc As New Xml.XmlDocument
        doc.Load("c:\data_file.xml")
        Dim evalFunc = Function(m As Match) (doc.SelectSingleNode("//" & m.Groups("name").Value).InnerText)
        Dim newtext As String = Regex.Replace(IO.File.ReadAllText("c:\template.txt"), "\${3}(?<name>.+?)\${3}", evalFunc)
        MsgBox(newtext) 'Just to let user know what is being written to the output file

        Dim strData As String
        strData = newtext
        Dim FullPath As String
        FullPath = ("c:\output.txt")
        Dim ErrInfo As String = ("errinfo")

        Dim bAns As Boolean = False
        Dim objReader As StreamWriter
        Try

            objReader = New StreamWriter(FullPath)
            objReader.Write(strData)
            objReader.Close()
            bAns = True
        Catch Ex As Exception
            ErrInfo = Ex.Message

        End Try
    End Sub

End Module
 
judge said:
MsgBox(newtext) 'Just to let user know what is being written to the output file
In a Console application you use one of the Write methods to write output to the user through the console:
VB.NET:
Console.WriteLine(newtext)
judge said:
Dim strData As String
strData = newtext
Dim FullPath As String
FullPath = ("c:\output.txt")
Dim ErrInfo As String = ("errinfo")

Dim bAns As Boolean = False
Dim objReader As StreamWriter
Try

    objReader = New StreamWriter(FullPath)
    objReader.Write(strData)
    objReader.Close()
    bAns = True
Catch Ex As Exception
    ErrInfo = Ex.Message

End Try
If you had read through the code I posted you would have seen the IO.File.ReadAllText call and thought "there must be a WriteAllText also", and there is:
VB.NET:
IO.File.WriteAllText("c:\output.txt", newtext)
 
Thanks. I wrote to the output file in a slightly different way.

strData = newtext
..
objReader.Write(strData)



One of the requirements of the program is that I need to identify any placeholders which are missing in the data_file.xml file, and the position (line/character) in the template.txt file where the placeholder cannot be filled.
I am slightly concerned that I won't be able to achieve this using the route I've decided to go down... Would I be right in thinking that I need to use the XML serialisation method instead?

Thanks,

Judge
 
judge said:
One of the requirements of the program is that I need to identify any placeholders which are missing in the data_file.xml file, and the position (line/character) in the template.txt file where the placeholder cannot be filled.
I am slightly concerned that I won't be able to achieve this using the route I've decided to go down... Would I be right in thinking that I need to use the XML serialisation method instead?
"XML serialisation method"? You mean you're absolutely clueless as to how to proceed? ;) If a placeholder doesn't exist in the data, only a small rewrite is required. I would do the same Regex.Replace, then a Regex.Matches loop to log the errors (the remaining non-replaced placeholders). The MatchEvaluator delegate needs to be expanded a little, since some nodes may not exist. SelectSingleNode returns Nothing if the XPath query doesn't find data, so you have to get the node reference first, check its validity, and then get the InnerText. If there is no node, match.Value should be returned instead (i.e. no replacement really). The function is thus defined as described:
VB.NET:
Private Function EvalFunc(ByVal m As Match) As String
    Dim node As Xml.XmlNode = doc.SelectSingleNode("//" & m.Groups("name").Value)
    If node Is Nothing Then
        Return m.Value
    Else
        Return node.InnerText
    End If
End Function
Last parameter for Regex.Replace now has to be AddressOf EvalFunc to point to that function.

As you can see, this function uses the XmlDocument (the doc variable). For it to be available there, it has to be declared at module level (or else you would have to load the document locally in the EvalFunc method for each match).
VB.NET:
Private doc As Xml.XmlDocument
The remaining error reporting loop is simple given the prior samples:
VB.NET:
For Each m As Match In Regex.Matches(newtext, "\${3}(?<name>.+?)\${3}")
    'missing data for placeholder m.Groups("name").Value at index m.Index
Next
You could even process line by line to be able to report the line number; it just adds a For-Next loop:
VB.NET:
Dim lines() As String = newtext.Split(New String() {vbNewLine}, StringSplitOptions.None)
For i As Integer = 0 To lines.Length - 1
    Dim line As String = lines(i)
    For Each m As Match In Regex.Matches(line, "\${3}(?<name>.+?)\${3}")
        'missing data for placeholder m.Groups("name").Value at index m.Index of line (i+1)
    Next
Next
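One caveat with scanning newtext: when an earlier placeholder on the same line has already been replaced, m.Index in newtext may no longer match the position in template.txt. Below is a minimal sketch of one alternative, assuming it is acceptable to record the misses inside the evaluator itself, where m.Index still refers to the original template. The module name, the DescribePosition helper and the inline sample data are made up for illustration, and the multi-line lambda needs VB 2010 or later.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Xml

Module MissingPlaceholderSketch

    ' Hypothetical helper: converts a zero-based character index
    ' into a one-based line and column within source.
    Function DescribePosition(ByVal source As String, ByVal index As Integer) As String
        Dim line As Integer = 1
        Dim lineStart As Integer = 0
        For i As Integer = 0 To index - 1
            If source(i) = ControlChars.Lf Then
                line += 1
                lineStart = i + 1
            End If
        Next
        Return "line " & line & ", column " & (index - lineStart + 1)
    End Function

    Sub Main()
        ' Inline sample data standing in for data_file.xml and template.txt.
        Dim doc As New XmlDocument()
        doc.LoadXml("<data><placeholder1>Joe Bloggs</placeholder1></data>")
        Dim template As String = "Name: $$$placeholder1$$$, age: $$$placeholder2$$$"
        Dim missing As New List(Of String)

        Dim result As String = Regex.Replace(template, "\${3}(?<name>.+?)\${3}",
            Function(m As Match) As String
                Dim node As XmlNode = doc.SelectSingleNode("//" & m.Groups("name").Value)
                If node IsNot Nothing Then Return node.InnerText
                ' m.Index is a position in the original template, not in the replaced text.
                missing.Add(m.Groups("name").Value & " at " & DescribePosition(template, m.Index))
                Return m.Value
            End Function)

        Console.WriteLine(result)
        For Each entry As String In missing
            Console.WriteLine("Missing: " & entry)
        Next
    End Sub

End Module
```

With this approach a single pass does both the replacement and the error report, so the second Regex.Matches loop is not needed.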
 
Thank you once again for your reply. No, I didn't have a clue, as I'm new to this lol. And I've read so many sites/ebooks now that my head hurts. I'll try your suggestions.

Thanks,
Judge
 
Thanks John. I have used your code sample from another thread to accept the paths of the files I wish to open and write to. This works great. I have added the function to my code, made the doc As Xml.XmlDocument a private declaration, and also pointed to the function in the Regex.Replace parameters.

Here is my code now:

Imports System.IO
Imports System.Text.RegularExpressions

Module Module1
    Private doc As Xml.XmlDocument

    Private Function evalFunction(ByVal m As Match) As String


        Dim node As Xml.XmlNode = doc.SelectSingleNode("//" & m.Groups("name").Value)
        If node Is Nothing Then
            Return m.Value
        Else
            Return node.InnerText
        End If

    End Function


    Sub Main()

        Dim doc As New Xml.XmlDocument

        Dim objReader As StreamWriter

        'Enter path of data XML file as command line parameter
        Console.WriteLine("Enter the path of the XML data file > ")
        doc.Load(Console.ReadLine)

        'Enter path of the template text file
        Console.WriteLine("Enter the path of the template text file > ")
        Dim template_path As String = Console.ReadLine()

        'Enter path of output file user wants to create
        Console.WriteLine("Enter the path of the output file you wish to create > ")
        Dim output_path As String = Console.ReadLine()

        Dim evalFunc = Function(m As Match) _
            (doc.SelectSingleNode("//" & m.Groups("name").Value).InnerText)


        Dim newtext As String = Regex.Replace _
            (IO.File.ReadAllText(template_path), "\${3}(?<name>.+?)\${3}", AddressOf evalFunction)


        MsgBox("Copied to the path " & output_path & ":" & vbNewLine & vbNewLine & newtext)

        objReader = New StreamWriter(output_path)
        objReader.Write(newtext)
        objReader.Close()

    End Sub

End Module


This is the error I receive:
"Object reference not set to an instance of an object."
For line:
Dim node As Xml.XmlNode = doc.SelectSingleNode("//" & m.Groups("name").Value)
 
This code:
VB.NET:
Private doc As Xml.XmlDocument
is a replacement for the local declaration:
VB.NET:
Dim doc As New Xml.XmlDocument
In addition, you of course have to assign a new instance to the module-level variable:
VB.NET:
doc = New Xml.XmlDocument
This code:
VB.NET:
Private Function EvalFunc(ByVal m As Match) As String
is a replacement for the local lambda function:
VB.NET:
Dim evalFunc = Function...
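The error comes from shadowing: a Dim doc inside Main creates a separate local variable that hides the module-level doc, so the module-level one is still Nothing when the evaluator function runs. A minimal sketch of that behaviour follows; the module and method names are invented for illustration and are not part of the posted program.

```vbnet
Imports System
Imports System.Xml

Module ShadowingSketch

    ' Module-level field; it is Nothing until something assigns it.
    Private doc As XmlDocument

    Function FieldIsAssigned() As Boolean
        Return doc IsNot Nothing
    End Function

    Sub LoadIntoLocal()
        ' This local hides the module-level field, which is left untouched.
        Dim doc As New XmlDocument()
        doc.LoadXml("<data />")
    End Sub

    Sub LoadIntoField()
        ' No Dim here, so the assignment targets the module-level field.
        doc = New XmlDocument()
        doc.LoadXml("<data />")
    End Sub

    Sub Main()
        LoadIntoLocal()
        Console.WriteLine(FieldIsAssigned())
        LoadIntoField()
        Console.WriteLine(FieldIsAssigned())
    End Sub

End Module
```

Run as written, the first check reports False and the second reports True, which mirrors why removing the local Dim and assigning doc = New Xml.XmlDocument fixes the exception.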
 
Thanks John. Before I posted my code I took out the lambda function, as I realised it was being replaced by the new function. I also took out the "Dim doc As New Xml.XmlDocument" code, as I realised this was replaced by the new module-level declaration. For some reason I posted the wrong version of my code (I think I undid the code one too many times). So I guess I was just missing the doc = New Xml.XmlDocument piece of code.

Thanks for your help,
Judge
 
OK. As it stands:

Imports System.IO
Imports System.Text.RegularExpressions

Module Module1
    Private doc As Xml.XmlDocument

    'Look up the XML node named after the matched placeholder
    Private Function evalFunction(ByVal m As Match) As String
        Dim node As Xml.XmlNode = doc.SelectSingleNode("//" & m.Groups("name").Value)
        If node Is Nothing Then
            Return m.Value
        Else
            Return node.InnerText
        End If
    End Function

    Sub Main()

        Console.Title = "Text Replacement Tool"
        Console.SetWindowSize(120, 60)

        Try
            doc = New Xml.XmlDocument
            Console.WriteLine(vbNewLine & "Enter the path of the XML data file > ")
            doc.Load("c:\data_file.xml") 'doc.Load(Console.ReadLine)
            Console.WriteLine(vbNewLine & "Enter the path of the template text file > ")

            Dim template_path As String = "c:\template.txt" 'Dim template_path As String = Console.ReadLine()

            If File.Exists(template_path) = False Then
                MsgBox("Could not find " & template_path & " ,check that the file exists ")
                Main()
            End If

            Console.WriteLine(vbNewLine & "Enter the path of the output file you wish to create > ")

            Dim output_path As String = "c:\output.txt" 'Dim output_path As String = Console.ReadLine()

            Dim temporarytext As String = Regex.Replace _
                (IO.File.ReadAllText(template_path), "\${3}(?<name>.+?)\${3}", AddressOf evalFunction)

            Dim objReader As New StreamWriter(output_path)
            objReader.Write(temporarytext)
            objReader.Close()

            'Split the text assigned to temporarytext into lines
            Dim lines() As String = temporarytext.Split(New String() {vbNewLine}, StringSplitOptions.None)

            For i As Integer = 0 To lines.Length - 1
                'The current line text is assigned to the string variable line
                Dim line As String = lines(i)
                For Each m As Match In Regex.Matches(line, "\${3}placeholder[0-9]\${3}")
                    Console.WriteLine(vbNewLine & "Place holder missing in " & template_path & ": " & vbNewLine & "Name: " & m.Value _
                        & vbNewLine & "Line: " & i + 1 _
                        & vbNewLine & "Character position: " & m.Index & vbNewLine)
                    Console.ReadKey()
                Next
            Next

            Console.WriteLine("Thankyou for using the Text Replacement Tool. Press any key to quit...")
            Console.ReadKey()

        Catch ex As Exception
            MsgBox(ex.Message)
            Main()
        End Try

    End Sub
End Module


The program runs great, no problems there. The only problem is that if my data_file.xml has more than one group of data, the program catches the error "There are multiple root elements. Line 10, position 2.". This refers to the position in my XML file where the second group starts.

Any suggestions as to how I can fix this? I need to be able to read from different groups in the XML file, so that the program will create a new output file for each group's data.

Thanks,
Judge
 
Multiple root elements are not allowed by the XML specification. You must have a single root element in the XML document; that root element can have multiple children, each of which can have children of its own, and so on.
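As a rough sketch of what that could look like, assuming each group of data is wrapped in a group element under a single groups root (both element names are invented here, as are the module and function names), the replacement can be run once per group by querying relative to the group node. The multi-line lambda needs VB 2010 or later.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports System.Xml

Module GroupedDataSketch

    ' Fills the template from one group element.
    ' Placeholders with no matching child element are left untouched.
    Function FillTemplate(ByVal template As String, ByVal groupNode As XmlNode) As String
        Return Regex.Replace(template, "\${3}(?<name>.+?)\${3}",
            Function(m As Match) As String
                ' The XPath is relative, so only children of this group are searched.
                Dim node As XmlNode = groupNode.SelectSingleNode(m.Groups("name").Value)
                If node Is Nothing Then Return m.Value
                Return node.InnerText
            End Function)
    End Function

    Sub Main()
        ' Inline sample standing in for a data file with one root and two groups.
        Dim doc As New XmlDocument()
        doc.LoadXml("<groups>" &
                    "<group><placeholder1>Joe Bloggs</placeholder1></group>" &
                    "<group><placeholder1>Jane Doe</placeholder1></group>" &
                    "</groups>")
        Dim template As String = "There was a man called $$$placeholder1$$$"

        Dim index As Integer = 1
        For Each groupNode As XmlNode In doc.SelectNodes("/groups/group")
            ' A real program would write each result to its own output file here.
            Console.WriteLine("Output " & index & ": " & FillTemplate(template, groupNode))
            index += 1
        Next
    End Sub

End Module
```

Passing the group node into the function, rather than using a module-level doc with a "//" query, keeps each lookup confined to the current group.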
 