Question Extra Spaces in Element Values

Tom

I receive XML files from customers to be imported into our database, and I'm having problems with extra spaces appearing between words in some of the field values.

When I view the XML file in a browser, the values appear properly formatted, with a single space between words. But when I read the file into a DataSet, or even open it in Notepad, WordPad, or Word, the same text fields have a varying number of spaces between some of the words.

View as XML in browser

VB.NET:
<myDataSet>
    <myTable>
        <myColumn>My Item Text Value Here</myColumn>
    </myTable>
    <myTable>
        <myColumn>My Item Text Value Here</myColumn>
    </myTable>
</myDataSet>

View of the XML in a text editor, and as it appears in the DataSet field values

VB.NET:
<myDataSet>
    <myTable>
        <myColumn>My                     Item Text           Value Here</myColumn>
    </myTable>
    <myTable>
        <myColumn>My Item           Text Value            Here</myColumn>
    </myTable>
</myDataSet>

Searching on the issue, I see plenty of links about whitespace, but they seem oriented toward the formatting of the file structure itself rather than extra spaces within an element value.

Why is this occurring, and is there an easy way to correct it without looping through every record in the file?
 
You're still looping through the records with this solution. I haven't done it before, but you may be able to use normalize-space() in an XSL transform.

VB.NET:
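		' Requires: Imports System.Xml and Imports System.Text.RegularExpressions
		Dim xmlDoc As New XmlDocument
		xmlDoc.Load("C:\Temp\irregularSpaces.xml")

		' Collapse every run of two or more whitespace characters to a single space
		For Each node As XmlNode In xmlDoc.SelectNodes("/myDataSet/myTable/myColumn")
			node.InnerText = Regex.Replace(node.InnerText, "\s{2,}", " ")
		Next

		xmlDoc.Save("C:\Temp\regularSpaces.xml")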
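
For reference, the normalize-space() idea might look roughly like the sketch below. The module name, the NormalizeXml function and the inline stylesheet are made up for illustration, and it assumes the myDataSet/myTable/myColumn layout from the question. It has not been timed against the loop above, so whether it is any faster is an open question. Note that normalize-space() also trims leading and trailing whitespace, which the regex version does not.

```vbnet
Imports System.IO
Imports System.Xml
Imports System.Xml.Xsl

Module NormalizeSpaceSketch

    ' Hypothetical stylesheet: an identity transform plus one template that
    ' rewrites the text inside myColumn elements with normalize-space().
    Private Const Stylesheet As String = _
        "<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">" & _
        "<xsl:template match=""@*|node()"">" & _
        "<xsl:copy><xsl:apply-templates select=""@*|node()""/></xsl:copy>" & _
        "</xsl:template>" & _
        "<xsl:template match=""myColumn/text()"">" & _
        "<xsl:value-of select=""normalize-space(.)""/>" & _
        "</xsl:template>" & _
        "</xsl:stylesheet>"

    ' Runs the stylesheet over an XML string and returns the transformed XML.
    Public Function NormalizeXml(ByVal inputXml As String) As String
        Dim xslt As New XslCompiledTransform()
        Using styleReader As XmlReader = XmlReader.Create(New StringReader(Stylesheet))
            xslt.Load(styleReader)
        End Using

        Dim output As New StringWriter()
        Using source As XmlReader = XmlReader.Create(New StringReader(inputXml))
            ' OutputSettings carries the stylesheet's output options to the writer
            Using writer As XmlWriter = XmlWriter.Create(output, xslt.OutputSettings)
                xslt.Transform(source, writer)
            End Using
        End Using
        Return output.ToString()
    End Function

End Module
```

For files on disk, the same XslCompiledTransform instance also accepts file paths through Transform(inputPath, outputPath), so the stylesheet only needs to be loaded once per batch.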
 
Yeah, I'm doing something similar to that now, but it's time-consuming on larger files. It takes about 60 seconds to load the file, format these fields, and import about 40,000 records into the database. I could cut that time by a third if I could get away from that loop.
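
One variation that might be worth trying is a single streaming pass with XmlReader and XmlWriter, which avoids building the whole XmlDocument in memory before the cleanup. The sketch below is an assumption-laden illustration, not a measured fix: the module and method names are made up, it normalizes every text node rather than only myColumn, and it silently skips comments and processing instructions. It still visits every node, so it does not remove the loop, only the DOM.

```vbnet
Imports System.Text.RegularExpressions
Imports System.Xml

Module StreamingNormalizeSketch

    ' Compiled once and reused for every text node
    Private ReadOnly MultiSpace As New Regex("\s{2,}", RegexOptions.Compiled)

    ' Copies the document from reader to writer, collapsing whitespace runs in text nodes.
    Public Sub NormalizeStream(ByVal reader As XmlReader, ByVal writer As XmlWriter)
        While reader.Read()
            Select Case reader.NodeType
                Case XmlNodeType.Element
                    ' Capture this before WriteAttributes moves the reader around
                    Dim isEmpty As Boolean = reader.IsEmptyElement
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI)
                    writer.WriteAttributes(reader, False)
                    If isEmpty Then writer.WriteEndElement()
                Case XmlNodeType.Text
                    writer.WriteString(MultiSpace.Replace(reader.Value, " "))
                Case XmlNodeType.CDATA
                    writer.WriteCData(reader.Value)
                Case XmlNodeType.Whitespace, XmlNodeType.SignificantWhitespace
                    ' Indentation between elements is passed through unchanged
                    writer.WriteWhitespace(reader.Value)
                Case XmlNodeType.EndElement
                    writer.WriteFullEndElement()
            End Select
        End While
    End Sub

End Module
```

With the file names from the earlier snippet, this would be called with XmlReader.Create("C:\Temp\irregularSpaces.xml") and XmlWriter.Create("C:\Temp\regularSpaces.xml"), each wrapped in a Using block so the output is flushed and closed.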
 