Problem with charset conversion

fortejava

New member · Joined: Jan 27, 2011 · Messages: 2 · Programming Experience: Beginner
Hi to all members, I am going crazy and I hope for some help... :(
For a new project, I need to handle XML files like the one attached to this post.
The problem is the charset: is there any way to convert the garbled characters into HTML entities?

For example: Mercoledì -> Mercoledì

Thanks for your attention
Alessandro
 

Attachment: example.txt (2 KB)
Neither XML nor its default UTF-8 encoding has any problem with that character. The problem is most likely that the source text was treated as code page 1252 and then saved to XML as UTF-8. You have to use one specific encoding for both writing and reading, or convert the result between the two. For example, if I save what you posted as a 1252-encoded XML file, I can open it in Notepad and the correct character displays (Latin small letter i with grave); Notepad detects this as a legacy 'ANSI' encoded document. Similarly, if I write or copy such a character as UTF-8 and write it to XML as UTF-8, it reads and displays correctly.
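
To see how this kind of garbling arises, the mismatch can be reproduced in a few lines. This is only a minimal sketch, assuming the text is written as UTF-8 and then decoded as code page 1252 somewhere along the way; the module and function names (MojibakeDemo, Garble) are made up for illustration.

```vbnet
Imports System
Imports System.Text

Module MojibakeDemo
    ' Hypothetical helper: encodes text as UTF-8, then decodes the bytes as 1252.
    Function Garble(original As String) As String
        ' Needed on .NET Core / .NET 5+; can be dropped on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)
        Dim cp1252 As Encoding = Encoding.GetEncoding(1252)

        ' UTF-8 stores U+00EC (i with grave) as the two bytes C3 AC
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(original)
        ' Code page 1252 turns every byte into a character of its own
        Return cp1252.GetString(utf8Bytes)
    End Function

    Sub Main()
        ' ChrW(&HEC) is used so the example does not depend on the
        ' encoding of the source file itself.
        Dim original As String = "Mercoled" & ChrW(&HEC)
        Dim garbled As String = Garble(original)
        ' The one accented character has become two characters
        Console.WriteLine(original.Length & " -> " & garbled.Length)
    End Sub
End Module
```

The two extra characters are U+00C3 and U+00AC, which is exactly the pair that appears in place of the accented letter in the example above.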

HTML entities, by the way, are only relevant for HTML, and only for 'characters that cannot be expressed in the document's character encoding'.
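
If numeric character references are wanted anyway, for instance because the target must be pure ASCII, they can be produced without any lookup table. The following is a rough sketch under that assumption; ToNumericReferences is a hypothetical helper, not a framework method, and it deliberately leaves markup characters such as & and < untouched.

```vbnet
Imports System
Imports System.Text

Module NumericReferences
    ' Hypothetical helper: replaces every non-ASCII character with a
    ' decimal numeric character reference of the form &#NNN;
    Function ToNumericReferences(input As String) As String
        Dim sb As New StringBuilder()
        Dim i As Integer = 0
        While i < input.Length
            ' ConvertToUtf32 reads a surrogate pair as one code point
            ' (it throws ArgumentException on a lone surrogate)
            Dim codePoint As Integer = Char.ConvertToUtf32(input, i)
            If codePoint > 127 Then
                sb.Append("&#").Append(codePoint).Append(";")
            Else
                sb.Append(input(i))
            End If
            ' Skip two UTF-16 units when the code point needed a pair
            i += If(codePoint > &HFFFF, 2, 1)
        End While
        Return sb.ToString()
    End Function

    Sub Main()
        Console.WriteLine(ToNumericReferences("Mercoled" & ChrW(&HEC)))
    End Sub
End Module
```

This only helps once the string itself is correct; applied to text that is already garbled, it would just produce references for the wrong characters.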
 
Unfortunately I don't control the program that produces the XML... For now I convert the strings manually using XML Writer; I tried converting them with String.Replace without success. Do you have any specific suggestion?
Alessandro
 
Unfortunately I don't control the program that produces the XML...
It's wrong anyway, so tell them! Now, if you have a string (s) containing UTF-8 text that was decoded as 1252, you can convert it like this:
VB.NET:
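' s holds UTF-8 text that was wrongly decoded as code page 1252
Dim s = "ì"
' Code page 1252 turns the two characters back into the original bytes (C3 AC)
Dim dec = System.Text.Encoding.GetEncoding(1252)
' Decoding those bytes as UTF-8 gives the intended character: s is now "ì"
s = System.Text.Encoding.UTF8.GetString(dec.GetBytes(s))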
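
The same conversion can be wrapped in a function that leaves a string alone when it does not look garbled. This is a sketch rather than a definitive implementation: FixMojibake is a hypothetical name, and the strict fallbacks are an added safeguard that is not part of the snippet above.

```vbnet
Imports System
Imports System.Text

Module EncodingRepair
    ' Hypothetical helper: undoes "UTF-8 bytes decoded as code page 1252".
    ' Returns the input unchanged when it does not look garbled.
    Function FixMojibake(garbled As String) As String
        ' Needed on .NET Core / .NET 5+; can be dropped on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)

        ' Strict encodings: throw instead of silently substituting "?"
        Dim cp1252 As Encoding = Encoding.GetEncoding(
            1252, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback)
        Dim strictUtf8 As New UTF8Encoding(False, True)

        Try
            Return strictUtf8.GetString(cp1252.GetBytes(garbled))
        Catch ex As ArgumentException
            ' Either a character is not in 1252 or the bytes are not valid UTF-8
            Return garbled
        End Try
    End Function

    Sub Main()
        Dim garbled As String = "Mercoled" & ChrW(&HC3) & ChrW(&HAC)
        Dim expected As String = "Mercoled" & ChrW(&HEC)
        Console.WriteLine(FixMojibake(garbled) = expected)
    End Sub
End Module
```

Without the strict fallbacks, running the conversion on text that was never garbled would replace its accented characters with "?" or U+FFFD, so the plain version should only be applied to strings known to be affected.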
 