Question Convert from UTF8 to ANSI?

littlebigman

Hello

I've been googling and trying different ways for an hour, but still can't find how to convert a UTF-8 string into ANSI that System.IO.StreamWriter() will accept.

Using the WebClient object, I'm downloading a web page encoded in UTF-8, and need to convert it to ANSI (I assume StreamWriter() expects ANSI) so that I can save it into a file using System.IO.StreamWriter().

The following turns accented characters from Unicode to ??:
VB.NET:
Sub AlertStringDownloaded(ByVal sender As Object, ByVal e As DownloadStringCompletedEventArgs)
    If e.Cancelled = False AndAlso e.Error Is Nothing Then
        Dim title As Regex = New Regex("<title>(.+?)</title>")
        Dim m As Match

        m = title.Match(CStr(e.Result))
        If m.Success Then
            Dim resbytes() As Byte = Encoding.UTF8.GetBytes(m.Groups(1).Value)
            Dim Response As String = Encoding.Default.GetString(Encoding.Convert(Encoding.UTF8, Encoding.ASCII, resbytes))

            'Illegal characters in path : ??
            Dim objWriter As New System.IO.StreamWriter("c:\" & Response & ".txt")
            objWriter.Write(CStr(e.Result))
            objWriter.Close()

        End If
    End If
End Sub

If someone has some working code handy, I'm interested.

Thank you.
 
You seem to be under some misconceptions. Why exactly do you think that you need to change the encoding in the first place? The only reason would be some other app needs to read the resulting file and it requires a specific encoding. If that is not the case for you then just forget about encoding altogether. Simply create a StreamWriter and call Write. It will write the text to the file using UTF8 encoding. You would then read the file using UTF8 encoding too, which a StreamReader will do by default.
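
A minimal sketch of that round trip, assuming a temporary file and a sample title (the module name, `filePath`, and the sample text are made up for illustration). It relies only on the defaults: `StreamWriter(String)` writes UTF-8 and `StreamReader(String)` reads UTF-8.

```vbnet
Imports System
Imports System.IO

Module Utf8RoundTrip
    Sub Main()
        ' Hypothetical file location and sample text, used only for illustration.
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "utf8-demo.txt")
        ' ChrW(233) is "é"; built this way so the example does not depend on
        ' the encoding of the source file itself.
        Dim text As String = "<title>Cin" & ChrW(233) & "ma Paradiso</title>"

        ' With no encoding argument, StreamWriter encodes the text as UTF-8.
        Using writer As New StreamWriter(filePath)
            writer.Write(text)
        End Using

        ' With no encoding argument, StreamReader decodes the file as UTF-8.
        Dim readBack As String
        Using reader As New StreamReader(filePath)
            readBack = reader.ReadToEnd()
        End Using

        ' The accented character should survive the round trip unchanged.
        Console.WriteLine(readBack = text)
    End Sub
End Module
```

No call to Encoding.Convert or Encoding.ASCII is involved: a .NET String is already Unicode, and the encoding only matters at the point where text becomes bytes in the file.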
 
Thanks for the help.

I need to convert because otherwise, VB.Net isn't happy when creating the file that contains a UTF-8 character (i.e. two bytes for an accented character):
VB.NET:
Private Sub AlertStringDownloaded(ByVal sender As Object, ByVal e As DownloadStringCompletedEventArgs)
    If e.Cancelled = False AndAlso e.Error Is Nothing Then
        Dim title As Regex = New Regex("<title>(.+?) \(", RegexOptions.Singleline)
        Dim m As Match
        m = title.Match(CStr(e.Result))
        If m.Success Then
            Dim MyTitle As String = m.Groups(1).Value

            'Illegal characters in path.
            Dim objWriter As New System.IO.StreamWriter("c:\" & MyTitle & ".txt")

            objWriter.Write(CStr(e.Result))
            objWriter.Close()
        End If
    End If
End Sub
 
Problem identified: There was a hidden LF character in the title, which is what was causing the error when creating the file.

Next, I need to find a way to convert a UTF8 filename into its user-friendly ANSI alternative.
 
This doesn't work:

Input page:
VB.NET:
<title>Cinéma Paradiso</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Code:
VB.NET:
'BAD EMPTY Dim Response As String = Encoding.ASCII.GetString(Encoding.Unicode.GetBytes(e.Result))
Dim Response As String = Encoding.ASCII.GetString(Encoding.UTF8.GetBytes(e.Result))

Output:
VB.NET:
<title>Cin????ma Paradiso</title>
 
You just need to set the WebClient.Encoding property to Encoding.UTF8 before downloading the page.
Don't mix ASCII encoding into this.
If the string can contain characters that are illegal in file names, you can loop through Path.GetInvalidFileNameChars and replace them.
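
A rough sketch of both suggestions, under some assumptions: `MakeSafeFileName` is a hypothetical helper name, "_" is an arbitrary choice of replacement character, and the sample title is hard-coded instead of being downloaded. On Windows, Path.GetInvalidFileNameChars includes control characters such as LF, so the same loop would also remove a hidden line feed in the title.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module SafeFileNames
    ' Hypothetical helper: replaces every character reported by
    ' Path.GetInvalidFileNameChars with an underscore.
    Function MakeSafeFileName(ByVal name As String) As String
        For Each c As Char In Path.GetInvalidFileNameChars()
            name = name.Replace(c, "_"c)
        Next
        Return name.Trim()
    End Function

    Sub Main()
        Using client As New WebClient()
            ' Set this before calling DownloadString or DownloadStringAsync,
            ' so the response bytes are decoded as UTF-8.
            client.Encoding = Encoding.UTF8
        End Using

        ' Sample title with an accented letter (ChrW(233) is "é") and a
        ' line feed in the middle; a stand-in for the downloaded title.
        Dim title As String = "Cin" & ChrW(233) & "ma" & vbLf & "Paradiso"

        ' Accented letters are valid in file names and are left untouched;
        ' only the characters the framework reports as invalid are replaced.
        Console.WriteLine(MakeSafeFileName(title) & ".txt")
    End Sub
End Module
```

The resulting name can then be passed to StreamWriter as in the earlier posts, with no conversion to ANSI or ASCII.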
 