Problem writing character to text file

keb1965 · Feb 1, 2010

I have a need to write a character to a text file. Currently I am writing the file using a streamwriter and I have tried everything I can imagine, but I cannot get a single character "°" to write to the file properly. If I put it as a bit of text it is converted to "Â°" and this causes additional problems when the data has to be re-evaluated with another program.

I tried writing it as a string initially, then I tried writing it as a character, but regardless, each time it turns out the same.

I'd appreciate any help

Thanks

JohnH · Feb 1, 2010

That depends what encoding you are using, and in what context the problem occurs. For example most .Net text file tools uses Utf8 encoding by default, so if I for instance if I copy the char you posted and write it to a text file I can also read it back or view it in Notepad without problems. You have to explain some more and post some relevant code methinks.

keb1965 · Feb 1, 2010

This is a relatively simple file writing function that takes a string() and writes each line out to a file.

The strings, built within code is something similar to this:
"01-28-2010,12:11:38,N 38°51'41.0616",W 94°49'38.9712",0.5mph,253.2°,983ft"

Everything works great, with the exception of the writing portion. It outputs the data as:
"01-28-2010,12:11:38,N 38Â°51'41.0616",W 94Â°49'38.9712",0.5mph,253.2Â°,983ft"

As you can see there are added characters in the string.

VB.NET:

    Public Shared Function WriteToFile(ByVal FileName As String, ByVal lines() As String) As Boolean
        Try
            Dim sw As New System.IO.StreamWriter(FileName)
            Dim X As Integer = 1
            Do While X <= UBound(lines)
                sw.Write(lines(X))
                X += 1
            Loop
            sw.Close()
            Return True
        Catch ex As Exception
            Return False
        End Try
    End Function

I have checked to see if perhaps I was not creating the string properly, and each time I received similar results. I have tried these various permutations in the event there was something amis in the string building:

VB.NET:

MyString = MyString & Chr(176)
MyString = MyString & "°"
MyString = MyString & Chr(Asc("°"))

Finally I resorted to writing the lines as binary and got similar results, but with the added problem that every character is now prepended with hex 01.

I suppose I could run the entire string through a single character byte and write that single byte out to the file, but that is so inefficient.

I have also used every encoding option available and still no dice

Also, when viewing the file with notepad, the error isn't evident, it is only when processing the file contents with another program that I became aware of the issue.

If you view the files in VIM or with a binary viewer, you can see the additional characters.

JohnH · Feb 1, 2010

What do you get if you output "°" as Utf8 and read it as back as Utf7? You'll never guess it was "Â°"

So you must set Utf7 as output encoding:

VB.NET:

Dim w As New StreamWriter("filepath", False, System.Text.Encoding.UTF7)

keb1965 · Feb 1, 2010

The problem with using utf7 is the character is still not written correctly. A sample using utf7 is:
"01-29-2010,20:23:28,N 38+ALA-51'40.8420+ACI-,W 94+ALA-49'42.1356+ACI-,2.4mph,0.0+ALA-,823ft"

Other formats produce similar results, some formats produce a result that appears to be correct, but when read, the additional bytes appear. Notepad is especially egregious when it comes to ignoring characters, VIM is better, but even it will ignore characters if the file type characters are set at the beginning of the file.

JohnH · Feb 1, 2010

Then I have no idea, I just tested "°" conversion from Utf8 to Utf7 and it produces the exact "Â°" problem you referred to.

keb1965 · Feb 1, 2010

I guess I will have to convert everything to a byte and use a binarywriter to write each byte to the file individually .. not the solution I would have hoped for, but it should work ... at least the last time I used a binarywriter to write a byte to a file, it would write it as the byte you sent and not expand it to some unknown format.

JohnH · Feb 1, 2010

There is nothing unknown about the bytes a StreamWriter writes for text, it's all about the encoding used. Whether you convert text to bytes using the specified encoding or you write text with StreamWriter using the same specified encoding produce exactly the same result. The only difference using StreamWriter is that it adds the byte-order-mark at start of file specifying the encoding used, but the byte content of the text if identical.

keb1965 · Feb 1, 2010

you are right, thats why I have to use a binarywriter instead of a streamwriter ... streamwriter is flaky and doesn't interpret characters correctly, regardless of format used ... chr(176) should produce a binary B0 when using at least one of the streamwriter formats, but there isn't one that will do it.

I had to convert each character to a single byte and write that byte out to a binarywriter stream

anyway, i am now getting what I needed using the following code:

VB.NET:

    Public Shared Function WriteToFile(ByVal FileName As String, ByVal lines() As String) As Boolean
        Try
            Dim sw As New System.IO.FileStream(FileName, IO.FileMode.Create)
            Dim bw As New System.IO.BinaryWriter(sw)
            Dim X As Integer = 1
            Dim Y As Integer
            Dim Byt As Byte
            Do While X <= UBound(lines)
                Y = 0
                Do While Y < lines(X).Length
                    Byt = Asc(lines(X).Substring(Y, 1))
                    bw.Write(Byt)
                    Y += 1
                Loop
                Byt = Asc(vbCr)
                bw.Write(Byt)
                Byt = Asc(vbLf)
                bw.Write(Byt)
                X += 1
            Loop
            bw.Close()
            sw.Close()
            Return True
        Catch ex As Exception
            Return False
        End Try
    End Function

Incidently, even a binarywriter does not interpret Chr(10) and Chr(13) correctly, they are written as 2 byte characters ... go figure .. have to expressly convert to a byte to get the proper CrLf as needed.

JohnH · Feb 1, 2010

streamwriter is flaky and doesn't interpret characters correctly

Rubbish. What you fail to realize is that ALL text is represented with an encoding, the problem you face is finding the right encoding to use, and that includes what encoding used when reading/constructing the source text. So what information do you have? In last post you seem to indicate that the text encoding you require must use one byte only each character. That limits the encodings available, mostly to legacy Ansi codepages. Ascii would have fitted if it were not for the char you initially posted is out of its range, it is one of the "extended ascii" chars and thus available in extended charsets. Two options could be the partially overlapping ISO-8859-1/Windows-1252 8bit encodings. The character "°" has value 176 (or "B0" when referring to the hexadecimal string representation of the byte value). Any 8bit encoding supporting this value will give the 176 byte value. If you used Utf32 which is a 32bit fixed encoding you would get bytes (176,0,0,0), Unicode is variable 1-2 16bit values (176,0) for this char, Utf8 is variable 1-4 bytes and uses bytes (194,176) for this char (thus the "Â°" when converted back to a 8bit encoding that supports the range of those code points). So try the 8bit encodings I mentioned or look into other possible Ansi encodings, use Encoding.GetEncoding where you either specify the string name or the integer code page to get a compatible Encoding object. Either way I recommend you do some basic research in general about text encoding to expand the current understanding of the topic.

keb1965 · Feb 1, 2010

I am not interested in a pissing match, I am merely trying to find the simplest way to resolve the situation.

In notepad, I can create ° by holding [ALT] and entering 0176 .. and when I view it as a binary file, it is simply B0h, there aren't any additional binary values prepending or appending it. That is what I need ... the current ANSI codepage (i.e. Encoding.Default)

Regardless of what encoding format I use when using StreamWriter, the result is always is unacceptible.

I have tried:
ASCII = 3Fh
BigEndianUnicode = 00h B0h
Unicode = 00h B0h
UTF32 = 00h 00h 00h B0h
UTF7 = 2Bh 41h 4Ch 41h 2Dh
UTF8 = C2h B0h

Now the only one left is Default, but that uses the OS current ANSI codepage, so if I deploy this application across different languages or perhaps different OS's the result won't be reliable .. or am I wrong to think that?

JohnH · Feb 1, 2010

keb1965 said:
there aren't any additional binary values prepending or appending it. That is what I need

I have tried:
ASCII = 3Fh
BigEndianUnicode = 00h B0h
Unicode = 00h B0h
UTF32 = 00h 00h 00h B0h
UTF7 = 2Bh 41h 4Ch 41h 2Dh
UTF8 = C2h B0h

You are mixing the concepts of text encoding and BOM/preamble here. The former is how the text is represented as bytes (such as the various sample byte values I posted representing the ° char, the same you posted as hex strings now), the latter is the 2-4 bytes signature written to start of file to "explain" the encoding used to any understanding reader (while some readers can't handle BOMs). See for example Byte order mark - Wikipedia, the free encyclopedia To avoid BOM with a custom encoding you must avoid StreamWriter. Just convert the text using the appropriate encoding and write the bytes to file yourself. As said, encoding you cannot avoid, all text is encoded and has a byte representation, it's only a matter of chosing the correct encoding as per requirements.

keb1965 said:
Now the only one left is Default,

You can use ANY encoding known that is supported by the current OS, I've already explained how in previous post. Those preset Encoding objects are there just for simplification because they are most used in .Net programming.

So, this sample code lay it out one thing at a time; the string, the encoding, the byte conversion, the file write.

VB.NET:

Dim s As String = "999EW°"
Dim en As System.Text.Encoding = System.Text.Encoding.GetEncoding(1258)
Dim b() As Byte = en.GetBytes(s)
IO.File.WriteAllBytes("filepath", b)

or as I tend to do, skipping code lines that serve no purpose:

VB.NET:

IO.File.WriteAllBytes("filepath", System.Text.Encoding.GetEncoding(1252).GetBytes("999EW°"))

keb1965 said:
I am not interested in a pissing match, I am merely trying to find the simplest way to resolve the situation.

I had none of the like in mind, and have no idea how you could perceive anything in this thread as such. I have only tried to explain basic text encoding concepts. Without knowledge and understanding how can problems be solved? I do see your outbursts of frustration only as that, and know you only want to get to the finish line, but I also cannot give complete answers until the problem is properly explained. Sometimes only general advice can be given that you have to apply to specific problem solving yourself. Sometimes general advice is given until more specific help is requested.

Btw, the hex "3F" you got with Ascii is, if you care to know, the byte value 63 (? char), that is substituted according to documentation for character values outside the Ascii encoding range.

keb1965 · Feb 2, 2010

please forgive my frustrations .. sometimes intent is missed in a typed medium.

You are correct, the idea in a nutshell is A-->B ... and unfortunately I seem to have reached a point where that doesn't seem possible with StreamWriter, at least not for the purposes I have at hand.

The file will be viewed by humans only rarely, and then only to selectively pick data, so the rendering of it in whatever text reader the person chooses is of little importance, and is in fact of no consequence. The way I see it and understand the way the files are interpreted is that the BOM/Preamble tells notepad/vim/etc... how to render the text.

The fact that the BOM/Preamble is there is unacceptible because the output will be subsequently read by another program (which I have no control over) and extraneous bytes used to depict a specific ANSI codepage or encoding will cause the destination program to crash and burn ... so, obviously I can't use StreamWriter, because in the various encoding formats that don't use a BOM/preamble, the output for char B0h is not B0h, but either a replacement char 3Fh or a multi-byte char, either one of which will cause the destination program to crash.

Given those requirements, I think I am relegated to using a BinaryWriter and writing each byte individually ... unless you see another alternative ...

I was hoping to use a more efficient method because I have to process thousands of files, reading the file data into a buffer, reorganize it into the required format, and write it out to a plain text file (without all the other stuff I have been trying to avoid). Each of these files will have as many as 86,400 records, so we are talking about tens of millions of records that must be processed.

Anyway, the binarywriter provides the correct output, it is just slower than I would like it to be.

Thanks for the lesson on text encoding!

JohnH · Feb 2, 2010

Given those requirements, I think I am relegated to using a BinaryWriter and writing each byte individually ... unless you see another alternative ...

Anyway, the binarywriter provides the correct output, it is just slower than I would like it to be.

File.WriteAllBytes/FileStream/BinaryWriter is the same when you only write bytes to file. Given the size collecting everything in memory, decoding once, and WriteAllBytes would perhaps be less attractive memorywise than writing out parts, while it would be faster. Consider some tens of megabytes of data in memory temporarily during processing is something any modern computer with a GB of ram will easily handle. Writing single bytes for each write call would be rather stupid and time consuming, at the least you would be outputting byte arrays of some size each IO operation. Also, if you're building up large strings with string concats that is incredibly slow compared to for example using a StringBuilder. Two text encoding calls would presumably be slower than one, even if that one encodes twice the amount of text. You can write various code paths and see how they compare in processing time using the Stopwatch class, then make a decision on what balance to make regarding memory use, string building, encoding and IO operations.

keb1965 · Feb 2, 2010

I'll work on speed improvements and see what I can do to gain some ground.

I initially thought about utilizing concurrent operations, but considering the IO operations will be the slow link, doing both at the same time would likely create additional speed issues.

You are right, the files are no larger than 2mb, so memory isn't a huge issue.

The general flow of the program is as follows:
while not EOF read lines from oldfile into buffer()
for each line in buffer()
split line into data()
run calculations on data()
reorganize data() into output()
write output() to newfile
process next file

There isn't really alot of room for speed enhancements except on IO operations and I have already had to issue a break in the loop to process application events, otherwise the program appears to hang, even though it hasn't.

Anyway, as I said before, I'll gladly entertain any speed enhancements you can think of.

Problem writing character to text file

keb1965

Well-known member

JohnH

VB.NET Forum Moderator

keb1965

Well-known member

JohnH

VB.NET Forum Moderator

keb1965

Well-known member

JohnH

VB.NET Forum Moderator

keb1965

Well-known member

JohnH

VB.NET Forum Moderator

keb1965

Well-known member

JohnH

VB.NET Forum Moderator

keb1965

Well-known member

JohnH

VB.NET Forum Moderator

keb1965

Well-known member

JohnH

VB.NET Forum Moderator

keb1965

Well-known member

Similar threads

Share this page

Latest posts