convert to csv

jazzwhistle (Member, joined Oct 17, 2006, 7 messages, programming experience: 1-3)
"You cannot read a binary file into a string - why would you want to?"

Well for example, you might have a binary file that contains some ascii and you may be looking for a certain sequence of bytes in order to process the file. You can't use Contains or Substring with a byte array as far as I know, so I've been having a similar problem.

You have to use ReadAllBytes to read binary files as a byte array (GetString stops, as you say, at 0's), and then you can loop through it to build your string using "&" or StringBuilder.Append, or, even nicer, you can use BitConverter.ToString(yourbytearray).
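
For reference, here is a minimal sketch of what BitConverter.ToString produces for a byte array. The sample bytes are made up for illustration; the output is dash-separated, two-digit, uppercase hex.

```vbnet
Imports System

Module BitConverterSketch
    Sub Main()
        ' Made-up sample data: 0F, then the ASCII bytes for "TAG", then a zero byte.
        Dim sample As Byte() = {&HF, &H54, &H41, &H47, &H0}

        Dim dashed As String = BitConverter.ToString(sample)
        Console.WriteLine(dashed)                  ' 0F-54-41-47-00

        ' Strip the separators if a compact hex string is preferred.
        Console.WriteLine(dashed.Replace("-", "")) ' 0F54414700
    End Sub
End Module
```

Zero bytes come through as "00" rather than cutting the result short, because the output is ordinary text.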

Which brings me to my question... for small files that is all fine, but for a 125 KB file (not huge...) the concatenating takes ages :mad:

StringBuilder.Append works much faster than concatenating with &, but I still need to convert the StringBuilder back to a string at some point in order to use Contains and Substring later on to process the file - at which point StringBuilder.ToString takes forever!

I'd be very grateful if anyone has any ideas why this takes so long, or how else I should be approaching the problem.

Cheers
 
"Well for example, you might have a binary file that contains some ascii and you may be looking for a certain sequence of bytes in order to process the file."

You just answered your own objection..

You said you may be looking for a certain sequence of bytes - i.e. if you're looking for bytes, then look for bytes. Don't bother with all this string malarkey. Read the byte array in, and write a simple algorithm to search for the bytes you require. Converting it into a string just for the sake of using Substring/Contains is heavy on resources and somewhat pointless.
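
A simple search of that kind might look like the sketch below. IndexOfBytes is a hypothetical helper name, not a framework method, and the sample data is invented. It is a plain nested-loop scan that works the same for a 3-byte or a 30-byte pattern.

```vbnet
Imports System

Module ByteSearchSketch
    ' Returns the index of the first occurrence of pattern in data, or -1 if absent.
    ' IndexOfBytes is a hypothetical helper, not part of the .NET Framework.
    Function IndexOfBytes(ByVal data As Byte(), ByVal pattern As Byte()) As Integer
        If pattern.Length = 0 Then Return 0
        ' If the pattern is longer than the data, the upper bound is negative
        ' and the loop body never runs.
        For start As Integer = 0 To data.Length - pattern.Length
            Dim matched As Boolean = True
            For offset As Integer = 0 To pattern.Length - 1
                If data(start + offset) <> pattern(offset) Then
                    matched = False
                    Exit For
                End If
            Next
            If matched Then Return start
        Next
        Return -1
    End Function

    Sub Main()
        Dim data As Byte() = {&H0, &HFF, &H54, &H41, &H47, &H10}
        Dim pattern As Byte() = {&H54, &H41, &H47} ' the ASCII bytes for "TAG"
        Console.WriteLine(IndexOfBytes(data, pattern)) ' 2
    End Sub
End Module
```

For a file of a hundred kilobytes or so, this straightforward scan should be more than adequate; cleverer search algorithms exist but are unlikely to be needed at that size.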

You need to explain your problem fully. All I know right now is that for some reason you're reading a file into a byte array because it's not an ascii file. Then you're going to huge effort to convert it into a string so you can use the built-in methods of searching and cutting..

Suppose I was given the task of writing a program that stripped the ID3v1 tag out of an MP3 and copied it to a separate file. I would do this:

Establish a random access file stream to the file.
Seek to the File.Length - 128
Check whether the 3 bytes starting at that offset are the characters 'T' 'A' 'G' in sequence (hex 54, 41, 47)
If they are, establish another filestream, output this time, encoding ascii text.
Read 128 bytes from the MP3 at the current location
Write those 128 bytes to the ascii stream
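
The steps above can be sketched roughly as follows. TryExtractId3v1 is a hypothetical name, and two liberties are taken: the sketch reads the whole 128-byte block once and checks its first three bytes (rather than checking 3 bytes and then reading), and it writes the raw bytes with File.WriteAllBytes instead of opening an ASCII text stream. Main builds a fake file in the temp folder purely to exercise the function.

```vbnet
Imports System
Imports System.IO

Module Id3v1Sketch
    ' Copies the trailing 128-byte ID3v1 block to tagPath if it starts with "TAG".
    ' Hypothetical helper; returns False if the file is too short or has no tag.
    Function TryExtractId3v1(ByVal mp3Path As String, ByVal tagPath As String) As Boolean
        Using source As New FileStream(mp3Path, FileMode.Open, FileAccess.Read)
            If source.Length < 128 Then Return False
            source.Seek(-128, SeekOrigin.End)

            Dim tag(127) As Byte ' 128 elements
            Dim total As Integer = 0
            ' Read may return fewer bytes than requested, so loop until the block is full.
            While total < tag.Length
                Dim n As Integer = source.Read(tag, total, tag.Length - total)
                If n = 0 Then Return False
                total += n
            End While

            ' 'T' 'A' 'G' = hex 54, 41, 47
            If tag(0) <> &H54 OrElse tag(1) <> &H41 OrElse tag(2) <> &H47 Then Return False
            File.WriteAllBytes(tagPath, tag)
            Return True
        End Using
    End Function

    Sub Main()
        ' Fake "MP3": 200 filler bytes followed by a 128-byte block starting with "TAG".
        Dim fake(327) As Byte
        fake(200) = &H54 : fake(201) = &H41 : fake(202) = &H47
        Dim mp3Path As String = Path.GetTempFileName()
        Dim tagPath As String = Path.GetTempFileName()
        File.WriteAllBytes(mp3Path, fake)

        Console.WriteLine(TryExtractId3v1(mp3Path, tagPath)) ' True
        Console.WriteLine(New FileInfo(tagPath).Length)      ' 128
    End Sub
End Module
```

Only 128 bytes are ever held in memory, however large the MP3 is.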

I could do the last two steps (read/write) in a single line.


What it looks like you would do is:

Read the whole MP3 into a byte array (mem usage = 10 meg)
Convert the byte array to a string using stringbuilder etc (mem usage > 20meg)
Call IndexOf("TAG") and see if the result is 128 from the end
Call substring to clip the last 128 characters off the converted string
Establish an output stream
Write the characters to the stream


If you can give more info about the problem you're trying to solve (rather than the problem with your proposed solution) I'm sure we'll be able to recommend something :)
 
Sorry about being too vague - I am writing my very first .NET app as a learning project, and my goal is to convert a binary file exported by my calendar software that contains events and dates, into csv format.

I should add that although I am certainly approaching this in completely the wrong way, I have learnt a huge amount about VB arrays and methods, strings, file IO etc :) which was my main goal. Thanks for the ID3 example, I'll take that as my next project on my route to VB .Net mastery!

The main reason I am converting the binary file to a string is that I cannot find a way to search for a certain sequence of bytes - 3 is OK with sequential Ifs, but what if it were 30? Is there an elegant way to do this? Another reason for my byte-to-string madness is that this app began life as an attempt to write a simple hex display program for binary files, and I wanted to display each byte in a textbox as "0F" rather than just plain "F".

Also, if we were talking about 5 MB mp3 files then I take your point about the stupidity of my approach... but for a 125 KB file I would have thought that this would be OK... I have now in fact come up with yet another (surely messy) solution that avoids the ToString method:

1. I read in the binary file as a byte array using ReadAllBytes
2. I convert the byte array to a string using a "bytesasstring = Format(fileContentsasbytearray(i), "X")" loop, appending to a StringBuilder on each iteration, with a check to place a "0" before any value below hex 10
3. I write the completed StringBuilder to a txt file with StreamWriter
4. I read the text back in with FileSystem.ReadAllText
5. I can process using string methods to my heart's content.
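
For comparison, here is a minimal sketch of steps 1 to 2 without the round trip through a text file. ToHex is a hypothetical helper name and the sample bytes are invented. The "X2" format specifier pads each byte to two hex digits, so no separate leading-zero check is needed, and the StringBuilder is given its final capacity up front.

```vbnet
Imports System
Imports System.Text

Module HexStringSketch
    ' Converts every byte to two uppercase hex digits, with no separators.
    ' ToHex is a hypothetical helper, not a framework method.
    Function ToHex(ByVal data As Byte()) As String
        Dim sb As New StringBuilder(data.Length * 2) ' two characters per byte
        For Each b As Byte In data
            sb.Append(b.ToString("X2")) ' "X2" pads to two digits: 15 -> "0F"
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim data As Byte() = {&HF, &H0, &HAB}
        Dim hexText As String = ToHex(data)
        Console.WriteLine(hexText)                  ' 0F00AB
        Console.WriteLine(hexText.Contains("00AB")) ' True
    End Sub
End Module
```

One caveat when searching hex text: a match can straddle two bytes ("F0" is found inside "0F00"), so it is worth checking that the index returned by IndexOf is even before trusting it.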

Putting aside the byte array question, I would like to know the practical size limits for using ToString on a StringBuilder, for example. Why is it faster to write to a file and reopen it than to use StringBuilder.ToString? If nothing else, this example has awoken me to the string immutability and concatenation problem :)

Thank you for your patience
 
Thread was split - different topics.
 
It looks like this thread was split out into another thread as a result of continued discussion re random access File IO etc

Question is.. where did the other thread go?
 
jazzwhistle;
Well, I'm not entirely sure what problem you're trying to solve, or why you'd be looking to replace.. If you have a file that is a customised binary format and you want to turn it into csv, then you read it in, but you never manipulate the file itself.. Instead you define an object that represents a record in the database. When you make a new object, pass a subsection of the byte array to the constructor, which then converts those bytes into the object's fields. Once that is done, you can write a .ToString method for the object that returns a csv-formatted line.
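
As a rough sketch of that idea: the record layout below is entirely made up, since the real calendar format is not described in this thread. It assumes a 24-byte record holding a 4-byte integer date (yyyymmdd, in the machine's byte order) followed by a 20-byte ASCII title padded with zero bytes. CalendarEvent and RecordSize are hypothetical names.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

' Hypothetical record layout, assumed purely for illustration:
'   bytes 0-3  : Int32 date as yyyymmdd (machine byte order)
'   bytes 4-23 : ASCII title, padded with zero bytes
Class CalendarEvent
    Public Const RecordSize As Integer = 24
    Private ReadOnly _dateValue As Integer
    Private ReadOnly _title As String

    ' Builds the object from one record's worth of bytes.
    Public Sub New(ByVal record As Byte())
        _dateValue = BitConverter.ToInt32(record, 0)
        _title = Encoding.ASCII.GetString(record, 4, 20).TrimEnd(ControlChars.NullChar)
    End Sub

    ' Returns one csv line; the title is quoted and embedded quotes are doubled.
    Public Overrides Function ToString() As String
        Return _dateValue.ToString() & ",""" & _title.Replace("""", """""") & """"
    End Function
End Class

Module CsvSketch
    Sub Main()
        ' Build one fake record to feed the constructor.
        Dim record(CalendarEvent.RecordSize - 1) As Byte
        BitConverter.GetBytes(20061017).CopyTo(record, 0)
        Encoding.ASCII.GetBytes("Dentist").CopyTo(record, 4)

        Console.WriteLine(New CalendarEvent(record).ToString()) ' 20061017,"Dentist"
    End Sub
End Module
```

With a real file, the caller would slice the byte array into record-sized pieces, construct one object per piece, and write each ToString result as a line of the csv file.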

The crux of this is the conversion. As an example, we have a system that produces a file containing the data used to emboss credit cards with lettering and write magnetic stripe info. I wrote a program that reads this file line by line and cuts each line up into variables (it's a fixed-width file). Now that we have the line represented as an object, we can swap bits in and out and do other things to it, then call ToString to convert the object back into a string that we ultimately write to another file. The principle is similar.
 