testing for ASCII file

liptonIcedTea

Hi,

At the moment I'm trying to design a feature to upload a CSV file.

So far, I'm reading the file into a byte array and then decoding the bytes as ASCII text.

I'm wondering whether there is a way to test if the file is in ASCII format. Even better, is there a "simple" way to test whether the CSV file is in the correct format?
 
All files are just a collection of bytes, and the same bytes can be interpreted in many different ways. In that sense almost any file can be read as ASCII, because you can simply treat the bytes as ASCII values, although strictly speaking only byte values 0-127 are defined in ASCII. If an application requires files in a specific format, like a Word DOC or an Excel XLS, then the application author would normally define a header that the app reads first to check whether the file is of the correct type, but the only way to be sure there's no corruption in the rest of the file is to read it.

The CSV format has no header information, so that's out. If you need a specific number of fields per line or whatever, then all you can do is read the file and check whether each line has that many fields.
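
For formats that do define a header, the check described above might look roughly like this. It is only a sketch in Python rather than .NET; the function name `looks_like_ole2` is made up for illustration, and it relies on the fact that legacy Word .doc and Excel .xls files are OLE2 compound files that begin with the same 8-byte signature. It says nothing about whether the rest of the file is intact.

```python
# Signature at the start of legacy OLE2 compound files (e.g. Word .doc, Excel .xls).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(data: bytes) -> bool:
    """Return True if the data starts with the OLE2 compound-file signature.

    Hypothetical helper: it only inspects the first 8 bytes.
    """
    return data.startswith(OLE2_MAGIC)
```

With a real file you would pass it the first 8 bytes read in binary mode. A CSV file has nothing comparable to check, which is the point made above.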
 
A messy way would be to read the file in 10-byte chunks and average the byte values in each chunk. If the average stays within a certain range, you can be fairly sure it's "ASCII" as far as the concept of readable text is concerned.
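
As a rough illustration of that idea (in Python rather than .NET), the sketch below shows both a strict test and the averaging heuristic. The function names, the 10-byte chunk size, and the 9 to 126 range are all assumptions chosen for the example, not established thresholds. The strict test is usually the safer one, since an average can hide individual non-text bytes.

```python
def is_strict_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range (0-127)."""
    return all(b < 128 for b in data)


def looks_like_text(data: bytes, chunk_size: int = 10,
                    low: float = 9.0, high: float = 126.0) -> bool:
    """Heuristic: the average byte value of every chunk must lie in [low, high].

    The bounds are illustrative assumptions (tab = 9, '~' = 126).
    """
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        average = sum(chunk) / len(chunk)
        if not (low <= average <= high):
            return False
    return True
```

For example, `is_strict_ascii("café".encode("utf-8"))` is false because the accented character encodes to bytes above 127.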

To be sure it is in proper CSV format, you can count the commas in the first line (the column headers) and make sure the rest of the lines have at least the same number of commas. For this, you can call StreamReader.ReadLine and then String.Split(",").Count, which gives the number of fields (one more than the number of commas). It should be very simple to implement, and it won't slow things down too much.
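
A minimal sketch of that check, written in Python rather than .NET, might look like the following. The function name `inconsistent_rows` is hypothetical. It uses Python's standard `csv` module instead of a plain split, because a plain split on commas miscounts fields when a quoted value itself contains a comma. It also flags any row whose field count differs from the header, which is stricter than the "at least as many" rule described above.

```python
import csv
import io


def inconsistent_rows(text: str, delimiter: str = ","):
    """Return (line_number, field_count) for each row whose field count
    differs from the header row's field count."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to compare against
    expected = len(header)
    return [(reader.line_num, len(row))
            for row in reader
            if len(row) != expected]


sample = 'name,email\n"Smith, Bob",bob@example.com\nalice\n'
problems = inconsistent_rows(sample)  # only the third line is short
```

In the sample, the second line has two fields even though it contains two commas, so a naive split would report three.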
 
I actually found this was not always the case. I've been trying to parse Outlook email addresses into CSV, and if you have Outlook, you can try this. The header has 92 columns, whilst each row may have anywhere from 85 to 92...

Weird...

I actually found a really good library that does this.
http://www.codeproject.com/cs/database/CsvReader.asp
 