Question Reading Files

PeteN

Member
Joined
Feb 25, 2008
Messages
6
Programming Experience
Beginner
In our business we have to read a lot of .csv files and import them into our system. The .csv files are not always laid out the same as these are generated by a third party.

Q1. when reading the file I am opening a stream reader and reading it line by line. I,m sure there is a faster way to do this, with classes. if so, can any one point me in the right direction to look at this
Q2. Currently when reading each line of the .csv file, it is read into array. New changes by a third party as added a comma as part of the data, so in the example record below, stripping the quotes befor loading it in the array will not work. Ids there a simple routine to strip the quotes from each element in the array?

"123456","12345678","MY NAME IS FRED", "157," ,"784452.00"
 
Hi,

Your answer is to use the TextFieldParser class. Here is a quick example:-

VB.NET:
Using myReader As New TextFieldParser(Application.StartupPath & "\ExampleData.txt")
  myReader.TextFieldType = FieldType.Delimited
  myReader.SetDelimiters(",")
  myReader.HasFieldsEnclosedInQuotes = True
 
  Do While Not myReader.EndOfData
    Dim myData() As String = myReader.ReadFields
    MsgBox(String.Join(vbCrLf, myData))
  Loop
End Using

In the above example, I use String.Join to reconstitute the myData array just for display purposes but you would use the myData array in any way that you need.

For more information, have a look here:-

TextFieldParser Class (Microsoft.VisualBasic.FileIO)

Hope that helps.

Cheers,

Ian
 
The easiest way is through an OleDbConnection, it lets you query the file the same as any other table with SQL syntax:

    Function ReadCSV(ByVal CSVFilename As String) As DataTable
        ReadCSV = New DataTable("Results")

        Dim CSVFileInfo As New System.IO.FileInfo(CSVFilename)

        Dim ConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=""" & CSVFileInfo.DirectoryName & """;Extended Properties='text;HDR=Yes;FMT=Delimited(,)';"
        Dim CommandString As String = "SELECT * FROM [" & CSVFileInfo.Name & "];"

        Using Conn As New System.Data.OleDb.OleDbConnection(ConnectionString)
            Using Cmd As New System.Data.OleDb.OleDbCommand(CommandString, Conn)
                Using Adapter As New System.Data.OleDb.OleDbDataAdapter(Cmd)
                    Adapter.Fill(ReadCSV)
                End Using
            End Using
        End Using
    End Function
 
Last edited:
Hi Herman,

Hope you are well. That's an interesting example, I have never read a text file like that before.

However, a couple of quick hints and tips for you (Sorry:tongue:):-

1) You have forgotten to associate your "cmd" variable with a valid Connection variable so this example currently fails. You need to expand your "cmd" variable to use a constructor that can take your Conn variable. You can then get rid of the Conn.Open statement since this is irrelevant in the context you have posted.

2) You have also forgotten to Return the resulting table from your Function to the Calling routine.

Please do not shout at me for pointing out these minor points since I admire anyone like you for helping out on this forum. However, as I have learnt to my own detriment, if your name is not jcmilhinney, then you need to test your code before positing.

Have a good day.

Cheers,

Ian

:encouragement:
 
1) You are right, example corrected.
2) I have not forgotten to return anything, if you notice the Adapter.Fill call points to the ReadCSV variable, which in fact is the implied return variable. I could have declared a local variable, and then Return'ed that, but that is just one more variable that is not needed. To return a value from a Function you can simply assign the return value to the implied variable with the same name as the Function:
Function Bleh() As Integer
    Bleh = 1 + 1
End Function


I should probably add that most of the code I post here is written in notepad, simply because I am doing something else and don't bother opening up visual studio for a snippet. These snippets are not meant to be copy/pasted into a working solution anyways, it's probably good that there are some mistakes for OP's of the world to find and correct themselves... ;)
 
Last edited:
1) You are right, example corrected.
2) I have not forgotten to return anything, if you notice the Adapter.Fill call points to the ReadCSV variable, which in fact is the implied return variable. I could have declared a local variable, and then Return'ed that, but that is just one more variable that is not needed. To return a value from a Function you can simply assign the return value to the implied variable with the same name as the Function:
Function Bleh() As Integer
    Bleh = 1 + 1
End Function


I should probably add that most of the code I post here is written in notepad, simply because I am doing something else and don't bother opening up visual studio for a snippet. These snippets are not meant to be copy/pasted into a working solution anyways, it's probably good that there are some mistakes for OP's of the world to find and correct themselves... ;)
That's considered poor practice in VB.NET. You should always use an explicit Return statement in VB.NET. The use of implicit variables in functions is supported so that upgraded VB6 code that does that still works but new VB.NET code should eschew the practice.
 
I haven't really seen any reference to it being a poor practice, but the argument I see used in favor of return is of readability. However I really do not see any downside to either methods. As it's been said before, it's all really just about being consistent.

Function IsDivisibleBy10(SomeInteger As Integer) As Boolean
    Return If(SomeInteger Mod 10 = 0, True, False)
End Function

Function IsDivisibleBy10(SomeInteger As Integer) As Boolean
    IsDivisibleBy10 = If(SomeInteger Mod 10 = 0, True, False)
End Function


Personally I prefer the second form, especially in functions that have a bunch of escape routes through Exit Function. Furthermore, Return is different than using the implicit variable. Assigning to the implicit variable only does just that, assign some value to a variable. The Return is implied through the End Function. Suppose you need to figure out a rough return value through a conditional block but then format it before returning. Now you need to declare an extra variable for the conditional block, and add a Return statement after formatting. In some cases you will also want to set a default return value through the header to avoid not returning a value if there is an exception. Incidentally, the reason why the Return construct is now preferred is because it allows you to directly return a value without using an extra variable. What's the point of all the extra code, when you could have done both on the implicit variable, especially considering the optimized IL will be identical? Why should I change my ways and add code that bloats readability and gets optimized away anyways? ;)
 
Last edited:
Back
Top