PeterM
Member
Hi all,
I am fairly new to VB.NET. I used to do a bit of coding in early versions of VB years ago but moved on to things like RPG400, CL and SQL on an AS/400 box. As I am no longer in that career, I haven't touched an AS/400 for a while either.
I am currently working on a Visual Basic console application in Visual Studio 2005 which takes various CSV files from an osCommerce web shop, processes them and writes the data to TAS Books accounts software with the aid of the Infoplex COM module.
I have 2 functions: one reads each line of a CSV file into an array, and the second uses Regex to split each line of that array into its fields before they are added to a structure.
There are something like 5000 records (lines) in the CSV file. One example is as follows:
"T-GPMM1070","699","Asst. Heat Shrink Tube 3"(76mm)Pk12","","3.28","17.5000","0.57","3.85","0","Kits & Spares","Chargers & Accessories","Charging Accessories","","","Ripmax","t-gpmm1070_1.jpg","0.00","","","0","1","","5","0","0","2010-10-20 11:41:59","","","","","","","","","","","","","","","","","","","","","","","EOREOR"
This originates from a PHP script which builds the CSV file with a fixed number of columns, but the columns are not fixed width. I also added the "EOREOR" field at the end of each line so that I know when the end of the line has been reached.
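As a rough sketch of how that marker can be checked (HasEndMarker is only a placeholder name for illustration, not a function from my actual program), a line is treated as complete only if it ends with the quoted "EOREOR" field:

```vbnet
Imports System

Module EndMarkerSketch
    ' Hypothetical helper: True when the line ends with the quoted
    ' "EOREOR" marker field, ignoring trailing whitespace.
    Public Function HasEndMarker(ByVal strLine As String) As Boolean
        ' """EOREOR""" is the VB literal for the 8 characters "EOREOR"
        Return strLine.TrimEnd().EndsWith("""EOREOR""", StringComparison.Ordinal)
    End Function

    Sub Main()
        ' A complete line: the last field is the marker
        Console.WriteLine(HasEndMarker("""a"",""b"",""EOREOR"""))
        ' A truncated line: the marker is missing
        Console.WriteLine(HasEndMarker("""a"",""b"))
    End Sub
End Module
```

The first call should report True and the second False. A line without the marker could then be logged as incomplete before any field splitting is attempted.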
I read the CSV file into VB.NET using 2 functions. The first gets the CSV file and reads the whole file, line by line, into an array. The function is as follows:
VB.NET:
' Read a CSV file into an array; each array element contains one line
' from the .csv file.
Public Function FileToArray(ByVal filePath As String) As String()
    ' Initialise to Nothing so the Finally block is safe if the open fails
    Dim sr As System.IO.StreamReader = Nothing
    Try
        sr = New System.IO.StreamReader(filePath)
        ' Split the whole file on CRLF line endings
        Return System.Text.RegularExpressions.Regex.Split(sr.ReadToEnd(), "\r\n")
    Finally
        ' Close the reader even if reading throws
        If sr IsNot Nothing Then sr.Close()
    End Try
End Function
The second function is called from within a For loop to process the array lines one by one. It uses Regex to split each line into its fields, which are then individually written to elements of a structure for processing. The code is as follows:
VB.NET:
' Requires "Imports System.Text.RegularExpressions" at the top of the file.
' objWriter is declared elsewhere in the program.
Public Function DecodeCSV(ByVal strLine As String) As String()
    Dim strPattern As String
    Dim objMatch As Match
    ' build a pattern
    strPattern = "^" ' anchor to the start of the string
    ' first field: either quoted (with "" as an escaped quote) or unquoted
    strPattern += "(?:""(?<value>(?:""""|[^""\f\r])*)""|(?<value>[^,\f\r""]*))"
    ' any further fields, each preceded by a comma
    strPattern += "(?:,(?:[ \t]*""(?<value>(?:""""|[^""\f\r])*)""|(?<value>[^,\f\r""]*)))*"
    strPattern += "$" ' anchor to the end of the string
    ' get the match
    objMatch = Regex.Match(strLine, strPattern)
    ' if the Regex match was ok
    If objMatch.Success Then
        Dim objGroup As Group = objMatch.Groups("value")
        Dim intCount As Integer = objGroup.Captures.Count
        Dim arrOutput(intCount - 1) As String
        ' transfer data to the array
        For i As Integer = 0 To intCount - 1
            Dim objCapture As Capture = objGroup.Captures.Item(i)
            arrOutput(i) = objCapture.Value
            ' replace double-escaped quotes
            arrOutput(i) = arrOutput(i).Replace("""""", """")
        Next
        ' return the array
        Return arrOutput
    Else
        'Throw New ApplicationException("Bad CSV line: " & strLine)
        Console.WriteLine("Bad CSV line: " & strLine)
        objWriter.Write("Bad CSV line: " & strLine & vbCrLf)
        ' no fields could be extracted, so the caller gets Nothing
        Return Nothing
    End If
End Function
My problem is that, because of the unescaped double quote in the product description in the example CSV line above (the 3" in "Asst. Heat Shrink Tube 3"(76mm)Pk12"), the line fails the Regex match and therefore is not read.
I know this is because my pattern is wrong, but I'm not sure how to solve it. I think all I really need is for Regex to split the line on the "," between each element. This might be a little difficult because some of the descriptions contain both , and " but never one after the other.
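To show what I mean, here is a rough sketch of that split-on-delimiter idea using String.Split rather than Regex. SplitQuotedLine is just a placeholder name, not something from my program. It assumes every field is wrapped in double quotes and that the sequence "," never appears inside a field, and it has not been run against the full file:

```vbnet
Imports System

Module SplitSketch
    ' Hypothetical helper: splits one CSV line on the "," sequence that
    ' separates quoted fields. Assumes every field is quoted and that
    ' "," never occurs inside a field.
    Public Function SplitQuotedLine(ByVal strLine As String) As String()
        Dim trimmed As String = strLine.Trim()
        ' Strip the opening quote of the first field and the closing
        ' quote of the last field
        If trimmed.Length >= 2 AndAlso trimmed.StartsWith("""") AndAlso trimmed.EndsWith("""") Then
            trimmed = trimmed.Substring(1, trimmed.Length - 2)
        End If
        ' """,""" is the VB literal for the three characters ","
        Return trimmed.Split(New String() {""","""}, StringSplitOptions.None)
    End Function

    Sub Main()
        ' Shortened version of the example line, including the bare quote after 3
        Dim strLine As String = """T-GPMM1070"",""699"",""Asst. Heat Shrink Tube 3""(76mm)Pk12"","""",""EOREOR"""
        Dim fields As String() = SplitQuotedLine(strLine)
        Console.WriteLine(fields.Length) ' prints 5
        Console.WriteLine(fields(2))     ' prints Asst. Heat Shrink Tube 3"(76mm)Pk12
    End Sub
End Module
```

With this approach the bare quote in the description is left in the field as-is, and empty fields come back as empty strings. It would break on any field that really does contain "," as data.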
Any help or advice greatly appreciated.
Peter.