Sort CSV file by a specified column

Gwasshoppa

Member
Joined
Aug 26, 2011
Messages
7
Programming Experience
5-10
Hi there

I have a CSV file that i want to sort by the second column of data... i.e.
COLUMN 1,COLUMN 2,COLUMN 3,COLUMN 4
1234,54689,55489,The fox
5554,4568,59823,Brush
0578,XYZ,PQRS,12364

I want to be able to write a function that can sort on any one of these columns depending on the index i pass into the function.

i.e. if i pass in index 1 into the function then i need to sort the CSV file data based on COLUMN 2 and thus the data would end up something like this.
COLUMN 1,COLUMN 2,COLUMN 3,COLUMN 4
5554,4568,59823,Brush
1234,54689,55489,The fox
0578,XYZ,PQRS,12364

Note row 1 has now become row 2 because 4568 < 54689 < XYZ

I have looked into using the ArrayList.Sort method but am not really sure what I need to do to make this work.

If I make a custom class using iComparable I just dont seem to understand how I can write this.

Can someone give me some ideas... I can write the data from the text CSV file into the ArrayList no worries... and do a standard MyArray.Sort but how would i write the Custom object to hold my rows of CSV data using the IComparer and IComparable?

Cheers
Gwasshoppa
 

jmcilhinney

VB.NET Forum Moderator
Staff member
Joined
Aug 17, 2004
Messages
14,169
Location
Sydney, Australia
Programming Experience
10+
First up, don't use an ArrayList. They should not be used at all from .NET 2.0 onwards. I would suggest two options:

1. Read the data into a DataTable.

2. Define a type with appropriate properties for one record and then create a List(Of YourType).

The DataTable is the simplest option and makes for relatively easy sorting, as well as filtering and editing via bound controls. The custom type and generic List option is a little better for working with the data in code, but sorting is slightly more difficult and some other things can be more difficult too, although they can be worked around by using a different type of collection. Implementing IComparable in your custom type is not really an option either, because that is for objects that are always compared in the same way. If you want to compare by different properties at different times, that requires custom comparisons.

I would suggest that you follow the Blog link in my signature and read my three-part post on Sorting Arrays & Collections (the oldest three posts). If you have more specific questions after that, post back here.
 

rcombs4

Well-known member
Joined
Aug 6, 2008
Messages
189
Programming Experience
3-5
jmcilhinney's options are good ones. But I've done similar things by creating a custom Comparable class.

iComparer works something like this..

Code:
Public Class MyCSVComparer
   Implements IComparer

   Public Function Compare(x as object, y as object) as Integer Implements IComparer.Compare
     dim strX as string = x.split(",")(1)
     dim strY as string = y.split(",")(1)

     Return String.Compare(strX, strY)

   End Function
You will need to do some verification of the x and y values to avoid errors, but if you want to go down that road that should give you a good start.
 

jmcilhinney

VB.NET Forum Moderator
Staff member
Joined
Aug 17, 2004
Messages
14,169
Location
Sydney, Australia
Programming Experience
10+
jmcilhinney's options are good ones. But I've done similar things by creating a custom Comparable class.

iComparer works something like this..

Code:
Public Class MyCSVComparer
   Implements IComparer

   Public Function Compare(x as object, y as object) as Integer Implements IComparer.Compare
     dim strX as string = x.split(",")(1)
     dim strY as string = y.split(",")(1)

     Return String.Compare(strX, strY)

   End Function
You will need to do some verification of the x and y values to avoid errors, but if you want to go down that road that should give you a good start.
IComparable and IComparer are two different things. Implementing IComparer is a viable option here, and it's one of the options that I cover in my blog posts. You would have to do more than you have there though, because you need some way to specify which column/property to compare by. You have hard-coded the second column there, so you'd need to make that configurable.

Implementing IComparable is not feasible here for that very reason: there's no way to configure which column/property to compare by on an ad hoc basis. The mode of comparison does need to be hard-coded into an IComparable implementation.
 

Gwasshoppa

Member
Joined
Aug 26, 2011
Messages
7
Programming Experience
5-10
Awesome thanks guys... I think the datatable is probably going to be the easiest way for me to handle this as I will be able to determine the correct column to sort by and then just sort it.

I have a collection of the column set up details the data structure in a kind of config table that i am reading the column details from so using the datatable option will be excellent.

I will give it a go today and let you know how it goes and maybe post some code.

Thanks again for the ideas :)
Cheers
Gwasshoppa
 

jmcilhinney

VB.NET Forum Moderator
Staff member
Joined
Aug 17, 2004
Messages
14,169
Location
Sydney, Australia
Programming Experience
10+
Just note that, with a DataTable, you don't actually sort the table itself. It has a DefaultView property that contains a DataView. That DataView has a Sort property. When you set it, the contents of the DataTable are unaffected but the contents of the DataView will be sorted. Note that, when you bind a DataTable, it's actually the contents of the DefaultView that gets displayed.
 

Gwasshoppa

Member
Joined
Aug 26, 2011
Messages
7
Programming Experience
5-10
Ok I have got the inserting into the datatable working and the sort working, I now need to write to sorted data back out to a new csv file.

The problem now is that each column data in the dataview now has speackmarks around each item...
i.e. initial data looks like this 1235,00001,23
1234,00001,5
1234,00002,3
1237,00002,10
1234,00003,21
1234,00004,41
1235,00005,9
1236,00002,85

and the new sorted datatable looks like this
"1234","00001",'5"
"1234","00002","3"
"1234","00003","21"
"1234","00004","41'
"1235","00005","9"
"1235","00001","23"
"1236","00002","85"
"1237","00002","10"

I was hoping that the "" would not surround the data... is there a simple way of not having these?
NOTE: Not all my data will be numerical so setting the column data type to int or similar will not work :(

Cheers
Gwasshoppa
 

Gwasshoppa

Member
Joined
Aug 26, 2011
Messages
7
Programming Experience
5-10
Cool... never knew that existed lol

So when I read the data out of the datatable i need to read it into a TextFieldParser and then into a new csv file... I'm not exactly sure how to achieve this.

I have been able to use the TextFieldParser to get the data from the text file into the datatable (it puts quotes around the data), but going from the datatable into a new text file i'm not sure how to do.

Do you have a small sample of how to go from the datatable into the csv file using the TextFieldParser?

Cheers
Gwasshoppa
 

jmcilhinney

VB.NET Forum Moderator
Staff member
Joined
Aug 17, 2004
Messages
14,169
Location
Sydney, Australia
Programming Experience
10+
The TextFieldParser is for reading the contents of a file into your app. To write the data out, you simply use a StreamWriter. Nested For or For Each loops will let you write each field of each row.
 

Gwasshoppa

Member
Joined
Aug 26, 2011
Messages
7
Programming Experience
5-10
Cool... thank you very much for the help.

I thought that may have been the case for the TextFieldParser :)... I have used some looping etc and managed to get something sorted... Not sure how efficient the code is, but there should not be too much data in the file so i dont think performance will be too much of an issue.

Thanks again for your help... much appreciated
Cheers
Gwasshoppa
 
Top Bottom