EDIT: The final solution has been reached. Buffering the data in a dictionary instead of writing directly to disk reduced the time required to split the files by at least a factor of 10. See later posts for the solution code.
Greetings,
I have the following snippet of code:
```vbnet
' Requires "Imports System.IO" at the top of the file (StreamWriter, Path).
For Each tempstring In SourceDataCollection
    LineToFieldsTemp = Split(tempstring, ",")
    ' Everything after the first field, re-joined with commas. String.Join also handles
    ' a line with only one field, where the old Mid(..., Len - 1) call would throw.
    LineStringWithoutFirstEntry = String.Join(",", LineToFieldsTemp, 1, LineToFieldsTemp.Length - 1)
    ' The first field selects the output file: <SourceFileName>_<ID>.csv
    CurrentFileName = Path.Combine(ResultsDirectoryName, SourceFileName & "_" & LineToFieldsTemp(0) & ".csv")
    ' append:=True creates the file if it doesn't exist. The file is opened and
    ' closed once for every line, which is the main cost of this loop.
    Using FileWriterSW As New StreamWriter(CurrentFileName, True)
        FileWriterSW.WriteLine(LineStringWithoutFirstEntry)
    End Using ' disposing the writer closes the file, so no explicit Close() is needed
Next
```
Essentially, I read through each item in a collection of comma-separated strings. The first field of each string is appended to a master filename to form the NEW filename, and the rest of the line (everything after that first field) is appended to that file. This is a way to split one CSV file into multiple files based on the value in its first column.
The trouble is that this opens, appends to and closes a file for every single line, across potentially a LOT of files, so the IO takes forever. Is there a way to buffer this data in memory and write it to disk at the end? Be aware that the filenames and the total number of files to be created are UNKNOWN at the beginning of the loop: a new file is created each time a unique first-column ID is discovered.
Thanks in advance
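A minimal sketch of the dictionary approach described in the edit note, assuming the whole source fits in memory. It is not the solution code from the later posts; the module and function names (`CsvSplitter`, `BufferByFirstField`, `WriteBuffers`) are hypothetical. Each first-column ID gets its own `StringBuilder` in a `Dictionary(Of String, StringBuilder)`, created the first time that ID is seen, so the set of files does not need to be known up front. Each file is then written with a single call at the end instead of being opened once per line.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module CsvSplitter
    ' Groups every line by its first field. Each value holds the rest of each
    ' line (first field removed), one row per line, ready to be written in one go.
    Public Function BufferByFirstField(lines As IEnumerable(Of String)) As Dictionary(Of String, StringBuilder)
        Dim buffers As New Dictionary(Of String, StringBuilder)()
        For Each line As String In lines
            Dim fields() As String = line.Split(","c)
            Dim id As String = fields(0)
            Dim sb As StringBuilder = Nothing
            ' IDs are discovered as we go, so create a buffer the first time one appears.
            If Not buffers.TryGetValue(id, sb) Then
                sb = New StringBuilder()
                buffers.Add(id, sb)
            End If
            ' Same output as the original loop: remaining fields joined with commas.
            sb.AppendLine(String.Join(",", fields, 1, fields.Length - 1))
        Next
        Return buffers
    End Function

    ' Writes each buffer to <resultsDir>\<baseName>_<ID>.csv with one call per file.
    Public Sub WriteBuffers(buffers As Dictionary(Of String, StringBuilder), resultsDir As String, baseName As String)
        For Each pair As KeyValuePair(Of String, StringBuilder) In buffers
            Dim fileName As String = Path.Combine(resultsDir, baseName & "_" & pair.Key & ".csv")
            ' AppendAllText creates the file if needed, like StreamWriter(fileName, True).
            File.AppendAllText(fileName, pair.Value.ToString())
        Next
    End Sub
End Module
```

Assuming `SourceDataCollection` can be enumerated as strings (e.g. a `List(Of String)`), usage would look like `WriteBuffers(BufferByFirstField(SourceDataCollection), ResultsDirectoryName, SourceFileName)`. If the data were too large to hold in memory, a variation is to keep one `StreamWriter` open per ID in a `Dictionary(Of String, StreamWriter)` and dispose all of them after the loop. That still avoids reopening a file for every line.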