Remove duplicate lines in a text file

mattkw80

Hey Everyone,

I'm having trouble removing duplicate lines from a text file (exact duplicates only).

If I have a text file called animals.txt, how can I use the StreamWriter to loop through the file and erase any duplicate lines?

For example, if the text file says this...

cat
cat
cat
dog
bird

Then I would want the output to be:

cat
dog
bird

With all line spaces also left over.

Any help is greatly appreciated.

Matt
 
You don't need to use a StreamReader. There are easier ways, which include LINQ:
VB.NET:
Dim lines As String() = IO.File.ReadAllLines("file path here")

lines = lines.Distinct().ToArray()
IO.File.WriteAllLines("file path here", lines)
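For anyone who wants to try this end to end, here is a minimal sketch of the same three lines inside a complete console program. It assumes the file is the animals.txt from the question and that it sits in the program's working directory; the module name and the filePath variable are made up for illustration. Two things worth knowing: Distinct() treats blank lines like any other line, so repeated blank lines collapse to a single one, and although it yields first occurrences in order in practice, the documentation describes the result as an unordered sequence.

```vbnet
Imports System.IO
Imports System.Linq

' Hypothetical module name, for illustration only.
Module RemoveDuplicateLinesDemo
    Sub Main()
        ' Assumption: animals.txt is in the current working directory.
        Dim filePath As String = "animals.txt"

        ' Read every line of the file into an array.
        Dim lines As String() = File.ReadAllLines(filePath)

        ' Keep one copy of each distinct line (case-sensitive comparison).
        Dim uniqueLines As String() = lines.Distinct().ToArray()

        ' Overwrite the original file with the de-duplicated lines.
        File.WriteAllLines(filePath, uniqueLines)
    End Sub
End Module
```

Because the file is overwritten in place, it may be safer to write to a different path while testing.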
 
Wow, that's incredible, you solved it in 3 lines of code. I googled this for hours, and nobody had a clear answer, or an answer under 20 lines of code.

I know nothing of LINQ, but it sure is a time saver.

Thank you so much.
 
The LINQ part is the '.Distinct().ToArray()'. Without LINQ you could do it like this:
VB.NET:
Dim lines As String() = IO.File.ReadAllLines("file path here")
Dim distinctLines As New HashSet(Of String)

For Each line As String In lines
    distinctLines.Add(line)
Next

Array.Resize(lines, distinctLines.Count)

distinctLines.CopyTo(lines)
IO.File.WriteAllLines("file path here", lines)
The generic HashSet class was added in .NET 3.5. To do the equivalent without the HashSet:
VB.NET:
Dim lines As String() = IO.File.ReadAllLines("file path here")
Dim distinctLines As New List(Of String)

For Each line As String In lines
    If Not distinctLines.Contains(line) Then
        distinctLines.Add(line)
    End If
Next

lines = distinctLines.ToArray()
IO.File.WriteAllLines("file path here", lines)
 
jmcilhinney said:
Without LINQ you could do it like this:
VB.NET:
Dim distinctLines As New HashSet(Of String)

For Each line As String In lines
    distinctLines.Add(line)
Next

Array.Resize(lines, distinctLines.Count)

distinctLines.CopyTo(lines)
Or, more simply, use the constructors to fill the collections:
VB.NET:
Dim distinctLines As New HashSet(Of String)(lines)

lines = New List(Of String)(distinctLines).ToArray
Note that this ToArray method is not the Enumerable.ToArray (LINQ) extension method used in the first reply, but the instance method that belongs to List(Of T) itself.
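One caveat if the order of the lines matters: HashSet(Of T) is documented as holding its elements in no particular order, so copying the set back out is not guaranteed to give cat, dog, bird in that order, even though it often does in practice. A possible sketch that keeps first-occurrence order regardless is below. It relies on HashSet.Add returning False when the item is already in the set; the module and function names are made up for illustration, and the input is hard-coded from the example in the question rather than read from a file.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical module and function names, for illustration only.
Module OrderedDistinctDemo
    ' Returns each line once, in the order it first appears.
    Function DistinctInOrder(ByVal lines As IEnumerable(Of String)) As List(Of String)
        Dim seen As New HashSet(Of String)
        Dim result As New List(Of String)

        For Each line As String In lines
            ' Add returns True only the first time a value is seen.
            If seen.Add(line) Then
                result.Add(line)
            End If
        Next

        Return result
    End Function

    Sub Main()
        ' Sample data taken from the question.
        Dim input As String() = New String() {"cat", "cat", "cat", "dog", "bird"}

        ' Prints cat, dog, bird on separate lines.
        For Each line As String In DistinctInOrder(input)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

The List(Of String) carries the ordering and the HashSet only answers "have I seen this before?", which avoids the repeated List.Contains scans of the version without a HashSet.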
 