File Manipulation

wavemasta (Member, joined Feb 12, 2009)
Hi,
I really need help; I have been battling this for 3 days.
I have two files: one is called unix_dump_a.txt, and the other is called red_items.txt.
I want to parse the contents of the red_items file, then parse the unix_dump_a.txt file, and remove any data that appears in the red_items file from the unix_dump file.

I did this by writing the entries in the unix file that were not in the red_items file to a new file, which I called unix_strip.

The thing is, it's not working as it's meant to. When I check the unix_strip file (the output), I still see entries that are also in the red_items file, and that's not supposed to happen.

This is what I have done so far:

VB.NET:
Sub stripUnixItems()
    'first create the strip file for unix
    Dim fs As New FileStream(configFileDir & "\unix_strip.txt", FileMode.Create, FileAccess.Write)

    'get a handle to existing unix dump file for reading

    'Dim dumpFileReader = New StreamReader(configFileDir & "\unix_dump_a.txt")

    'read contents of unix red list
    Dim redListReader = New StreamReader(configFileDir & "\unix_red_items.txt")

    'get handle to write to strip file

    Dim sw = New StreamWriter(fs)

    'read contents of red list line by line.
    Dim redListLine As String

    Do 'begin to read the redlist file

        redListLine = redListReader.ReadLine()
        MsgBox(redListLine)
        'for each line in the redlist
        'first check for wildcards
        'If redListLine.Contains("*") Then
        '    'handle wildcards
        'Else
        'the redlist items are whole text
        'for each redlist line check if it exists in unix dump file
        Dim currentRow() As String
        Using myreader As New Microsoft.VisualBasic.FileIO.TextFieldParser(configFileDir & "\unix_dump_a.txt")
            myreader.TextFieldType = FileIO.FieldType.Delimited
            myreader.SetDelimiters(";")
            While Not myreader.EndOfData And String.Empty.Equals(redListLine)
                currentRow = myreader.ReadFields()
                'we know the first 3 elements combined always give the full tag name
                If Not redListLine.Contains(currentRow(0) & "." & currentRow(1) & "." & currentRow(2)) Then
                    'write to the strip file
                    Dim i As Integer
                    For i = 0 To currentRow.Length - 1
                        sw.Write(currentRow(i))
                    Next
                    sw.WriteLine()

                End If
            End While
        End Using

        'End If

    Loop Until redListLine Is Nothing

End Sub

Unfortunately, my code doesn't work the way I want it to, and it's frustrating. Any item that's in the red_items file should be removed from the unix_dump data, and everything else should end up in the unix_strip file.
Any ideas?
Thanks!
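Reading the code above, a few things would explain the symptoms. The inner While only runs when redListLine is an empty string (String.Empty.Equals(redListLine)), and in that case "".Contains(tag) is False, so every dump row gets written. More generally, each dump row is checked against one red item per pass, so a row excluded by one red item is still written on the passes for the others; the fields are written without their ";" separators; and the StreamWriter is never closed, so buffered output may never reach the file. The following is a minimal sketch of a single-pass version, assuming dump lines look like "x";"y";"z";"etc" and red items look like x.y.z. The module and the names RedListFilter, TagName and StripRedItems are hypothetical; it reads from TextReader/TextWriter so it can be tried on in-memory strings.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module RedListFilter
    ' Hypothetical helper: returns "x.y.z" for a dump line such as "x";"y";"z";"etc",
    ' or Nothing when the line has fewer than three fields.
    Function TagName(line As String) As String
        Using parser As New TextFieldParser(New StringReader(line))
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(";")
            parser.HasFieldsEnclosedInQuotes = True ' removes the surrounding double quotes
            Dim fields() As String = parser.ReadFields()
            If fields Is Nothing OrElse fields.Length < 3 Then Return Nothing
            Return String.Join(".", fields, 0, 3)
        End Using
    End Function

    ' Hypothetical helper: reads the whole red list ONCE, then makes a single pass
    ' over the dump and writes each line at most once, only if its tag is not red-listed.
    Sub StripRedItems(redList As TextReader, dump As TextReader, output As TextWriter)
        Dim red As New HashSet(Of String)
        Dim redLine As String = redList.ReadLine()
        While redLine IsNot Nothing
            red.Add(redLine.Trim())
            redLine = redList.ReadLine()
        End While

        Dim line As String = dump.ReadLine()
        While line IsNot Nothing
            Dim tag As String = TagName(line)
            ' Assumption: lines with fewer than three fields are skipped.
            If tag IsNot Nothing AndAlso Not red.Contains(tag) Then output.WriteLine(line)
            line = dump.ReadLine()
        End While
    End Sub

    Sub Main()
        Dim dump As String = """x"";""y"";""z"";""1""" & Environment.NewLine & """a"";""b"";""c"";""2"""
        Using output As New StringWriter()
            StripRedItems(New StringReader("x.y.z"), New StringReader(dump), output)
            Console.Write(output.ToString()) ' only the "a";"b";"c";"2" line remains
        End Using
    End Sub
End Module
```

With real files, the readers and writer would be StreamReader/StreamWriter instances opened in a Using block, so the output is flushed and closed when the block ends. TextFieldParser is used for the tag so that a quoted field containing ";" is not split.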
 
You didn't define how you determine whether a line is in both files, so here is a simplistic example that treats the entire line as the "primary key".
It may not compile as-is, but it will work once you fix any minor errors that prevent it from compiling:

VB.NET:
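Imports System.Collections.Generic
Imports System.IO

Sub StripRedLines()
  ' Count how many times each line appears in the red list (1 on first sight).
  Dim dict As New Dictionary(Of String, Integer)

  For Each line As String In File.ReadAllLines("redlist.txt")
    If dict.ContainsKey(line) Then
      dict(line) += 1
    Else
      dict(line) = 1
    End If
  Next

  ' Using closes (and flushes) the writer when the block ends.
  Using sw As New StreamWriter("unix_strip.txt")
    For Each line As String In File.ReadAllLines("unix_dump.txt")
      If Not dict.ContainsKey(line) Then sw.WriteLine(line)
    Next
  End Using
End Sub
Simple when we have the right tools :)

Note: the comparison is case-sensitive. (Passing StringComparer.OrdinalIgnoreCase to the Dictionary constructor makes it case-insensitive.)
Note 2: Bonus! dict contains the number of times each line has been seen in the redlist. If, for example, the redlist contains the same line twice and unix_dump contains it 5 times, and you want 3 entries in your strip file (because 5 minus 2 is 3), then change your If Not ContainsKey line to:

' Consume one redlist occurrence per matching dump line; write only the surplus.
If dict.ContainsKey(line) AndAlso dict(line) > 0 Then dict(line) -= 1 Else sw.WriteLine(line)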
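To check the "5 minus 2 is 3" arithmetic, here is a small in-memory sketch of the same counting idea. SubtractLines and the RedListCounting module are hypothetical names, and the inputs are string arrays rather than files so the result can be inspected directly.

```vbnet
Imports System
Imports System.Collections.Generic

Module RedListCounting
    ' Hypothetical helper: removes as many copies of each line as the red list
    ' contains and keeps any surplus copies from the dump.
    Function SubtractLines(redLines As IEnumerable(Of String), dumpLines As IEnumerable(Of String)) As List(Of String)
        ' Count each red-list line, starting at 1 on first sight.
        Dim dict As New Dictionary(Of String, Integer)
        For Each line As String In redLines
            If dict.ContainsKey(line) Then
                dict(line) += 1
            Else
                dict(line) = 1
            End If
        Next

        Dim kept As New List(Of String)
        For Each line As String In dumpLines
            ' Consume one red-list occurrence if any remain; otherwise keep the line.
            If dict.ContainsKey(line) AndAlso dict(line) > 0 Then dict(line) -= 1 Else kept.Add(line)
        Next
        Return kept
    End Function

    Sub Main()
        Dim red() As String = {"x.y.z", "x.y.z"}
        Dim dump() As String = {"x.y.z", "a.b.c", "x.y.z", "x.y.z", "x.y.z", "x.y.z"}
        Dim kept As List(Of String) = SubtractLines(red, dump)
        Console.WriteLine(kept.Count) ' 4: a.b.c plus three copies of x.y.z
    End Sub
End Module
```

Starting the count at 1 (not 0) matters here: with a starting value of 0, a line listed twice would only cancel one dump copy.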
 
Thanks!

Dear Cjard:
Thanks so much for your help. I had written about three times as much code as you did and was still banging my head against the wall.
I modified the code to suit my requirements. The red_items file contains entries of the form

x.y.z, and the unix file that I need to filter against each entry in the red_items file has lines of the form

"x";"y";"z";"etc". So I just read each line, split it on the semicolons, take the first three fields, join them with "." and strip the double quotes to get something of the form x.y.z. Then I check whether it exists in the dictionary.
See the modified code:

VB.NET:
Sub stripUnixItems()

    ' Load the red-list tag names (x.y.z); duplicate entries are ignored instead of throwing.
    Dim dict As New Dictionary(Of String, Integer)
    Dim i As Integer = 0
    For Each line As String In File.ReadAllLines(configFileDir & "\unix_red_items.txt")
        If Not dict.ContainsKey(line) Then dict.Add(line, i)
        i += 1
    Next

    ' Using closes (and flushes) the writer when the block ends.
    Using sw As New StreamWriter(configFileDir & "\unix_strip.txt")
        For Each line As String In File.ReadAllLines(configFileDir & "\unix_dump_a.txt")
            Dim line2() As String = line.Split(";"c)
            ' Skip lines that don't have the three tag-name fields.
            If line2.Length < 3 Then Continue For
            ' Build x.y.z from the first three fields and drop the double quotes.
            Dim tagname As String = (line2(0) & "." & line2(1) & "." & line2(2)).Replace("""", String.Empty)
            If Not dict.ContainsKey(tagname) Then sw.WriteLine(line)
        Next
    End Using
End Sub

And this works excellently. The thing is, I also want to filter with wildcard expressions.

So my red_items file could have an expression of the form

x.y.z, which the code I posted above handles well.

But it could also have expressions of the form:

test*.*.*
*.test*.*
a*.*.*

Do you have any idea how to make this work (handling wildcard expressions like that)? I am looking at the Regex class and its IsMatch() method, but haven't been able to come up with anything. Do you have any ideas or pointers (not the C kind :))?
Thanks!
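One way to use Regex.IsMatch for patterns like these is to escape the whole red-list entry with Regex.Escape and then turn each escaped "\*" back into a wildcard. The sketch below assumes that "*" matches any run of characters within one dot-separated part and never crosses a "." (that is a guess at the intended meaning); WildcardToRegex, IsRedListed and the WildcardFilter module are hypothetical names.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module WildcardFilter
    ' Hypothetical helper: turns a red-list pattern such as "test*.*.*" into an
    ' anchored Regex. Assumption: "*" matches within one dot-separated part only;
    ' everything else, including ".", is matched literally.
    Function WildcardToRegex(pattern As String) As Regex
        Dim body As String = Regex.Escape(pattern).Replace("\*", "[^.]*")
        Return New Regex("^" & body & "$", RegexOptions.Compiled)
    End Function

    ' Hypothetical helper: True when the tag name matches any red-list pattern.
    Function IsRedListed(tagname As String, patterns As List(Of Regex)) As Boolean
        For Each p As Regex In patterns
            If p.IsMatch(tagname) Then Return True
        Next
        Return False
    End Function

    Sub Main()
        Dim patterns As New List(Of Regex)
        For Each red As String In New String() {"test*.*.*", "*.test*.*", "a*.*.*"}
            patterns.Add(WildcardToRegex(red))
        Next
        Console.WriteLine(IsRedListed("test1.b.c", patterns))   ' True
        Console.WriteLine(IsRedListed("x.testing.z", patterns)) ' True
        Console.WriteLine(IsRedListed("x.y.z", patterns))       ' False
    End Sub
End Module
```

An entry without "*" becomes a literal pattern, so plain x.y.z entries would also work this way, although keeping them in the Dictionary and only sending entries that contain "*" through the Regex list avoids testing every tag against every exact entry. VB's own Like operator ("test1.b.c" Like "test*.*.*" is True) is another option, but there "*" can also match ".", and characters such as ?, # and [ have special meaning.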
 
It's starting to look like you'd be a lot better off using a database

SELECT * FROM table WHERE name LIKE 'test%.%.%'
SELECT * FROM table WHERE name LIKE 'a%.%.%'

or

SELECT * FROM table WHERE name1 LIKE '%' AND name2 LIKE 'test%' AND name3 LIKE '%'



If you're going to use a database, read the PQ link in my signature.

Note also that you'll need indexes on all your fields. Note too that any search where every search term starts with a % will require a full table scan, because an index cannot be used for a leading wildcard.


Doing this yourself with a dictionary would probably entail:

3 dictionaries
Entries for every prefix of every word: test must be added as t, te, tes and test. Then you can strip a trailing * and look up test directly. A leading * requires a scan of all keys, and a * in the middle requires a partial range match followed by a scan of that range.
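The trailing-* case described above can be sketched for a single name part as follows; presumably one such index per part would make up the three dictionaries, though that layout is a reading of the post rather than something it spells out. BuildPrefixIndex, LookupTrailingWildcard and the PrefixIndexDemo module are hypothetical names, and only patterns of the form text* are handled.

```vbnet
Imports System
Imports System.Collections.Generic

Module PrefixIndexDemo
    ' Hypothetical prefix index for ONE name part: every prefix of every word
    ' ("t", "te", "tes", "test") maps to the set of full words that start with it.
    Function BuildPrefixIndex(words As IEnumerable(Of String)) As Dictionary(Of String, HashSet(Of String))
        Dim index As New Dictionary(Of String, HashSet(Of String))
        For Each word As String In words
            For n As Integer = 1 To word.Length
                Dim prefix As String = word.Substring(0, n)
                If Not index.ContainsKey(prefix) Then index(prefix) = New HashSet(Of String)()
                index(prefix).Add(word)
            Next
        Next
        Return index
    End Function

    ' Handles only a trailing wildcard such as "test*": strip the * and do one lookup.
    ' A leading or middle * cannot use this index and would still need a scan.
    Function LookupTrailingWildcard(index As Dictionary(Of String, HashSet(Of String)), pattern As String) As HashSet(Of String)
        Dim prefix As String = pattern.TrimEnd("*"c)
        If Not pattern.EndsWith("*") OrElse prefix.Length = 0 OrElse prefix.Contains("*") Then
            Throw New ArgumentException("Only patterns of the form text* are supported: " & pattern)
        End If
        Dim hits As HashSet(Of String) = Nothing
        If index.TryGetValue(prefix, hits) Then Return hits
        Return New HashSet(Of String)()
    End Function

    Sub Main()
        Dim index As Dictionary(Of String, HashSet(Of String)) = BuildPrefixIndex(New String() {"test", "testing", "tag", "alpha"})
        Console.WriteLine(LookupTrailingWildcard(index, "test*").Count) ' 2 (test, testing)
        Console.WriteLine(LookupTrailingWildcard(index, "zz*").Count)   ' 0
    End Sub
End Module
```

The index trades memory for speed: a word of length n adds n keys, which is part of why the post recommends a database instead.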



All in all, use a database.
 