Comparing files

samjda (New member, joined May 21, 2011, 3 messages, programming experience: Beginner)

Hi everybody,

I've got a text file, file1, in word/tag format:

the/PREP sky/NOUN is/VERB blue/ADJ ,/PUNC it/PREP is/VERB wonderful/ADJ

And another text file, file2, containing grammar rules (tag sequences):

PUNC ABBREV ABBREV PUNC
PUNC ABBREV PUNC
PUNC ABBREV ABBREV ABBREV PUNC
PUNC NUM PUNC
PUNC DET+NOUN_PROP PUNC
NOUN_PROP ADJ
PREP NOUN VERB ADJ PUNC PREP VERB ADJ

Here is what I aim to do:
For each word in the first file (word/tag), compare its tag (the part after the /) with the first tag of each grammar rule in file2. If they match, continue the matching process with the rest of the grammar rule (matching file1 components against file2 components) until the end of the rule. Then copy the string of file1 that corresponds to the grammar rule into an array and print it in a RichTextBox. I will do this rule by rule (reading file2 one line at a time).

Thank you for your help
 
Oh Looks Fun!

So basically, just read in the text files and then parse the text into some variables, say a string and a string array. You can use StreamReader:
            Dim reader As New System.IO.StreamReader(fileLocation)

            While Not reader.EndOfStream 'or While reader.Peek() <> -1; either way you should get the same results
                Dim line As String = reader.ReadLine()

                'do something with that line...
            End While

            reader.Close()


For the sentence: after reading the entire line into a string, just use the Split function and store the result in an array. You can first split on the spaces and then split each piece on "/" so you can tell the word from the tag. You can use a Structure to store the pairs if you want:

    Structure wordTag
        Dim word As String
        Dim tag As String
    End Structure

'in some function or something...
            Dim sentence As String = "" ' the value of the sentence that was read in by the text reader should go in here

            Dim bufferWords() As String = sentence.Split(" "c)
            Dim wordTags As New List(Of wordTag)
            For Each wordt As String In bufferWords
                Dim tags As String() = wordt.Split("/"c)
                If tags.Length < 2 Then Continue For 'skip anything that is not word/tag
                Dim tempTag As wordTag
                tempTag.word = tags(0)
                tempTag.tag = tags(1)

                wordTags.Add(tempTag)
            Next


Then do something similar for the rules: store each rule line in an array and use Split(" ") to separate its tags. After that you can simply go through each word in the sentence and compare using an If statement. Check that the word's tag matches the first tag of a rule; if it does, continue checking the rest of the rule's tags, and if it doesn't, return an error or whatever.
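A minimal sketch of that comparison step, assuming the sentence is already split into word/tag strings and the rule into tags. The names RuleMatcher, TagOf and MatchRuleAt are made up for illustration; they are not from any library or from the code in this thread.

```vbnet
Imports System
Imports System.Collections.Generic

Module RuleMatcher
    'Hypothetical helper: returns the part after the last "/" of a word/tag token.
    Function TagOf(ByVal token As String) As String
        Dim pos As Integer = token.LastIndexOf("/"c)
        If pos < 0 Then Return ""
        Return token.Substring(pos + 1)
    End Function

    'Tries to match every tag of the rule against the tokens, starting at index start.
    'Returns the matched tokens, or Nothing when the rule does not fit completely.
    Function MatchRuleAt(ByVal tokens As IList(Of String), ByVal ruleTags As String(), ByVal start As Integer) As List(Of String)
        'the rule is longer than what is left of the text
        If start + ruleTags.Length > tokens.Count Then Return Nothing

        Dim matched As New List(Of String)
        For offset As Integer = 0 To ruleTags.Length - 1
            If TagOf(tokens(start + offset)) <> ruleTags(offset) Then Return Nothing
            matched.Add(tokens(start + offset))
        Next
        Return matched
    End Function

    Sub Main()
        Dim tokens As String() = "the/PREP sky/NOUN is/VERB blue/ADJ".Split(" "c)
        Dim ruleTags As String() = "NOUN VERB ADJ".Split(" "c)

        'try the rule at every position of the sentence
        For start As Integer = 0 To tokens.Length - 1
            Dim hit As List(Of String) = MatchRuleAt(tokens, ruleTags, start)
            If hit IsNot Nothing Then Console.WriteLine(String.Join(" ", hit))
        Next
    End Sub
End Module
```

With this sample input the only match starts at the second word, so it prints sky/NOUN is/VERB blue/ADJ. Checking the remaining length before the loop means the comparison never reads past the end of the sentence.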

Hope that helps... I might write this later today, looks like lots of fun. Are you working on a spell checker or something?
 
Hi Flippedbeyond,
Thanks for your prompt reply :D
I'm programming (or trying to program) an automatic summarizer. I've read your program. Just before posting my problem I wrote some pseudo-code that isn't working as I expected, so I'll show you my code; maybe you can help me figure out where the problem(s) is (are):
 
Imports System.IO
Imports System.Text
Imports System.Collections.Generic

Public Class Form7
    Dim file1 As String
    Dim file2 As String
    Dim tablo As New List(Of String) 'the list containing my text (word/tag)
    Dim tab2 As New List(Of String) 'list containing the grammar rules
    Dim tabli As New List(Of String) ' list containing the sequence wanted to be displayed 
    Dim i As Integer
    
    Function compare(ByVal val As String, ByVal val1 As String) As Boolean

        Dim var As String = val
        Dim pos As Long = InStr(var, "/")
        Dim var1 As String = Mid(var, pos + 1)
        Dim bool As Boolean
        If var1 = val1 Then
            bool = True
        Else
            bool = False
        End If

        Return (bool)

    End Function

 Private Sub Button5_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button5.Click
        
If i <= tablo.Count - 1 Then
            i = i + 1
        End If

        Dim fichier As StreamReader = New StreamReader("C:\Users\samjda\Desktop\Regles.txt", Encoding.UTF8)
        Dim cont1 As String = fichier.ReadLine
        Dim j As Integer = 0
        Dim x As Integer = 0

        While Not (cont1 Is Nothing)
            RichTextBox3.Clear()

            Dim tab() As String = cont1.Split(" "c)

            For v = 0 To UBound(tab)
                tab2.Add(tab(v))
            Next

            Dim k As Integer = i
            Dim l As Integer = j
            x = 0

            Dim bool As Boolean = compare(tablo(k), tab2(l))

            While (x <= tab2.Count - 1) And bool = True
                tabli.Add(tablo(k))
                k = k + 1
                l = l + 1
                bool = compare(tablo(k), tab2(l))
                x = x + 1
            End While

            
            Dim tb() As String = tabli.ToArray

            If tabli.Count = x Then
                For g = 0 To UBound(tb)
                    RichTextBox3.AppendText(tb(g) & vbLf)
                Next
            End If

            
            tabli.Clear()
            Erase tb
            Erase tab
            cont1 = fichier.ReadLine
            '  tab2.Clear() 
            
        End While
        fichier.Close()
    End Sub 
 
Debugged your code - Please be more organized next time

Hey, so I took a look at your code. I think it can definitely be organized more logically.

Anyway, I debugged the code for you; see the comments. I only modified your code in the Button click event handler.

  • I removed the If statement that adds one to the variable i (try to use different names for your variables; you have a lot of single letters, which is very confusing). I don't know why you had that If block there...
  • I edited the condition of your While block: you were checking for <=. It should be just <, since you add 1 to the variables before you use them, not after.
  • I uncommented your tab2.Clear(). It needs to be there, otherwise your tab2 list keeps accumulating the tags of every rule, and you do not want that.

Should work now: when I ran it with my test files (the text you posted before), it ran fine and added the sentence to the RichTextBox.

Here's the code:

Imports System.IO
Imports System.Text
Imports System.Collections.Generic


Public Class Form1
    Dim file1 As String
    Dim file2 As String

    Dim tablo As New List(Of String) 'the list containing my text (word/tag)
    Dim tab2 As New List(Of String) 'list containing the grammar rules
    Dim tabli As New List(Of String) ' list containing the sequence wanted to be displayed 

    Dim i As Integer


    'code for testing
    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        Using reader As New StreamReader("C:\TESTING\sentence.txt")
            While Not reader.EndOfStream
                Dim wordTag As String = reader.ReadLine

                tablo.AddRange(wordTag.Split(" "c))
            End While
        End Using 'closes the file even if reading fails
    End Sub

    'did not modify this - seemed to work fine
    Function compare(ByVal val As String, ByVal val1 As String) As Boolean
        Dim var As String = val
        Dim pos As Long = InStr(var, "/")
        Dim var1 As String = Mid(var, pos + 1)
        Dim bool As Boolean

        If var1 = val1 Then
            bool = True

        Else
            bool = False

        End If

        Return (bool)
    End Function


    Private Sub Button5_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button5.Click
        'what is this for?
        'If i <= tablo.Count - 1 Then

        '    i = i + 1

        'End If
        Dim fichier As StreamReader = New StreamReader("C:\Testing\rules.txt", Encoding.UTF8)
        Dim cont1 As String = fichier.ReadLine

        Dim j As Integer = 0
        Dim x As Integer = 0

        While Not (cont1 Is Nothing)
            RichTextBox3.Clear()


            Dim tab() As String = cont1.Split(" "c)


            For v = 0 To UBound(tab)

                tab2.Add(tab(v))

            Next


            Dim k As Integer = i
            Dim l As Integer = j
            x = 0


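            Dim bool As Boolean = compare(tablo(k), tab2(l))


            'x counts the matched tags; stop at the end of the rule
            While bool = True And x < tab2.Count 'removed <=
                tabli.Add(tablo(k))

                k = k + 1
                l = l + 1
                x = x + 1

                'only compare again while both lists still have an element left
                bool = l < tab2.Count AndAlso k < tablo.Count AndAlso compare(tablo(k), tab2(l))
            End While


            Dim tb() As String = tabli.ToArray

            'print only when every tag of the rule was matched
            If tabli.Count = tab2.Count Then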

                For g = 0 To UBound(tb)

                    RichTextBox3.AppendText(tb(g) & vbLf)

                Next

            End If



            tabli.Clear()

            Erase tb
            Erase tab

            cont1 = fichier.ReadLine

            tab2.Clear()
        End While

        fichier.Close()

    End Sub

End Class



Hope that helps. Good luck with it! If I feel like it later today, I'll write my own version of this application and post it here.

Happy Coding :)
 
Dim s = "the/PREP sky/NOUN is/VERB blue/ADJ ,/PUNC it/PREP is/VERB wonderful/ADJ"

Dim entities = From word In s.Split(" "c)
               Select item = word.Split("/"c)
               Select New With {.word = item(0), .tag = item(1)}

Dim sentence = String.Join(" ", From item In entities Select item.word)
Dim rule = String.Join(" ", From item In entities Select item.tag)

Dim rules = IO.File.ReadAllLines("rules.txt")
If rules.Contains(rule) Then
    Me.RichTextBox1.AppendText(sentence)
End If
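Note that rules.Contains(rule) only succeeds when the tag sequence of the whole sentence equals one rule line exactly. If a rule should also be found inside a longer sentence, one possible approach is to slide over the tags with Skip, Take and SequenceEqual. This is only a sketch: FindMatches is a made-up helper, and the rule "PREP VERB ADJ" is an invented example rather than one of the posted rules.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module SubsequenceMatch
    'Hypothetical helper: returns the words of every run whose tags equal ruleTags.
    Function FindMatches(ByVal words As String(), ByVal tags As String(), ByVal ruleTags As String()) As List(Of String)
        Dim results As New List(Of String)
        'the last possible start leaves exactly ruleTags.Length tags
        For start As Integer = 0 To tags.Length - ruleTags.Length
            If tags.Skip(start).Take(ruleTags.Length).SequenceEqual(ruleTags) Then
                results.Add(String.Join(" ", words.Skip(start).Take(ruleTags.Length)))
            End If
        Next
        Return results
    End Function

    Sub Main()
        Dim s As String = "the/PREP sky/NOUN is/VERB blue/ADJ ,/PUNC it/PREP is/VERB wonderful/ADJ"

        'split each word/tag token once, then pull out words and tags
        Dim pairs = s.Split(" "c).Select(Function(w) w.Split("/"c)).ToArray()
        Dim words As String() = pairs.Select(Function(p) p(0)).ToArray()
        Dim tags As String() = pairs.Select(Function(p) p(1)).ToArray()

        For Each hit As String In FindMatches(words, tags, "PREP VERB ADJ".Split(" "c))
            Console.WriteLine(hit)
        Next
    End Sub
End Module
```

For the sample sentence this prints it is wonderful, the only run of three words tagged PREP VERB ADJ.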
 
Hi johnH, I've never seen that before, very cool.

Can you walk through the code with quick explanations of what happens behind the scenes? I've never used "From", "Select" or "With" before. Very interesting, definitely gotta play with this :)

That's an IEnumerable you're creating on the fly, right?
 
It is called LINQ. From queries return IEnumerable sequences, true. The type of the variable is inferred, which is also required here because the elements are of an anonymous type.
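A rough sketch of what the query produces, written next to an ordinary loop for comparison. WordTag is a made-up named class standing in for the anonymous type, and the loop only illustrates the result; it is not how the compiler actually translates the query. It assumes Option Infer On, the default in recent Visual Studio versions.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module LinqWalkthrough
    'Hypothetical named class standing in for the anonymous type of the query
    Class WordTag
        Public Word As String
        Public Tag As String
    End Class

    Sub Main()
        Dim s As String = "the/PREP sky/NOUN is/VERB"

        'Query form: the element type has no name, so the variable type must be inferred
        Dim entities = From word In s.Split(" "c)
                       Select item = word.Split("/"c)
                       Select New With {.word = item(0), .tag = item(1)}

        'Loop form that builds the same word/tag pairs with a named type
        Dim pairs As New List(Of WordTag)
        For Each token As String In s.Split(" "c)
            Dim parts As String() = token.Split("/"c)
            pairs.Add(New WordTag With {.Word = parts(0), .Tag = parts(1)})
        Next

        'Both lines print the same tag sequence
        Console.WriteLine(String.Join(" ", From entry In entities Select entry.tag))
        Console.WriteLine(String.Join(" ", From pair In pairs Select pair.Tag))
    End Sub
End Module
```

Both WriteLine calls print PREP NOUN VERB. The query form cannot be declared with an explicit element type because the anonymous type has no name you could write down.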
 
LINQ, gotta read up on that :) Cool, thanks.
 