Read text before another text

prologikus
Member · Joined Nov 9, 2012 · Messages: 16 · Programming Experience: Beginner
How can I make the WebBrowser automatically read the RED TEXT (the value 413,2 in the code below) into a String (Dim) every time it refreshes?

The full code is:
VB.NET:
<a href='http://sitizens.com/profile/17742'>Trololol elite</a></td><td style='text-align:right !important;'>413,2</td></tr>
 
If you are loading this into a .NET WebBrowser control, retrieve WebBrowser.Document (type HtmlDocument) and call .GetElementsByTagName("td") to get a collection of all the "td" elements in your document.
This collection (type HtmlElementCollection) supports indexing via .Item(index), iteration via "For Each ... Next", and even LINQ queries.
Choose one of these techniques to retrieve the one "td" you want as an HtmlElement. Once you have it, read the text you want from its .InnerText property.
Hope it helps.
 
Thanks for the help, but I don't understand it very well :< ...
Can you explain it more simply, or just give me the code?

Ok, perhaps code explains what I said more eloquently.

Assuming you have a Form ("Form1") containing a TextBox ("TextBox1"), a Button ("Button1") and a WebBrowser ("WebBrowser1"), put this code in "Class Form1" and run it. Type the URL of the site you want to read from into the text box and click the button.

Public Class Form1
    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        WebBrowser1.Navigate(TextBox1.Text)
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As System.Object, e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Dim TDcollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("td")
        For Each TDelement As HtmlElement In TDcollection
            MsgBox(TDelement.InnerText)
        Next
    End Sub
End Class
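
The code above uses "For Each". As a rough sketch of the other two techniques (indexing and LINQ), the same handler could look like the one below. It assumes the same Form1 with WebBrowser1; the filter on non-empty text is only an example condition, not something your page requires.

```vbnet
Imports System
Imports System.Linq
Imports System.Windows.Forms

Public Class Form1
    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Dim TDcollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("td")

        'Indexing: read one specific cell, guarding against an empty collection.
        If TDcollection.Count > 0 Then
            Dim FirstCell As HtmlElement = TDcollection.Item(0)
            MsgBox(FirstCell.InnerText)
        End If

        'LINQ: HtmlElementCollection is a non-generic collection,
        'so Cast(Of HtmlElement)() is applied before the query operators.
        Dim CellTexts = TDcollection.Cast(Of HtmlElement)().
            Where(Function(td) Not String.IsNullOrEmpty(td.InnerText)).
            Select(Function(td) td.InnerText).
            ToList()
        MsgBox(String.Join(Environment.NewLine, CellTexts))
    End Sub
End Class
```

The LINQ version shows all non-empty cell texts in a single message box instead of one box per cell.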


Please tell me if you have further questions.
 
OK, that's good, it gives me what I want. But the page source contains other entries like that one, and they are distinguished only by the profile link (the href below). How can I extract the value for one particular profile?

VB.NET:
<a href='http://sitizens.com/profile/17742'>Trololol elite</a></td><td style='text-align:right !important;'>413,2</td></tr>
<a href='http://sitizens.com/profile/21216'>Alyn</a></td><td style='text-align:right !important;'>2,212</td></tr>
 
Yes, this is one step further, and the approach will depend entirely on the design and structure of the page you want to read from.

So you want to read the content of the "td" that comes right after the "td" containing a link with a given profile number. My first attempt would be this:

    Private Sub WebBrowser1_DocumentCompleted(sender As System.Object, e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Dim TDcollection As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("td")
        For Each TDelement As HtmlElement In TDcollection
            'Fetch the links once instead of searching the "td" twice
            Dim Links As HtmlElementCollection = TDelement.GetElementsByTagName("a")
            If Links.Count = 1 AndAlso
                Links(0).GetAttribute("href").EndsWith("17742") Then
                'The value sits in the element right after the "td" holding the link
                Dim NextCell As HtmlElement = TDelement.NextSibling
                If NextCell IsNot Nothing Then MsgBox(NextCell.InnerText)
            End If
        Next
    End Sub


Please notice what this code does (based on your samples):
1. It iterates through all "td" elements in your HTML document.
2. Each "td" is tested for whether it contains one, and only one, link/anchor element ("a").
3. If so, it tests whether the "href" attribute of this "a" element (a String) ends with the number I assume you're interested in.
* Notice the use of "AndAlso" instead of "And" in the boolean expression: it short-circuits the test to false when the number of "a" elements inside the "td" is different from 1, so you avoid the error of trying to get element 0 from a collection of "a" elements that is actually empty (when the "td" contains no "a" elements).
4. If all conditions are true, it reads the .InnerText property, not from the tested "td", but from whichever HTML element comes right after it.
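
To see the difference between "AndAlso" and "And" in isolation, here is a small console sketch. A plain List(Of String) stands in for the collection of "a" elements; the module name and the list are only for illustration and are not part of the WebBrowser code.

```vbnet
Imports System
Imports System.Collections.Generic

Module ShortCircuitDemo
    Sub Main()
        'Empty list, like a "td" that contains no "a" elements
        Dim Links As New List(Of String)()

        'AndAlso: the right-hand side is skipped because the left-hand side is False
        If Links.Count = 1 AndAlso Links(0).EndsWith("17742") Then
            Console.WriteLine("match")
        Else
            Console.WriteLine("no match, no error")
        End If

        'And: both sides are always evaluated, so Links(0) throws on the empty list
        Try
            If Links.Count = 1 And Links(0).EndsWith("17742") Then
                Console.WriteLine("match")
            End If
        Catch ex As ArgumentOutOfRangeException
            Console.WriteLine("And evaluated Links(0) on an empty list")
        End Try
    End Sub
End Module
```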

This is not the algorithm I'd prefer, however. Since the document contains a "table" element, I'd rather iterate its rows ("tr" elements) and pick both the "td" to be tested and the "td" to be read by index, according to their positions inside the row. This is nothing but a sketch based on your samples, and I'm sure that if you study the WebBrowser, HtmlDocument and HtmlElement classes, you'll become able to do far more complex operations than this, because these classes let you not only read HTML content but also raise events programmatically and control and automate navigation.

It also requires some notion of HTML itself because, as you should know by now, HTML elements can be containers for other elements. For instance, tables can be nested inside the "td" elements (table data cells) of bigger tables, so testing "td" elements can be tricky: sometimes we look for one "td" that contains some expression, but the string test we use also returns true for the outer "td" that contains the "table" that contains the "td" we are looking for.
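
One possible way around the nested-table problem is to look only at the cells that are direct children of a row, since GetElementsByTagName("td") also returns the cells of any table nested deeper inside. The helper below is a sketch of that idea; the names TableHelpers and GetDirectCells are made up for this example.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Windows.Forms

Module TableHelpers
    'Hypothetical helper: returns only the "td" elements that are direct children of the row,
    'ignoring cells that belong to tables nested inside those cells.
    Public Function GetDirectCells(row As HtmlElement) As List(Of HtmlElement)
        Dim Cells As New List(Of HtmlElement)()
        For Each Child As HtmlElement In row.Children
            'Compare the tag name case-insensitively, since it may be reported in upper case
            If String.Equals(Child.TagName, "TD", StringComparison.OrdinalIgnoreCase) Then
                Cells.Add(Child)
            End If
        Next
        Return Cells
    End Function
End Module
```

With this, a test such as GetDirectCells(TR).Count = 5 counts only the row's own cells.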

Since I'm not a trained professional, I've banged my head against many walls to gather this knowledge, so I hope you can do it with less pain.

Good luck, then!
 
It doesn't work :|
Look, I've attached the full source of the page :)
Press CTRL+F and type "G3orG3 E" (I have changed my profile since then), and right after it you will find the number "5,231".
 

Attachment: sourcepage.txt (64 KB)

Please keep in mind that the purpose of the forum is not to provide quick solutions for people.
Nevertheless, I understand you might not be familiar with these elements, so I wrote the code below, because it illustrates everything I have said before. It was tested and it works. Please study it, its concepts and its commands, so you can learn how to deal with them. You'll find this very useful, since any further development is up to you.

Public Class Form1
    Public MyResult As String
    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        WebBrowser1.ScriptErrorsSuppressed = True
        WebBrowser1.AllowNavigation = False
        WebBrowser1.Navigate("file:///D:/sourcepage.html")
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As System.Object, e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Dim MyDocument As HtmlDocument = WebBrowser1.Document
        Dim MyDocumentSourceCode As String = MyDocument.Body.Parent.OuterHtml
        'There is more than one table, so iterate all of them
        For Each Table As HtmlElement In MyDocument.GetElementsByTagName("table")
            'Iterate each row of each table
            For Each TR As HtmlElement In Table.GetElementsByTagName("tr")
                'This is TR structure.
                '
                '<tr>
                '  <td><span class='label label-success'><i class='icon-arrow-up icon-white'></i></span></td>
                '  <td style='text-align:center !important;'><span class='badge'>1</span></td>
                '  <td style='text-align:center !important;'>$10.00 <nobr>150 Rubies</nobr> </td>
                '  <td style='text-align:center !important;' width=100%><a href='http://sitizens.com/profile/g3org3'>G3orG3 Emperor of Business</a></td>
                '  <td style='text-align:right !important;'>5,231</td>
                '</tr>
                '
                'It must have five cells ("td"), and the fourth (index 3 because it starts from 0) must contain an anchor/link ("a").
                If TR.GetElementsByTagName("td").Count = 5 AndAlso TR.GetElementsByTagName("td").Item(3).GetElementsByTagName("a").Count = 1 Then
                    Dim TDname As HtmlElement = TR.GetElementsByTagName("td").Item(3)
                    Dim Aname As HtmlElement = TDname.GetElementsByTagName("a").Item(0)
                    If Aname.GetAttribute("href") = "http://sitizens.com/profile/g3org3" Then
                        MyResult = TR.GetElementsByTagName("td").Item(4).InnerText
                        Exit For
                    End If
                End If
            Next
            If MyResult <> "" Then Exit For
        Next
        MsgBox(MyResult)
    End Sub
End Class
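
If you need the value for more than one profile, the same row test could be moved into a function that takes the profile link as a parameter. This is only a sketch that assumes the five-cell row structure shown above; ProfileLookup and FindValueForProfile are names invented for the example, and comparing the link without regard to case is my own choice.

```vbnet
Imports System
Imports System.Windows.Forms

Module ProfileLookup
    'Hypothetical helper: returns the text of the fifth cell of the first row whose
    'fourth cell links to profileUrl, or Nothing when no such row is found.
    Public Function FindValueForProfile(doc As HtmlDocument, profileUrl As String) As String
        'GetElementsByTagName on the document returns the rows of every table
        For Each Row As HtmlElement In doc.GetElementsByTagName("tr")
            Dim Cells As HtmlElementCollection = Row.GetElementsByTagName("td")
            If Cells.Count <> 5 Then Continue For

            Dim Links As HtmlElementCollection = Cells.Item(3).GetElementsByTagName("a")
            If Links.Count <> 1 Then Continue For

            If String.Equals(Links.Item(0).GetAttribute("href"), profileUrl, StringComparison.OrdinalIgnoreCase) Then
                Return Cells.Item(4).InnerText
            End If
        Next
        Return Nothing
    End Function
End Module
```

Inside DocumentCompleted it would be called as MyResult = FindValueForProfile(WebBrowser1.Document, "http://sitizens.com/profile/g3org3"), so the String is filled again each time the event fires.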
 