Question How do I filter through downloaded source for particular information?

Santaslittlehelper (New member; joined Dec 15, 2012; messages: 2; programming experience: 3-5)

So far I have written a program to download a webpage's source. Now I would like to search through this data and pick out particular information. In this case, whenever text appears between <b> and </b>, I would like it to be added to a list as an item. My code so far is below:

Public Class NameList
    Dim thread As System.Threading.Thread
    Dim sourcecode As String

    ' Downloads the page source into sourcecode (runs on a background thread).
    Sub GetSource()
        Dim request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create("http://www.websiteidliketouse.co.uk"), System.Net.HttpWebRequest)

        ' Using blocks close the response and the reader even if an exception is thrown.
        Using response As System.Net.HttpWebResponse = DirectCast(request.GetResponse(), System.Net.HttpWebResponse)
            Using sr As New System.IO.StreamReader(response.GetResponseStream())
                sourcecode = sr.ReadToEnd()
            End Using
        End Using
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        thread = New System.Threading.Thread(AddressOf GetSource)
        thread.Start()
    End Sub

    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        ' Not written yet: search sourcecode for the text between <b> and </b>.
    End Sub
End Class
 
Hi,

Since your sourcecode variable contains the source of the web page as a string, you can loop through that string, using the IndexOf method to find the tags you are looking for and the Substring method to extract the text between them.
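As a rough sketch of that loop (not a definitive implementation), something like the following could work. The module name TagExtractor, the function name ExtractBetween and the sample string are made up for illustration. It assumes the tags are not nested and that the opening tag has no attributes, since it is a plain string search rather than real HTML parsing.

```vbnet
Imports System
Imports System.Collections.Generic

Module TagExtractor
    ' Hypothetical helper: returns the text found between each openTag/closeTag pair.
    ' Assumes the tags are not nested and the opening tag carries no attributes
    ' (it matches "<b>", but not "<b class=...>"). Matching ignores case.
    Function ExtractBetween(source As String, openTag As String, closeTag As String) As List(Of String)
        Dim results As New List(Of String)
        If String.IsNullOrEmpty(source) Then Return results

        Dim position As Integer = 0
        Do
            ' Find the next opening tag at or after the current position.
            Dim startIndex As Integer = source.IndexOf(openTag, position, StringComparison.OrdinalIgnoreCase)
            If startIndex < 0 Then Exit Do
            startIndex += openTag.Length

            ' Find the closing tag that follows it.
            Dim endIndex As Integer = source.IndexOf(closeTag, startIndex, StringComparison.OrdinalIgnoreCase)
            If endIndex < 0 Then Exit Do

            ' Keep the text between the two tags and continue after the closing tag.
            results.Add(source.Substring(startIndex, endIndex - startIndex))
            position = endIndex + closeTag.Length
        Loop
        Return results
    End Function

    Sub Main()
        ' Made-up sample input; in the form above this would be the sourcecode field.
        Dim sample As String = "<p><b>Alice</b> and <B>Bob</B></p>"
        For Each item As String In ExtractBetween(sample, "<b>", "</b>")
            Console.WriteLine(item)
        Next
    End Sub
End Module
```

With the sample string, Main prints Alice and then Bob. In your form you could call ExtractBetween(sourcecode, "<b>", "</b>") from Button2_Click once the download thread has finished, and add each returned string to your list control.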

Another way to do this, however, is to use the WebBrowser class, which exposes the GetElementsByTagName method through its Document property. Have a look here:

VB.NET:
Public Class Form1
  ' WithEvents lets the DocumentCompleted handler below be wired up with Handles.
  Dim WithEvents myWebBrowser As New WebBrowser

  ' Starts loading the page; the result arrives in DocumentCompleted.
  Private Sub GetSource()
    Me.Cursor = Cursors.WaitCursor
    myWebBrowser.Navigate("http://www.websiteidliketouse.co.uk")
  End Sub

  Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    GetSource()
  End Sub

  Private Sub myWebBrowser_DocumentCompleted(sender As System.Object, e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles myWebBrowser.DocumentCompleted
    ' DocumentCompleted can fire more than once (e.g. for frames), so check ReadyState.
    If myWebBrowser.ReadyState = WebBrowserReadyState.Complete Then
      Me.Cursor = Cursors.Default
      For Each elem As HtmlElement In myWebBrowser.Document.GetElementsByTagName("td")
        ' Skip elements whose text is missing or only whitespace.
        If Not String.IsNullOrWhiteSpace(elem.InnerText) Then
          MsgBox(elem.InnerText)
        End If
      Next
    End If
  End Sub
End Class

You will notice that I have created this example to look for the <td> tag since there are no <b> tags in the page you have specified.
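If the page you actually want to read does contain <b> tags, the same approach could collect their text into a list instead of showing message boxes. This is only a sketch: the module name HtmlTagText, the function name CollectTagText and the ListBox1 control mentioned afterwards are made up for illustration and are not part of the example above.

```vbnet
Imports System.Collections.Generic
Imports System.Windows.Forms

Module HtmlTagText
    ' Hypothetical helper: collects the trimmed, non-blank InnerText of every
    ' element in doc that has the given tag name (for example "b").
    Function CollectTagText(doc As HtmlDocument, tagName As String) As List(Of String)
        Dim results As New List(Of String)
        If doc Is Nothing Then Return results

        For Each elem As HtmlElement In doc.GetElementsByTagName(tagName)
            ' Skip elements whose text is missing or only whitespace.
            If Not String.IsNullOrWhiteSpace(elem.InnerText) Then
                results.Add(elem.InnerText.Trim())
            End If
        Next
        Return results
    End Function
End Module
```

Inside the DocumentCompleted handler you could then write ListBox1.Items.AddRange(CollectTagText(myWebBrowser.Document, "b").ToArray()), assuming your form has a ListBox named ListBox1.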

Hope that helps.

Cheers,

Ian
 