Question Parsing HTML

XardozCom

This was covered in this thread, but somehow I just don't get it...

I am trying to extract some values from the HTML of a web page.
No matter how I parse it, I do not get the results I want.

VB.NET:
    Private Sub GetSiteInformation()
        For Each currentDiv_Element As HtmlElement In WebBrowser1.Document.GetElementsByTagName("span")
            Debug.Print(currentDiv_Element.GetAttribute("span").ToString)
            If currentDiv_Element.GetAttribute("span") = "std-address" Then
                For Each currentSpan_Element As HtmlElement In currentDiv_Element.GetElementsByTagName("span")
                    If currentSpan_Element.GetAttribute("span") = "address1 range" Then
                        txtSiteAddress.Text = currentSpan_Element.InnerText
                    End If
                Next
                For Each price_Element As HtmlElement In currentDiv_Element.GetElementsByTagName("div")
                    If price_Element.GetAttribute("className") = "city" Then
                        txtSiteCity.Text = price_Element.InnerText
                    End If
                Next
            End If
        Next
    End Sub

The HTML is attached. I want the address information that follows this tag: <p class="std-address">.
Here are the actual tags I need:
HTML:
<span class="address1 range">22805 60TH AVE W</span>
<span class="city range">MOUNTLAKE TERRACE </span>
<span class="state range">WA</span>
<span class="zip" style="">98043</span>
<span class="zip4">3715</span>
<dt>County</dt>
<dd>SNOHOMISH </dd>
 

Attachments

  • Parsing HTML.txt
Hi,

You seem to have got yourself tied up with the wrong tag names. First look for elements with a tag name of "p" whose class attribute is "std-address", then search each of those elements for child elements with a tag name of "span". Note that to read a "class" attribute through GetAttribute you have to pass the literal "className". You should therefore be doing something like this:

VB.NET:
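' Cast and Where are LINQ extension methods, so this needs Imports System.Linq.
If WebBrowser1.ReadyState = WebBrowserReadyState.Complete Then
  ' Find the <p class="std-address"> elements and skip the first match.
  For Each myPElement As HtmlElement In WebBrowser1.Document.GetElementsByTagName("p").Cast(Of HtmlElement).Where(Function(x) x.GetAttribute("className") = "std-address").Skip(1)
    ' Each span inside the paragraph holds one part of the address.
    For Each addElement As HtmlElement In myPElement.GetElementsByTagName("span")
      MsgBox(addElement.InnerText)
    Next
  Next
End If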

One additional point: two addresses in the HTML file satisfy the above search criteria, so I have added the LINQ call Skip(1) to skip the first address and read only the second, since this seems to be the one that has everything you are looking for.
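
A side note on the "className" comparison in the code above: a class attribute can hold several space-separated names (the spans in the question use "address1 range"), so an exact string comparison only matches when the attribute is exactly that text. Below is a minimal sketch of a token-based check. HasClass and HtmlClassHelpers are hypothetical names for illustration, not part of the WebBrowser API.

```vbnet
Imports System
Imports System.Linq
Imports System.Windows.Forms

Module HtmlClassHelpers
    ' Hypothetical helper: returns True when the element's class attribute
    ' contains the given name as one of its space-separated tokens.
    Public Function HasClass(element As HtmlElement, name As String) As Boolean
        Dim classes As String = element.GetAttribute("className")
        If String.IsNullOrEmpty(classes) Then Return False
        Dim tokens As String() = classes.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
        ' Class names are compared case-sensitively here.
        Return tokens.Contains(name, StringComparer.Ordinal)
    End Function
End Module
```

With a helper like this, the filter could be written as Where(Function(x) HasClass(x, "std-address")), which would still match if the paragraph carried additional class names.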

Hope that helps.

Cheers,

Ian
 
But how do I get the class name?

In <span class="address1 range">22805 60TH AVE W</span>, how would I extract "address1 range"?

VB.NET:
Private Sub WebBrowser1_DocumentCompleted(ByVal sender As System.Object, ByVal e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Dim iLoop As Integer = 0
        Dim sRawKeyData As String = ""


        If InStr(WebBrowser1.DocumentText, "Unfortunately, this address wasn't found.") > 0 Then
            txtSiteKey.Text = "Address can not be validated"
        Else
            TextBox1.Text = WebBrowser1.DocumentText
            If WebBrowser1.ReadyState = WebBrowserReadyState.Complete Then
                For Each myPElement As HtmlElement In WebBrowser1.Document.GetElementsByTagName("p").Cast(Of HtmlElement).Where(Function(x) x.GetAttribute("className") = "std-address").Skip(1)
                    For Each addElement As HtmlElement In myPElement.GetElementsByTagName("span")
                        iLoop += 1
                        Select Case iLoop
                            Case 1
                                txtSiteAddress.Text = (addElement.InnerText)
                                'case for address2
                            Case 2
                                txtSiteCity.Text = (addElement.InnerText)
                            Case 3
                                cboSiteState.Text = (addElement.InnerText)
                            Case 4
                                txtSiteZip.Text = (addElement.InnerText)
                            Case 5
                                txtSiteZip.Text += (addElement.InnerText)
                            Case 6
                                txtSiteZip.Text += (addElement.InnerText)
                        End Select
                        sRawKeyData += Replace((addElement.InnerText), " ", "")
                    Next
                    For Each addElement As HtmlElement In myPElement.GetElementsByTagName("dd")
                        MsgBox(addElement.InnerText)
                    Next
                Next
            End If
            txtSiteKey.Text = sRawKeyData
        End If
    End Sub
 
Hi,

I am surprised that you had to ask that question, since I have already demonstrated this, but to answer it, try:

VB.NET:
addElement.GetAttribute("className")
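
For example, here is a minimal sketch that routes each span by its class name instead of by its position in the loop. It assumes the class values shown in the question ("address1 range", "city range", "state range", "zip", "zip4") and the control names from the posted code. FillAddressFields is a hypothetical method name, and the method is meant to sit inside the form class.

```vbnet
' Hypothetical helper, placed inside the form that owns the text boxes.
Private Sub FillAddressFields(addressElement As HtmlElement)
    For Each span As HtmlElement In addressElement.GetElementsByTagName("span")
        ' InnerText can be Nothing for an empty element, so fall back to "".
        Dim value As String = If(span.InnerText, "").Trim()

        ' "className" is the name used to read the class attribute.
        Select Case span.GetAttribute("className")
            Case "address1 range"
                txtSiteAddress.Text = value
            Case "city range"
                txtSiteCity.Text = value
            Case "state range"
                cboSiteState.Text = value
            Case "zip"
                txtSiteZip.Text = value
            Case "zip4"
                ' Appended directly after the ZIP, as in the posted code.
                txtSiteZip.Text &= value
        End Select
    Next
End Sub
```

Matching on the class name means an extra or missing span (such as an address2 line) would not shift the other fields, which is the risk with counting spans.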

Hope that helps.

Cheers,

Ian
 
Hi,

Sorry, but I am not going to tell you this one this time, since it seems you are not learning from what has already been posted. After a quick look at the HTML file, you do not need anything else to fix this; you just need to consider which tags and attributes to look for.

Cheers,

Ian
 
