Reading HTML and performing Regex

paulthepaddy

Hi guys.

I'm having problems trying to read some HTML and extract the data I need.
VB.NET:
' Requires Imports System.Text.RegularExpressions at the top of the file.
Public Sub GetItems(ByVal CharName As String, ByVal RealmName As String)
    ' Download the character's activity feed page.
    Dim Request As System.Net.HttpWebRequest = DirectCast(System.Net.WebRequest.Create("http://eu.battle.net/wow/en/character/" & RealmName & "/" & CharName & "/feed"), System.Net.HttpWebRequest)
    Dim SourceCode As String
    ' Using blocks close the response and the reader even if reading fails.
    Using Response As System.Net.HttpWebResponse = DirectCast(Request.GetResponse(), System.Net.HttpWebResponse)
        Using sr As New System.IO.StreamReader(Response.GetResponseStream())
            SourceCode = sr.ReadToEnd()
        End Using
    End Using

    ' Match each "Obtained <a ...>" line whose link has class color-q4.
    Dim Item_ As New System.Text.RegularExpressions.Regex( _
    "Obtained <a href=""/wow/en/item/.*"" class=""color-q4"".*")
    Dim Matches As MatchCollection = Item_.Matches(SourceCode)
    For Each Match As Match In Matches
        Dim ItemSource As String

        ' Map the cc value in data-item to an item source; skip anything else.
        Select Case True
            Case Match.Value.Contains("cc=0")
                Continue For
            Case Match.Value.Contains("cc=3")
                ItemSource = "raid-normal"
            Case Match.Value.Contains("cc=4")
                ItemSource = "raid-finder"
            Case Match.Value.Contains("cc=5")
                ItemSource = "raid-heroic"
            Case Match.Value.Contains("cc=6")
                ItemSource = "raid-mythic"
            Case Else
                Continue For
        End Select

        ' Split on "/" to reach the href segment that starts with the ID,
        ' then cut at the closing quote to leave just the ID.
        Dim ID_Match As String = Match.Value.Split("/"c)(4)
        Dim ItemID As String = ID_Match.Split(""""c)(0)
        LB_Items.Items.Add(explorer.GetItem(ItemID, ItemSource))
        LB_Items.Refresh()
    Next
End Sub

And here is the HTML block I'm trying to read:
HTML:
<li>
    <dl>
        <dd>

        <a href="/wow/en/item/113961" class="color-q4" data-item="d=65&pl=100&cc=5&bl=566">




        <span  class="icon-frame frame-18 " style='background-image: url("http://media.blizzard.com/wow/icons/18/inv_plate_raidwarrior_o_01boots.jpg");'>
        </span>
</a>

    Obtained <a href="/wow/en/item/113961" class="color-q4" data-item="d=65&pl=100&cc=5&bl=566">Iron Bellow Sabatons</a>.
</dd>
        <dt>1 day ago</dt>
    </dl>
  </li>

I have gotten all the information I need from that one line, but now I also need the "1 day ago" text to calculate a date. I can't just run another regex, as there are far more <dt> tags on the page, and I don't know how I would couple each date to its item.

I have posted a full 'node', so to speak, as I'm assuming I will have to read the whole block, but I really don't know how to attempt this. I have done a fair bit of googling, but what I found wasn't helpful, as none of it made sense to me.

I will keep reading into it, but I think it will take me a very long time to get there, so I thought I'd put up a post.

Thanks guys.
Any questions, just ask.
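
On the "calc a date" part, here is a rough sketch of turning the <dt> text into a date. It assumes the feed only uses text shaped like "<number> <unit> ago"; "1 day ago" is the only value shown in the post, so the other units (minute, hour, week) are guesses, and AgeToDate is a made-up helper name.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module RelativeAgeSketch

    ' Hypothetical helper: turns text like "1 day ago" into an approximate date.
    ' Returns Nothing when the text does not fit the assumed "<n> <unit> ago" shape.
    Public Function AgeToDate(ByVal age As String, ByVal relativeTo As DateTime) As Nullable(Of DateTime)
        Dim m As Match = Regex.Match(age.Trim(), _
            "^(?<n>\d+)\s+(?<unit>minute|hour|day|week)s?\s+ago$", _
            RegexOptions.IgnoreCase)
        If Not m.Success Then
            Return Nothing
        End If

        Dim n As Integer = Integer.Parse(m.Groups("n").Value)
        Select Case m.Groups("unit").Value.ToLowerInvariant()
            Case "minute"
                Return relativeTo.AddMinutes(-n)
            Case "hour"
                Return relativeTo.AddHours(-n)
            Case "day"
                Return relativeTo.AddDays(-n)
            Case "week"
                Return relativeTo.AddDays(-7 * n)
        End Select
        Return Nothing
    End Function

    Sub Main()
        ' Fixed reference time so the result is repeatable.
        Dim reference As New DateTime(2015, 1, 10, 12, 0, 0)
        Dim result As Nullable(Of DateTime) = AgeToDate("1 day ago", reference)
        If result.HasValue Then
            Console.WriteLine(result.Value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
            ' Prints: 2015-01-09
        End If
    End Sub

End Module
```

In the real code the reference would presumably be DateTime.Now at the moment the page was downloaded. The result is only as precise as the text itself, so "1 day ago" gives a date, not an exact time.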
 
I think I should ask the question differently: from a full page of HTML, how can I get the regex to match the full block above? When I said I can't run another regex, I meant I can't just search for <dt>.*</dt>. As you can probably tell from my code, I have been searching for this line: 'Obtained <a href="/wow/en/item/113961" class="color-q4" data-item="d=65&pl=100&cc=5&bl=566">Iron Bellow Sabatons</a>.' If someone can tell me how to get everything between <li> and </li>, I'm sure I can figure out how to split out the info I need.

Thanks
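
One possible way to match the whole <li>...</li> block and keep the date paired with its item is sketched below. It is not a definitive answer: it assumes every entry has the same shape as the block in the post (attributes in the order href, class, data-item, and one <dt> after the link), and FeedEntry, ParseEntries and the sample page in Main are made up for illustration. Two things do the work: RegexOptions.Singleline lets "." match line breaks, and the "(?:(?!</li>).)*?" piece stops the match from running past the end of the current entry.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module FeedBlockSketch

    ' Hypothetical container for the pieces pulled out of one feed entry.
    Public Class FeedEntry
        Public ItemId As String
        Public DataItem As String   ' raw data-item attribute, e.g. contains "cc=5"
        Public Name As String
        Public Age As String        ' text of the <dt>, e.g. "1 day ago"
    End Class

    ' Consumes as little as possible and never crosses a closing </li>,
    ' so a date cannot be borrowed from a later entry.
    Private Const Gap As String = "(?:(?!</li>).)*?"

    ' Assumes the attribute order href, class, data-item shown in the post.
    Private ReadOnly EntryPattern As New Regex( _
        "<li>" & Gap & _
        "Obtained\s+<a href=""/wow/en/item/(?<id>\d+)""[^>]*class=""color-q4""" & _
        "[^>]*data-item=""(?<data>[^""]*)""[^>]*>(?<name>[^<]*)</a>" & Gap & _
        "<dt>(?<age>[^<]*)</dt>" & Gap & "</li>", _
        RegexOptions.Singleline)

    Public Function ParseEntries(ByVal html As String) As List(Of FeedEntry)
        Dim entries As New List(Of FeedEntry)
        For Each m As Match In EntryPattern.Matches(html)
            Dim entry As New FeedEntry()
            entry.ItemId = m.Groups("id").Value
            entry.DataItem = m.Groups("data").Value
            entry.Name = m.Groups("name").Value
            entry.Age = m.Groups("age").Value.Trim()
            entries.Add(entry)
        Next
        Return entries
    End Function

    Sub Main()
        ' Made-up page: one unrelated entry, then the entry from the post (trimmed).
        Dim html As String = String.Join(Environment.NewLine, New String() { _
            "<li><dl><dd>Some other activity.</dd><dt>2 hours ago</dt></dl></li>", _
            "<li>", _
            "  <dl><dd>", _
            "    Obtained <a href=""/wow/en/item/113961"" class=""color-q4"" data-item=""d=65&pl=100&cc=5&bl=566"">Iron Bellow Sabatons</a>.", _
            "  </dd>", _
            "  <dt>1 day ago</dt></dl>", _
            "</li>"})

        For Each entry As FeedEntry In ParseEntries(html)
            Console.WriteLine(entry.ItemId & " | " & entry.Name & " | " & entry.Age)
        Next
        ' Prints: 113961 | Iron Bellow Sabatons | 1 day ago
    End Sub

End Module
```

The existing Contains("cc=5") checks could run against entry.DataItem instead of Match.Value to pick the item source. If the page layout varies more than assumed here, a proper HTML parser (HtmlAgilityPack is a commonly used one for .NET) is likely to hold up better than a regex.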
 