Regular expressions: extract a URL

doodyhead

New member
Joined
Sep 15, 2009
Messages
1
Programming Experience
Beginner
Hi,

I am trying to extract a URL from an HTML page, specifically the one located inside an <h2>...</h2> header. For example, the following header is in the HTML page:

VB.NET:
<h2 style="display: block"><a 
href="http://mirln.blogspot.com/">MIRLN</a></h2>

I would like a regular expression that extracts only the URL from this specific header. The code I have does not work. Any suggestions would be appreciated.

VB.NET:
Dim regex As Regex = New Regex( _
                            "(^<h2 style=""display: block""><a href=""(?<Link>.*?)</a></h2>)", _
                            RegexOptions.IgnoreCase _
                            Or RegexOptions.CultureInvariant _
                            Or RegexOptions.IgnorePatternWhitespace _
                            Or RegexOptions.Compiled _
                            )

            Dim ms As MatchCollection = regex.Matches(_html)
            Dim url As String = String.Empty

            For Each m As Match In ms
                url = m.Groups("Link").Value
                If Not String.IsNullOrEmpty(url) Then
                    url = fixurl(fromUrl, url)
                    'decode the url
                    url = url.Replace("&amp;", "&")
                    If Not urls.Contains(url) Then
                        urls.Add(url)
                    End If
                End If
            Next
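
For reference, here is a minimal sketch of one direction the pattern could take. It is not a tested fix for the code above. It rests on three observations about .NET regex behavior: RegexOptions.IgnorePatternWhitespace makes the engine ignore the literal spaces written in the pattern, ^ without RegexOptions.Multiline matches only at the very start of the input, and (?<Link>.*?)</a> captures everything up to </a>, not just the href value. The module name, the sample input string, and the choice of RegexOptions.Singleline are assumptions made for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractH2LinkSketch
    Sub Main()
        ' Sample header from the post, including the line break after "<a".
        Dim html As String = "<h2 style=""display: block""><a " & Environment.NewLine & _
                             "href=""http://mirln.blogspot.com/"">MIRLN</a></h2>"

        ' <h2\b[^>]*>   any <h2 ...> opening tag, whatever its attributes
        ' <a\s+         the anchor tag; \s+ also accepts line breaks
        ' [^""]*        captures up to the closing quote of href only
        Dim pattern As String = _
            "<h2\b[^>]*>\s*<a\s+[^>]*?href=""(?<Link>[^""]*)""[^>]*>.*?</a>\s*</h2>"

        ' No IgnorePatternWhitespace here, so whitespace in the pattern stays meaningful.
        ' Singleline lets . match newlines inside the link text.
        Dim rx As New Regex(pattern, RegexOptions.IgnoreCase Or RegexOptions.Singleline)

        For Each m As Match In rx.Matches(html)
            Console.WriteLine(m.Groups("Link").Value)
        Next
    End Sub
End Module
```

The Link group would then feed the existing loop unchanged. For pages with irregular markup, an HTML parser is generally more robust than a regular expression.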
 