Hi
I am trying to extract a URL from an HTML page, specifically the one located inside an <h2>...</h2> header. For example, the following header is in the HTML page:
VB.NET:
<h2 style="display: block"><a
href="http://mirln.blogspot.com/">MIRLN</a></h2>
I would like a regular expression that extracts only the URL from this specific header. The code I have does not work. Any suggestions would be appreciated.
VB.NET:
Dim regex As Regex = New Regex( _
    "(^<h2 style=""display: block""><a href=""(?<Link>.*?)</a></h2>)", _
    RegexOptions.IgnoreCase _
    Or RegexOptions.CultureInvariant _
    Or RegexOptions.IgnorePatternWhitespace _
    Or RegexOptions.Compiled _
    )
Dim ms As MatchCollection = regex.Matches(_html)
Dim url As String = String.Empty
For Each m As Match In ms
    url = m.Groups("Link").Value
    If Not String.IsNullOrEmpty(url) Then
        url = fixurl(fromUrl, url)
        'decode the url
        url = url.Replace("&amp;", "&")
        If Not urls.Contains(url) Then
            urls.Add(url)
        End If
    End If
Next
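
A few things in the pattern above would likely keep it from matching the sample header. The leading ^ anchors the match to the very start of the input unless RegexOptions.Multiline is set. RegexOptions.IgnorePatternWhitespace drops the unescaped spaces from the pattern, so "<h2 style" is treated as "<h2style". The sample has a line break between <a and href where the pattern expects a space. Finally, the Link group runs up to </a>, so even on a match it would also capture the closing quote and the link text. Below is a minimal sketch of one way around these, not a definitive fix. The module name H2LinkExtractor and the function ExtractH2Links are made up for illustration, and the fixurl/fromUrl step from the original code is left out because its definition is not shown. It assumes the href value is double-quoted.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module H2LinkExtractor
    ' Hypothetical helper, not part of the original code.
    ' [^>]* skips whatever attributes the <h2> and <a> tags carry,
    ' \b and [^>]*? tolerate the line break between <a and href,
    ' and [^""]* stops the Link group at the closing quote of the attribute.
    ' IgnorePatternWhitespace is deliberately not used here.
    Private ReadOnly H2LinkRegex As New Regex( _
        "<h2\b[^>]*>\s*<a\b[^>]*?\bhref\s*=\s*""(?<Link>[^""]*)""", _
        RegexOptions.IgnoreCase Or RegexOptions.CultureInvariant Or RegexOptions.Compiled)

    Function ExtractH2Links(ByVal html As String) As List(Of String)
        Dim urls As New List(Of String)()
        For Each m As Match In H2LinkRegex.Matches(html)
            ' Decode the most common entity found in href values.
            Dim url As String = m.Groups("Link").Value.Replace("&amp;", "&")
            If url.Length > 0 AndAlso Not urls.Contains(url) Then
                urls.Add(url)
            End If
        Next
        Return urls
    End Function

    Sub Main()
        ' Same header as in the example, including the line break after <a.
        Dim html As String = "<h2 style=""display: block""><a" & Environment.NewLine & _
            "href=""http://mirln.blogspot.com/"">MIRLN</a></h2>"
        For Each url As String In ExtractH2Links(html)
            Console.WriteLine(url) ' prints http://mirln.blogspot.com/
        Next
    End Sub
End Module
```

This only looks at an <a> that directly follows the opening <h2> tag. If the pages vary more than that (single-quoted attributes, nested tags inside the header), an HTML parser would probably be more reliable than a regular expression.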