Question: Need help building a regex (regular expression)

andyvl (Member, joined May 14, 2009, 14 messages, programming experience: Beginner)
OK, for the moment I have a String() array where each element has the following structure:

" class="odd"><td class="hs"><a href="http://resume.imdb.com/" onClick="(new Image()).src='/rg/title-tease/resumehead/images/b.gif?link=http://resume.imdb.com/';"><img src="http://i.media-imdb.com/images/tn15/addtiny.gif" width="25" height="31" border="0"></td><td class="nm"><a href="/name/nm0926013/">Leonard Whiting</a></td><td class="ddd"> ... </td><td class="char"><a href="/character/ch0000491/">Romeo</a></td></tr>"


For this example I need to get the values
Leonard Whiting
and
Romeo

Is this doable with a regular expression?
Please note that the a href="/name/nm0926013/" and a href="/character/ch0000491/" values
are not constant.

What is the best way to get this data out of this string?
I've been playing with the regex for about 4 hours now and I just can't get it.
Any help would be more than welcome.
 
You'll need to do some string cleanup to take care of quotes in your input string. This seems to work for me.

VB.NET:
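		' Requires: Imports System.Text.RegularExpressions
		' Sample input, trimmed to start at the first complete <a> element; inner quotes are doubled for VB.
		Dim content As String = "<a href=""/name/nm0926013/"">Leonard Whiting</a></td><td class=""ddd""> ..." & _
		 "</td><td class=""char""><a href=""/character/ch0000491/"">Romeo</a></td></tr>"

		' Capture the text between each <a ...> and the nearest </a> in the named group "tagtext".
		Dim anchorRegex As New Regex("<a[^>]*>(?'tagtext'.*?)</a>")

		' Show the captured text of every match (Leonard Whiting, then Romeo).
		For Each m As Match In anchorRegex.Matches(content)
			MessageBox.Show(m.Groups("tagtext").Value)
		Next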
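
If the elements are fed in unchanged, the first <a> in the sample row has no closing </a> before the name cell, so a generic anchor pattern may capture more than the name. A rough sketch of one alternative is to anchor on the class="nm" and class="char" cells instead. The module name, the helper ExtractNameAndCharacter, and the shortened sample row are made up for illustration, and the pattern assumes every row uses exactly these class attributes.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CastRowDemo
	' Assumed pattern: relies on the class="nm" and class="char" cells seen in the sample row.
	Private Const RowPattern As String = _
		"<td class=""nm""><a[^>]*>(?<name>[^<]*)</a>.*?" & _
		"<td class=""char""><a[^>]*>(?<character>[^<]*)</a>"

	' Hypothetical helper: returns {name, character}, or Nothing when the row does not match.
	Function ExtractNameAndCharacter(ByVal row As String) As String()
		Dim m As Match = Regex.Match(row, RowPattern, RegexOptions.Singleline)
		If Not m.Success Then Return Nothing
		Return New String() {m.Groups("name").Value, m.Groups("character").Value}
	End Function

	Sub Main()
		' Shortened copy of the element from the question, including the unclosed first <a>.
		Dim row As String = _
			" class=""odd""><td class=""hs""><a href=""http://resume.imdb.com/"">" & _
			"<img src=""http://i.media-imdb.com/images/tn15/addtiny.gif""></td>" & _
			"<td class=""nm""><a href=""/name/nm0926013/"">Leonard Whiting</a></td>" & _
			"<td class=""ddd""> ... </td>" & _
			"<td class=""char""><a href=""/character/ch0000491/"">Romeo</a></td></tr>"

		Dim parts As String() = ExtractNameAndCharacter(row)
		If parts IsNot Nothing Then
			Console.WriteLine(parts(0) & " plays " & parts(1))
		End If
	End Sub
End Module
```

To process the whole String() array, the same helper could be called once per element, skipping any element for which it returns Nothing.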
 
The only two things I'd be mindful of in this case are that the strings in a tag's attributes might contain a >, and that I wouldn't use a greedy search, because it works back from the end of the string trying to match as much as possible. A pessimistic (lazy) search for the end of the tag may perform better.
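
Both points can be tried out with a small console sketch. The title="a > b" attribute, the module name, the helper FirstTagText, and QuoteAwarePattern are invented for illustration. The quote-aware pattern is only one possible way to step over a > inside a quoted attribute value, not a full HTML parser.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AnchorPatternDemo
	' Pattern from the answer above: [^>]* stops at the first ">" even inside a quoted attribute.
	Public Const NaivePattern As String = "<a[^>]*>(?<tagtext>.*?)</a>"

	' Assumed alternative: consume quoted attribute values whole, so a ">" inside them is skipped.
	Public Const QuoteAwarePattern As String = _
		"<a\b(?:""[^""]*""|'[^']*'|[^>""'])*>(?<tagtext>.*?)</a>"

	' Same as NaivePattern but with a greedy .* for comparison.
	Public Const GreedyPattern As String = "<a[^>]*>(?<tagtext>.*)</a>"

	' Hypothetical helper: captured text of the first match, or Nothing when there is none.
	Function FirstTagText(ByVal html As String, ByVal pattern As String) As String
		Dim m As Match = Regex.Match(html, pattern)
		If m.Success Then Return m.Groups("tagtext").Value
		Return Nothing
	End Function

	Sub Main()
		' Made-up anchor whose title attribute contains a ">".
		Dim tricky As String = "<a href=""/name/nm0926013/"" title=""a > b"">Leonard Whiting</a>"
		Console.WriteLine(FirstTagText(tricky, NaivePattern))      ' includes leftover attribute text
		Console.WriteLine(FirstTagText(tricky, QuoteAwarePattern)) ' Leonard Whiting

		Dim content As String = _
			"<a href=""/name/nm0926013/"">Leonard Whiting</a></td><td class=""ddd""> ..." & _
			"</td><td class=""char""><a href=""/character/ch0000491/"">Romeo</a></td></tr>"
		' Greedy .* runs to the last </a>, so both anchors collapse into a single match.
		Console.WriteLine(Regex.Matches(content, GreedyPattern).Count) ' 1
		Console.WriteLine(Regex.Matches(content, NaivePattern).Count)  ' 2
	End Sub
End Module
```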
 