Get All Occurrences Between Two Strings

PutterPlace

I am working on what should be a simple project. What I need to do is collect all occurrences of a string between two specific strings. For example:

VB.NET:
These are some words that may be here.<br><br>
<br><a href='/directory/?act=SomeText'>Some Name</a>
<br><br> These are some more words that might be here.<br><br>
<br><a href='/directory/?act=SomeMoreText'>Some Other Name</a>

I need to get "SomeText" and "SomeMoreText" (without the quotes) from the above text. I was thinking of using regular expressions for the task, but I'm not sure how to put the pattern together. The surrounding text will always remain the same:

Text Before String:
VB.NET:
<br><a href='/directory/?act=

Text Immediately After String:
VB.NET:
'>



Any help with this matter would be GREATLY appreciated.
 
It may not be the way you wanted (i.e. no regex), but I'd probably write it like this:

VB.NET:
        Dim OriginalString As String = ".... here.<br><br><br><a href='/directory/?act=SomeText'>Some Name</a><br><br>..... here.<br><br><br><a href='/directory/?act=SomeMoreText'>Some Other Name</a>"
        Dim StartMatch As String = "href='/directory/?act="
        Dim EndMatch As String = "'>"

        Dim TheMatches As New List(Of String)

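        ' Scan forward from the previous match instead of truncating the input each time.
        Dim StartIndex As Integer = OriginalString.IndexOf(StartMatch)
        While StartIndex >= 0
            StartIndex += StartMatch.Length
            Dim EndIndex As Integer = OriginalString.IndexOf(EndMatch, StartIndex)
            ' Stop if the closing text is missing; Substring would throw otherwise.
            If EndIndex < 0 Then Exit While
            TheMatches.Add(OriginalString.Substring(StartIndex, EndIndex - StartIndex))
            StartIndex = OriginalString.IndexOf(StartMatch, EndIndex + EndMatch.Length)
        End While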

        For Each _Match As String In TheMatches
            MessageBox.Show(_Match)
        Next _Match
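
If you need this in more than one place, the same loop can be wrapped in a function. This is only a minimal sketch: the name GetAllBetween and the module around it are made up for illustration, it assumes both delimiters are non-empty, and it uses ordinal comparison so the search does not depend on the current culture.

```vbnet
Imports System
Imports System.Collections.Generic

Module BetweenDemo
    ' Hypothetical helper (not from the posts above): returns every substring
    ' that sits between startMatch and endMatch. Assumes non-empty delimiters.
    Function GetAllBetween(source As String, startMatch As String, endMatch As String) As List(Of String)
        Dim results As New List(Of String)
        Dim startIndex As Integer = source.IndexOf(startMatch, StringComparison.Ordinal)
        While startIndex >= 0
            startIndex += startMatch.Length
            Dim endIndex As Integer = source.IndexOf(endMatch, startIndex, StringComparison.Ordinal)
            ' No closing delimiter left: stop scanning.
            If endIndex < 0 Then Exit While
            results.Add(source.Substring(startIndex, endIndex - startIndex))
            ' Continue searching after the closing delimiter.
            startIndex = source.IndexOf(startMatch, endIndex + endMatch.Length, StringComparison.Ordinal)
        End While
        Return results
    End Function

    Sub Main()
        Dim html As String = "<br><a href='/directory/?act=SomeText'>Some Name</a>" &
                             "<br><a href='/directory/?act=SomeMoreText'>Some Other Name</a>"
        For Each item As String In GetAllBetween(html, "href='/directory/?act=", "'>")
            Console.WriteLine(item)
        Next
    End Sub
End Module
```

With the sample string in Main, this should print SomeText and SomeMoreText on separate lines.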
 
Since you're working with HTML, which is a tree-structured set of tag nodes, you can load it into a WebBrowser control and use the Document tree to get the info quickly. An example:
VB.NET:
Dim occurrences As New List(Of String)
For Each anchor As HtmlElement In webbrowser.Document.GetElementsByTagName("a")
    Dim href As String = anchor.GetAttribute("href")
    Dim pos As Integer = href.IndexOf("?act=")
    ' Skip anchors whose href has no act parameter.
    If pos >= 0 Then occurrences.Add(href.Substring(pos + 5))
Next
 
Here's a regex with capturing groups that will hold the info you need:

Dim r As New Regex("'/directory/[?]act=(.*?)'>")

The Matches collection returned for your sample will contain 2 items. Each match has a Groups collection, and the second item of that collection (index 1) holds the text you want:

Dim mc As MatchCollection = r.Matches(YOUR_INPUT_STRING)
For Each m As Match In mc
    Dim value As String = m.Groups(1).Value 'what you want
Next
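
For anyone who wants to try the regex approach end to end, here is a small console sketch. The module name and sample string are made up for illustration, and it builds the pattern from the two fixed strings in the question using Regex.Escape (so the "?" is treated literally) rather than the hand-written [?] above. Treat it as one possible way to do it, not the only one.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexDemo
    Sub Main()
        ' Sample input modelled on the text in the question.
        Dim html As String = "here.<br><br><br><a href='/directory/?act=SomeText'>Some Name</a>" &
                             "<br><br>here.<br><br><br><a href='/directory/?act=SomeMoreText'>Some Other Name</a>"

        Dim startMatch As String = "href='/directory/?act="
        Dim endMatch As String = "'>"

        ' Escape the fixed text so regex metacharacters in it are matched literally;
        ' (.*?) lazily captures whatever lies between the two delimiters.
        Dim pattern As String = Regex.Escape(startMatch) & "(.*?)" & Regex.Escape(endMatch)

        For Each m As Match In Regex.Matches(html, pattern)
            ' Groups(0) is the whole match; Groups(1) is the captured text.
            Console.WriteLine(m.Groups(1).Value)
        Next
    End Sub
End Module
```

The lazy quantifier matters here: a greedy (.*) would run to the last '> in the input and return one long match instead of two short ones.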
 