Question String Manipulation

josephjohn100

I have a string and would like to extract just the caption from it.
strLink = "<a href = /claims/claims.aspx?seq=124= >My Caption</a>" ' need to get "My Caption"
 
I think this is a job for regular expressions, which I have no direct experience with.

Here is how I handle something like that at the moment, though.

If the format will always be similar to your example:

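Dim strLink As String = "<a href = /claims/claims.aspx?seq=124= >My Caption</a>"

' Index of the first character after the opening tag's ">"
Dim strStart As Integer = strLink.IndexOf(">") + 1
' Index of the "<" that begins the closing tag
Dim strFinish As Integer = strLink.LastIndexOf("<")
' Substring expects a length, not an end index
Dim strLength As Integer = strFinish - strStart

Dim myString As String = strLink.Substring(strStart, strLength) ' "My Caption"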
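
If the format might not always match, the same idea can be wrapped with a guard. This is only a minimal sketch: the module and function names (CaptionHelper, ExtractCaption) are made up for illustration, and returning Nothing for input without the expected markers is my own assumption about what is wanted.

```vbnet
Module CaptionHelper
    ' Hypothetical helper (not from the thread): returns the text between the
    ' first ">" and the last "<", or Nothing when the markers are missing or
    ' appear in the wrong order.
    Function ExtractCaption(link As String) As String
        If link Is Nothing Then Return Nothing

        Dim startIndex As Integer = link.IndexOf(">"c) + 1 ' 0 when ">" is absent
        Dim endIndex As Integer = link.LastIndexOf("<"c)   ' -1 when "<" is absent

        If startIndex = 0 OrElse endIndex < startIndex Then Return Nothing

        ' Substring takes a start index and a length, not an end index.
        Return link.Substring(startIndex, endIndex - startIndex)
    End Function
End Module
```

Calling ExtractCaption(strLink) on the sample string gives the same caption as the code above, while a string with no tags gives Nothing instead of throwing.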

Hope it helps
 
22-degrees said:
strLink.Substring(Start, Finish)
'Finish'? Is that a good name for the argument passed as the length parameter, i.e. the number of chars to get?
 
JohnH, I don't understand your concern.

In the case of .Substring I don't see the variable name being an issue. As a rule in my own coding, I use names such as p1, p2, p3 for this kind of thing, as they are only temporary.

If you are saying it is a requirement on this forum for names to reflect the code being used, then I will try to comply in the future.
 
When one reads your code, one could be misled into thinking 'Finish' means 'end index', which is not what that parameter does. 'Start' signals the 'start index', and that is what the first parameter does.
 
My apologies. I was writing under the assumption that something as simple as Substring could not be misinterpreted. I am not used to sharing code, but I will try to be more aware in the future.

I will edit my original post.

Thanks.
 
I was writing under the assumption that something as simple as Substring could not be misinterpreted.
It is not uncommon; there have been many threads here with that misconception. Maybe it is because the JavaScript substring function works like that? Who knows.
 
In the case of .Substring I don't see the variable name being an issue. As a rule in my own coding, I use names such as p1, p2, p3 for this kind of thing, as they are only temporary.

Let me ask you this: when you talk to someone, even in the most slangy way, do you say "Hey, what is the p3 of this plank?" The reason you don't is to make it clear and precise exactly what you mean. The same applies to code; giving variables nonsensical names is a very bad practice.

Take the current exercise as an example: if you call the length 'finish', how would someone then deduce how to construct an expression to get characters 3 to 5 out of a string? The "length" way: Substring(3, 2). The "finish" way: Substring(3, 5). I rest my case.
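
To make the difference concrete, here is a small sketch. Slice is a hypothetical helper, not a framework method; it takes an exclusive end index, which is an assumption chosen to match the "3 to 5 means 2 characters" reading above, and converts it to the (start, length) pair that String.Substring expects.

```vbnet
Module SubstringDemo
    ' Hypothetical helper: converts a start index and an exclusive end index
    ' into the (start, length) arguments that String.Substring takes.
    Function Slice(text As String, startIndex As Integer, endIndex As Integer) As String
        Return text.Substring(startIndex, endIndex - startIndex)
    End Function
End Module
```

With the string "0123456789", Substring(3, 2) returns "34", Substring(3, 5) returns "34567", and Slice("0123456789", 3, 5) returns "34" again.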
 
As has been pointed out Herman, I'm not used to sharing code but in my defense, why is everyone taking this out of context?

I made the declarations and used simple functions (IndexOf and LastIndexOf). If I had known I was going to have to idiot-proof something as basic as that I might have thought differently, but instead I figured it was safe to use start (point) and finish (point) because it's ONLY Substring, a very simple concept. I assumed the OP had some level of intelligence if he is manipulating HTML through VB.

Yes, in hindsight, I should have come up with a less 'confusing' word but the matter had already been discussed and clarified so why did you feel you had to chime in and repeat what had already been expressed?
 
Another option, assuming your source data is an HTML document, is to handle it with the HtmlDocument and HtmlElement classes. That would let you call .GetElementsByTagName to iterate over all "<a>" elements and read whatever properties you want, especially .InnerText, which is your main concern. I'm away from my workstation with the VS IDE, so I can't give you proper code now, but I'll get back to it.
 
So, the option I suggested involves more complexity than I thought at first.

I discovered that the HtmlDocument class has no public constructor, so you must get an instance from the .Document property of a WebBrowser control that has already navigated to the website you want to read from. You'll need a Windows Form with a WebBrowser control on it; tell the control to load the page using the .Navigate method, and handle its .DocumentCompleted event to know when the page is completely loaded.

Perhaps it is still worth a try, since the HtmlDocument and HtmlElement classes give you wide access to document structure and content, and also let you automate navigation.
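
Roughly, the Windows Forms route could look like the sketch below. It is untested here and rests on assumptions: the form name (LinkReaderForm) is made up, the address is a placeholder, and writing the captions to the debug output is just for illustration.

```vbnet
Imports System.Diagnostics
Imports System.Windows.Forms

' Hypothetical form, named for illustration only.
Public Class LinkReaderForm
    Inherits Form

    Private ReadOnly browser As New WebBrowser()

    Public Sub New()
        browser.Dock = DockStyle.Fill
        Controls.Add(browser)
        AddHandler browser.DocumentCompleted, AddressOf OnDocumentCompleted
        ' Placeholder address; replace it with the page you want to read.
        browser.Navigate("http://example.com/")
    End Sub

    ' Runs once the control reports that a document has finished loading.
    Private Sub OnDocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs)
        For Each link As HtmlElement In browser.Document.GetElementsByTagName("a")
            Debug.WriteLine(link.InnerText)
        Next
    End Sub
End Class
```

You would show it with Application.Run(New LinkReaderForm()) from the application's entry point. Note that DocumentCompleted may fire more than once on pages that contain frames.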

I also tried a different approach using XmlDocument, since I thought that, after all, HTML is no more than an XML implementation, and XmlDocument has a similar .GetElementsByTagName method:

    Friend Shared Function Teste() As Object
        Dim SourceCode As String = "<html><body><a href = '/claims/claims.aspx?seq=124=' >My Caption</a></body></html>"
        Dim XmlDoc As New Xml.XmlDocument
        XmlDoc.LoadXml(SourceCode)
        Return XmlDoc.GetElementsByTagName("a").Item(0).InnerText
    End Function


You'll notice, however, that it only works if attribute values such as href are enclosed in quotes. The XmlDocument class seems to be very strict about this, while web browsers render the code in a more relaxed way.
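
One way to live with that strictness is to catch the parse failure. The sketch below is only an illustration: TryGetCaption is a hypothetical name, and returning Nothing for markup that is not well-formed XML is my own choice.

```vbnet
Imports System.Xml

Module XmlCaptionHelper
    ' Hypothetical helper: returns the text of the first <a> element, or
    ' Nothing when the markup is not well-formed XML or contains no <a>.
    Function TryGetCaption(markup As String) As String
        Dim doc As New XmlDocument()
        Try
            doc.LoadXml(markup)
        Catch ex As XmlException
            ' Raised, for example, when an attribute value is not quoted.
            Return Nothing
        End Try

        Dim anchors As XmlNodeList = doc.GetElementsByTagName("a")
        If anchors.Count = 0 Then Return Nothing
        Return anchors.Item(0).InnerText
    End Function
End Module
```

With the quoted href from the function above it returns "My Caption"; with the unquoted original string it returns Nothing.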

And, of course, regular expressions, as first suggested, will do fine too, and may be the most straightforward way.
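
For completeness, a regular-expression version might look like this. It is a sketch tuned to the sample string only: the pattern and the function name (ExtractCaptionWithRegex) are my assumptions, and a pattern like this will not cope with nested tags or arbitrary real-world HTML.

```vbnet
Imports System.Text.RegularExpressions

Module RegexCaptionHelper
    ' Assumed pattern: an opening <a ...> tag, a lazily captured caption,
    ' then the closing </a> tag.
    Private ReadOnly AnchorPattern As New Regex("<a\b[^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Hypothetical helper: returns the trimmed caption of the first link,
    ' or Nothing when no link is found.
    Function ExtractCaptionWithRegex(link As String) As String
        Dim m As Match = AnchorPattern.Match(link)
        If Not m.Success Then Return Nothing
        Return m.Groups(1).Value.Trim()
    End Function
End Module
```

Unlike XmlDocument, this accepts the unquoted href in the original string.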

Hope these considerations help you.
 
I discovered HtmlDocument class has no constructor, so you must get it as a .Document property from a WebBrowser control which has already navigated to the website which you want to read from. You'll need a Windows Form, a WebBrowser control on it, and then command it to load the page using the .Navigate method, and handle its .DocumentCompleted event to know when it's completely loaded.
Or, as a utility if you're not already using a WebBrowser, just create one, set DocumentText = String.Empty, and use Document.Write to make it parse the string.
        Using web As New WebBrowser With {.DocumentText = String.Empty}
            web.Document.Write("<a href = /claims/claims.aspx?seq=124= >My Caption</a>")
            Dim s = web.Document.GetElementsByTagName("a")(0).InnerText
        End Using
 