Loading webpage to string & verifying in richtextbox

xtanmanx | New member | Joined Aug 10, 2012 | Messages: 1 | Programming Experience: Beginner
I tried searching but couldn't find anything; I may have been using the wrong terms. Please move this thread if it doesn't belong in this forum.

I am using Dim myText As String = Me.WebBrowser1.Document.Body.InnerText to load the text of the page currently shown in WebBrowser1 into a string so I can manipulate it.

The page has paragraph text as well as text within a table. This is a quick example of what I'm testing with:

Blah blah blah more blah an even more blah, the blah blah blah blah, the blah blah blah blah
first namelast namedate of birth
NobodyInparticular01/01/1900
AnotherNamehere10/10/2000

When I load it into the string and put it in a RichTextBox with RichTextBox1.Text = myText, it appears like this:

Blahblahblahmoreblahanevenmoreblah,theblahblahblahblah,theblahblahblahblah
firstnamelastnamedateofbirth
NobodyInparticular01/01/1900
AnotherNamehere10/10/2000,
(I'm not sure why the space shows up in this post. I edited it after previewing and even changed the text, but the space is still there. Maybe it's because I'm using Chrome?)

But when I use InStr to extract data, it counts the spaces as if they were there. So the first "/" shows up at position 131 (it took me a minute to realize that the vbCrLf line break is counted too, as two characters: CR and LF).
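To see where the extra positions come from, here is a small console sketch (the module name and the sample string are made up for illustration). It shows that vbCrLf is two characters long, and that InStr counts from 1 while String.IndexOf counts from 0, so the same "/" is reported at different positions:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module InStrCountingDemo
    Sub Main()
        ' Illustrative sample: some text, a line break, then a date.
        Dim text As String = "ab" & vbCrLf & "01/01/1900"

        ' vbCrLf is two characters: carriage return (13) and line feed (10).
        Console.WriteLine(Len(vbCrLf))          ' 2

        ' InStr is 1-based: "a"=1, "b"=2, CR=3, LF=4, "0"=5, "1"=6, "/"=7.
        Console.WriteLine(InStr(text, "/"))     ' 7

        ' String.IndexOf is 0-based, so the same "/" is at index 6.
        Console.WriteLine(text.IndexOf("/"c))   ' 6
    End Sub
End Module
```

Every line break in the page text therefore shifts later positions by two, which is worth keeping in mind when computing offsets by hand.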

So I can extract the data I need, but I'd like to understand why this happens in case it causes problems later.

TIA
Edit: I looked once more using a MsgBox and saw that the paragraph text does have spaces, but the table text does not. So am I correct in assuming that the table uses some placeholder character, the way line breaks use vbCrLf?
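One way to check whether the table text really contains a hidden separator is to print the code of every character in the string. A minimal sketch, assuming a console project; DescribeChars is a hypothetical helper name, and in the real form you would pass WebBrowser1.Document.Body.InnerText instead of the sample string:

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CharCodeDump
    ' Hypothetical helper: one line per character with its 0-based index,
    ' the character itself (or "." if it is whitespace or a control character),
    ' and its numeric code from AscW.
    Function DescribeChars(text As String) As String
        Dim sb As New StringBuilder()
        For i As Integer = 0 To text.Length - 1
            Dim c As Char = text(i)
            Dim shown As String = If(Char.IsWhiteSpace(c) OrElse Char.IsControl(c), ".", c.ToString())
            sb.AppendLine(String.Format("{0,4}  {1}  {2}", i, shown, AscW(c)))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Sample text; in the form, pass WebBrowser1.Document.Body.InnerText instead.
        Dim sample As String = "of birth" & vbCrLf & "Nobody"
        Console.Write(DescribeChars(sample))
    End Sub
End Module
```

Invisible characters such as 13 and 10 (CR, LF), 9 (tab) or 160 (non-breaking space) show up in the listing even though a RichTextBox or MsgBox does not display them. If the last letter of one cell and the first letter of the next sit at consecutive indexes, there is no placeholder between them at all.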
 
Look again! I'd be very surprised if firstnamelastnamedateofbirth were not in fact

first namelast namedate of birth

which is the inner text of

<td>first name</td><td>last name</td><td>date of birth</td>
 
When dealing with HTML content, working with the .InnerText string is rarely enough. As Dunfiddlin showed, .InnerText makes no effort to separate text that is spread across several HTML elements: the text of adjacent table cells is simply run together. Sooner or later you should handle the WebBrowser's DocumentCompleted event and use the methods and properties of the HtmlDocument and HtmlElement classes to organize and filter the content. These classes give easy access to the parent-child relationships between HTML elements. Rather than parsing the flat text, get the table as an HtmlElement and iterate through its rows and cells to get what you need.
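A minimal sketch of that approach, assuming a Windows Forms project. The form class, the control layout, the URL and the choice of the first table on the page are placeholders to adapt. It waits for DocumentCompleted, walks the table's rows and cells with GetElementsByTagName, and writes one tab-separated line per row into the RichTextBox:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Windows.Forms
Imports Microsoft.VisualBasic

Public Class Form1
    Inherits Form

    ' Controls are created in code so the sketch is self-contained;
    ' in a designer-built form they already exist as WebBrowser1 and RichTextBox1.
    Private WithEvents WebBrowser1 As New WebBrowser()
    Private ReadOnly RichTextBox1 As New RichTextBox()

    Public Sub New()
        RichTextBox1.Dock = DockStyle.Fill
        WebBrowser1.Dock = DockStyle.Top
        WebBrowser1.Height = 300
        ' Add the Fill control first so the Top-docked browser claims its space first.
        Controls.Add(RichTextBox1)
        Controls.Add(WebBrowser1)
    End Sub

    Protected Overrides Sub OnLoad(e As EventArgs)
        MyBase.OnLoad(e)
        ' Hypothetical URL: replace with the page you are reading.
        WebBrowser1.Navigate("https://example.com/")
    End Sub

    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) _
            Handles WebBrowser1.DocumentCompleted
        ' DocumentCompleted can fire more than once (e.g. once per frame); wait for the whole page.
        If WebBrowser1.ReadyState <> WebBrowserReadyState.Complete Then Return

        Dim tables As HtmlElementCollection = WebBrowser1.Document.GetElementsByTagName("table")
        If tables.Count = 0 Then Return

        ' Assumption: the data lives in the first <table> on the page.
        Dim table As HtmlElement = tables(0)
        Dim sb As New StringBuilder()

        For Each row As HtmlElement In table.GetElementsByTagName("tr")
            Dim cells As New List(Of String)()
            For Each cell As HtmlElement In row.GetElementsByTagName("td")
                ' Guard against InnerText being Nothing, e.g. for an empty cell.
                cells.Add(If(cell.InnerText, String.Empty).Trim())
            Next
            ' A tab between cells keeps the columns apart in the output.
            sb.AppendLine(String.Join(vbTab, cells))
        Next

        RichTextBox1.Text = sb.ToString()
    End Sub
End Class

Module Program
    <STAThread>
    Sub Main()
        Application.EnableVisualStyles()
        Application.Run(New Form1())
    End Sub
End Module
```

Because each cell's text is read separately, the cell boundaries survive regardless of how InnerText would have joined them. Note that GetElementsByTagName returns all descendants, so rows of a table nested inside a cell would be included too, and header cells marked up as <th> are not picked up by this sketch.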
 