Retrieving actual HTML source using WebBrowser component

mithrandiir42

This one seems so easy, but it's turning out to be very difficult. I have pages loading in a WebBrowser object and I'd like to get the HTML source exactly as the View Source option in IE shows it. I thought I had it with browser.Document.Body.innerHTML, but that doesn't return the same text as View Source does. Getting the exact source, with no alterations to the formatting, is critical for my app. I would use WebRequest or WebClient and just download the raw data, but unfortunately I'm using WebBrowser to navigate a site and log in before I reach the page I need to retrieve. If anyone knows of a way to get the raw HTML source of a page using the WebBrowser control, I'd really appreciate the help.

Thanks!
 
VB.NET:
Dim fullsource As String = WebBrowser1.DocumentText
 
JohnH said:
VB.NET:
Dim fullsource As String = WebBrowser1.DocumentText


Thanks for the reply. That looks like it should help, but my problem now is that I've been using SHDocVw.WebBrowser and not System.Windows.Forms.WebBrowser, which is the one that has the DocumentText property. Do you know of any way I can cast my SHDocVw WebBrowser to the System.Windows.Forms one? This is inside a DocumentComplete event handler that looks like this: "void DocComplete(object pDisp, ref object URL)", so I also have pDisp available if that helps at all. I've tried a few different ways of casting the WebBrowser object, but I keep getting Invalid Cast exceptions.

Thanks again for all your help.
 
If you are using SHDocVw.dll, also add a reference to MSHTML.dll for working with the document (DOM), then use this code:
VB.NET:
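Private Sub WebBrowser1_DocumentComplete(ByVal sender As Object, ByVal e As AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent) _
Handles WebBrowser1.DocumentComplete
  ' DocumentComplete also fires for each frame; only handle the top-level document.
  If e.pDisp Is WebBrowser1.Application Then
    ' Document is exposed as Object, so convert it explicitly (needed under Option Strict On).
    Dim objDoc As mshtml.HTMLDocument = CType(WebBrowser1.Document, mshtml.HTMLDocument)
    ' outerHTML is serialized from the parsed DOM, not the raw bytes the server sent.
    MsgBox(objDoc.documentElement.outerHTML)
  End If
End Sub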
 
Thanks for the response. Unfortunately this doesn't preserve the tabs or carriage returns in the original HTML. When I do a View Source in Internet Explorer, it opens the HTML with the original formatting, but when I use documentElement.outerHTML all of that formatting seems to be stripped out. Do you have any other ideas on how to retrieve this?
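
One option that may be worth trying, offered as a minimal sketch rather than a confirmed fix: since outerHTML is rebuilt from the parsed DOM, the original whitespace can only come from the response itself. The idea below is to let the browser control handle the login, then request the same URL again with WebClient while sending along the cookies the document exposes (for example objDoc.cookie and objDoc.url from the handler above). The names RawSourceSketch and DownloadRawSource are made up for illustration, and the UTF-8 decoding is an assumption about the page.

VB.NET:
Imports System.Net
Imports System.Text

Module RawSourceSketch
    ' Hypothetical helper: re-requests the page outside the browser control.
    ' cookieHeader is assumed to come from the loaded document, e.g. objDoc.cookie.
    Function DownloadRawSource(ByVal url As String, ByVal cookieHeader As String) As String
        Using client As New WebClient()
            If Not String.IsNullOrEmpty(cookieHeader) Then
                ' Send the browser session's cookies so the server sees a logged-in request.
                client.Headers.Add(HttpRequestHeader.Cookie, cookieHeader)
            End If
            ' DownloadData returns the response bytes as received, so tabs and line breaks survive.
            Dim raw As Byte() = client.DownloadData(url)
            ' Assumption: the page is UTF-8; substitute the page's real encoding if it differs.
            Return Encoding.UTF8.GetString(raw)
        End Using
    End Function
End Module

Inside DocumentComplete this would be called as DownloadRawSource(objDoc.url, objDoc.cookie). Some caveats: it issues a second request, so a page produced by a form POST will not be reproduced; cookies flagged HttpOnly are not visible through the document's cookie property; and a server that checks other headers may still reject the request.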
 