remove vbcrlfs from innerhtml

Ultrawhack

Hi,

I capture the InnerHtml from a WebBrowser control. How can I remove the vbCrLfs from it? I cannot use Replace because hDoc is not a String.

Thanks !

VB.NET:
Dim hDoc As HtmlDocument
hDoc = WebBrowser1.Document.Window.Frames("contentframe").Document
FileOpen(1, "c:\testfile.htm", OpenMode.Binary)
FilePut(1, hDoc.Body.InnerHTML)
FileClose(1)
 
InnerHtml is of type String, so go ahead and use Replace or any other string manipulation at will.

A few comments: why do you open the file in binary mode when you are writing text? FileOpen is only provided for backward compatibility; there are numerous other options for file I/O in .NET. The one used in the example below is the code that is inserted automatically when you use the context menu in the code editor and browse 'Insert Snippet'.

I see you named the file 'testfile.htm', implying a complete HTML document that can be opened and viewed in a browser. Body.InnerHtml will not give you the complete HTML document; at best you could try OuterHtml, which also includes the surrounding tag, but WebBrowser.DocumentText returns everything.

Also, I can't help being a bit curious as to why you want to remove the vbNewLines... :)
VB.NET:
Dim bd As String = WebBrowser1.DocumentText
bd = bd.Replace(vbNewLine, "")
Dim filepath As String = Application.StartupPath & "\html.htm"
My.Computer.FileSystem.WriteAllText(filepath, bd, False)
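One caveat about the Replace call above: vbNewLine is the two-character CR+LF sequence, so a page that uses bare LF line endings would keep its line breaks. Here is a minimal sketch that strips both characters individually. The name StripLineBreaks is made up for illustration and is not part of the framework.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module LineBreakDemo
    ' Hypothetical helper: removes every CR and every LF character,
    ' so CRLF, bare LF and bare CR line endings are all covered.
    Function StripLineBreaks(ByVal html As String) As String
        Return html.Replace(vbCr, "").Replace(vbLf, "")
    End Function

    Sub Main()
        ' Sample input mixing a CRLF and a bare LF line ending.
        Dim sample As String = "<p>one</p>" & vbCrLf & "<p>two</p>" & vbLf & "<p>three</p>"
        Console.WriteLine(StripLineBreaks(sample)) ' prints <p>one</p><p>two</p><p>three</p>
    End Sub
End Module
```

If the pages being saved only ever contain CRLF line endings, the plain Replace(vbNewLine, "") behaves the same.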
 
Thanks. I come from an Access/VBA background and am transitioning to this VB.NET Express stuff, but I'm learning quickly and quite enjoying it.

To clear up a few things:
This whole exercise is a screen scraper: I need to trap elements from the HTML and store them in my database.
I know you will ask why I don't just use InnerText. It's because it's easier to grab the elements I need using the HTML before and after them as delimiters. InnerText gives me a lump of text, and there is no way I can trap the contents correctly.
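As a rough sketch of the delimiter idea, assuming the HTML is already in a String: find the markup that precedes the value, then the markup that follows it, and take what lies between. ExtractBetween and the sample markup are hypothetical and only for illustration.

```vbnet
Imports System

Module DelimiterDemo
    ' Hypothetical helper: returns the text between the first occurrence of
    ' prefix and the next occurrence of suffix, or Nothing if either is missing.
    Function ExtractBetween(ByVal source As String, ByVal prefix As String, ByVal suffix As String) As String
        Dim startIndex As Integer = source.IndexOf(prefix, StringComparison.Ordinal)
        If startIndex < 0 Then Return Nothing
        startIndex += prefix.Length
        Dim endIndex As Integer = source.IndexOf(suffix, startIndex, StringComparison.Ordinal)
        If endIndex < 0 Then Return Nothing
        Return source.Substring(startIndex, endIndex - startIndex)
    End Function

    Sub Main()
        ' Made-up markup standing in for the scraped frame contents.
        Dim html As String = "<td class=""price"">19.99</td><td class=""qty"">3</td>"
        Console.WriteLine(ExtractBetween(html, "<td class=""price"">", "</td>")) ' prints 19.99
    End Sub
End Module
```

A line break inside either delimiter would make IndexOf miss it, which is one reason to strip the newlines first.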

FileOpen method? My mistake. Now I know better...

Why InnerHtml? I only need to scrape the HTML of the frame contents.

Why save as .htm? It's easier for me to open the file in an HTML editor and find the color-highlighted tags I need as delimiters. Why replace the newlines? Because I get one continuous HTML string, which lets me trap elements with a lower chance of error.

This works great. Thanks !
VB.NET:
Dim hDoc As HtmlDocument
hDoc = WebBrowser1.Document.Window.Frames("contentframe").Document

Dim bd As String = hDoc.Body.InnerHtml
bd = bd.Replace(vbNewLine, "")
Dim filepath As String = Application.StartupPath & "\html.htm"
My.Computer.FileSystem.WriteAllText(filepath, bd, False)
 
That you will have to ask in a C# forum; here at the VB.NET forums we don't have any C# sections. You could also try one of the online converters to see if it helps. The SharpDevelop IDE also has a project converter that can convert between VB.NET and C#.
 