Read HTML Page Source to a String

asn1981

Active member
Joined
Mar 15, 2005
Messages
38
Programming Experience
Beginner
Hi,
This might be a fairly basic question, but how would I read the source code of an .html page (one that is on the user's computer) into some kind of string variable?
Thanks in advance.
 
sevenhalo said:
Is the HTML file on the local computer, or are you trying to scrape a site?

Hi,
No, the HTML files are all on my computer, but I need to compile certain statistics about the code in question.
 
If you just want to open it and read it into a string, this'll do it for you (assuming that the HTML file is on the computer running the app):

VB.NET:
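Dim strPath As String = "Full Path To File"
Dim htmldata As String
' File.OpenRead is a Shared method, so call it on the type rather than on a variable
Dim strmRead As New System.IO.StreamReader(System.IO.File.OpenRead(strPath))
htmldata = strmRead.ReadToEnd()
' Close the reader (and the underlying file) when done
strmRead.Close()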

But it's a sloppy way of doing it. Not to mention, since you're not flushing the buffer, you'll definitely see a performance hit on larger files.
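
For comparison, here is a minimal sketch of the same read done as a single call. It assumes .NET Framework 2.0 or later, where System.IO.File.ReadAllText is available. The ReadHtml name and the path are made up for illustration, so swap in your own:

VB.NET:
Imports System.IO

Module ReadHtmlSketch
    ' Hypothetical helper: returns the whole file as one string.
    Function ReadHtml(ByVal path As String) As String
        ' ReadAllText opens the file, reads it to the end and closes it.
        Return File.ReadAllText(path)
    End Function

    Sub Main()
        ' Placeholder path, not a real file.
        Dim htmldata As String = ReadHtml("C:\placeholder\page.html")
        Console.WriteLine(htmldata.Length)
    End Sub
End Module

Either way the whole file ends up in memory at once, so for very large files reading line by line may be the better fit.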
 
Hi,
That is the problem that I have: some of the files are very large, and the program will be working like a spider, so efficiency is key.
Would you have any further suggestions?
 
VB.NET:
Dim strPath As String = "Path to file even, so much for being anonymous :)"
' File.OpenRead is a Shared method, so call it on the type directly
Dim strmRead As New System.IO.StreamReader(System.IO.File.OpenRead(strPath))
Dim arr() As String
Dim cnt As Integer
' Peek returns -1 at the end of the stream
While strmRead.Peek() >= 0
    ' Grow the array by one element for every line read
    ReDim Preserve arr(cnt)
    arr(cnt) = strmRead.ReadLine()
    cnt += 1
    If cnt Mod 50 = 0 Then
        strmRead.BaseStream.Flush()
    End If
End While
 
My code slowed it down! :eek:

On second thought, ignore flushing and use this:

VB.NET:
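Dim strPath As String = "FilePath"
' File.OpenRead is a Shared method, so call it on the type directly
Dim strmRead As New System.IO.StreamReader(System.IO.File.OpenRead(strPath))
Dim cnt As Integer = 0
Dim arr(100) As String
' Peek returns -1 at the end of the stream
While strmRead.Peek() >= 0
    ' Grow the array in blocks of 100 instead of on every line
    If (cnt Mod 100 = 0) AndAlso cnt > 0 Then
        ReDim Preserve arr(cnt + 100)
    End If
    arr(cnt) = strmRead.ReadLine()
    cnt += 1
End While
' Trim the unused slots; the last line read is at index cnt - 1
ReDim Preserve arr(cnt - 1)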

I tested this against an HTML file with 400,000 lines; it took under 30 seconds.
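
If you are on .NET Framework 2.0 or later, a List(Of String) can take over the resizing that the ReDim Preserve calls do by hand. This is only a sketch under that assumption, not something I have timed against the 400,000-line file. The ReadLines name and the path are placeholders for illustration:

VB.NET:
Imports System.IO
Imports System.Collections.Generic

Module ReadLinesSketch
    ' Hypothetical helper: reads every line of the file into a list.
    Function ReadLines(ByVal path As String) As List(Of String)
        Dim lines As New List(Of String)
        Using reader As New StreamReader(path)
            Dim line As String = reader.ReadLine()
            ' ReadLine returns Nothing once the end of the file is reached.
            While line IsNot Nothing
                lines.Add(line)
                line = reader.ReadLine()
            End While
        End Using
        Return lines
    End Function

    Sub Main()
        ' Placeholder path, not a real file.
        Dim lines As List(Of String) = ReadLines("C:\placeholder\page.html")
        Console.WriteLine(lines.Count)
    End Sub
End Module

The list grows its internal storage as needed, and the Using block closes the file even if a read throws.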
 