Hey all,
I've written a few simple HTML parsers that scrape data from a URL (or a list of URLs). Getting the data out of the HTML is no issue, but now I'm trying to scrape data from a site that requires a login before the data can be accessed. I've tried a couple of things with no luck, and although I've done some research, I'm not sure I'm finding the right information:
c# - How to programmatically log in to a website to screenscape? - Stack Overflow
Use HttpWebRequest to simulate hotmail login
Both of these are in C#, which is hardly different from VB.NET, but I need some guidance on how to log in to a website for data collection without using a WebBrowser control.
I've been able to log in by using Document.GetElementById(), filling in the inputs and sending a click event, but that requires a WebBrowser control, which not only slows things down considerably but also makes my program more complex than it needs to be.
Ideally, I could request the login page, submit the login data, and capture the cookies the site sets, so that I can then get into the members-only area to collect data.
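A minimal sketch of that flow with HttpWebRequest, assuming the site uses an ordinary HTML login form posted as application/x-www-form-urlencoded: the form fields are encoded, POSTed to the form's action URL, and a CookieContainer attached to the request collects whatever session cookies the server sets. The names LoginSketch, BuildFormBody and LogIn are hypothetical, and so are any URLs and field names you pass in; the real ones come from the login page's `<form action="...">` and `<input name="...">` tags. If the form also carries hidden inputs (ASP.NET sites, for example, include __VIEWSTATE), those values would have to be read from the login page and posted as well, which this sketch does not do.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Text

Module LoginSketch
    ' Encodes name/value pairs as application/x-www-form-urlencoded, e.g. "user=a+b&pass=x".
    Function BuildFormBody(fields As IDictionary(Of String, String)) As String
        Dim parts As New List(Of String)
        For Each pair As KeyValuePair(Of String, String) In fields
            parts.Add(WebUtility.UrlEncode(pair.Key) & "=" & WebUtility.UrlEncode(pair.Value))
        Next
        Return String.Join("&", parts)
    End Function

    ' POSTs the login form and returns the container holding the cookies the server set.
    ' ASSUMPTION: loginUrl is the form's action URL and the dictionary keys match the
    ' form's <input name="..."> values (hypothetical until checked against the real page).
    Function LogIn(loginUrl As String, fields As IDictionary(Of String, String)) As CookieContainer
        Dim cookies As New CookieContainer()
        Dim request As HttpWebRequest = CType(WebRequest.Create(loginUrl), HttpWebRequest)
        request.Method = "POST"
        request.ContentType = "application/x-www-form-urlencoded"
        request.CookieContainer = cookies   ' Set-Cookie headers from the response are stored here

        Dim body As Byte() = Encoding.UTF8.GetBytes(BuildFormBody(fields))
        request.ContentLength = body.Length
        Using requestStream As Stream = request.GetRequestStream()
            requestStream.Write(body, 0, body.Length)
        End Using

        Using response As WebResponse = request.GetResponse()
            ' The response body is ignored here; inspect it to confirm the login actually worked.
        End Using
        Return cookies
    End Function
End Module
```

Any later request that should be treated as logged in has to set its CookieContainer property to the same container that LogIn returns; a request without it is anonymous again.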
Here's the code I'm using to retrieve the HTML of any given URL:
VB.NET:
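Imports System.IO
Imports System.Net

'(the Imports go at the top of the file; ReadPage lives inside a Module or Class)
Function ReadPage(ByVal url As String) As String
    'try 7 times to retrieve the page before declaring it un-retrievable
    For attempt As Integer = 1 To 7
        Try
            'code which actually retrieves the HTML; the Using blocks close the
            'response and its stream even if reading fails partway through
            Using response As WebResponse = WebRequest.Create(url).GetResponse()
                Using objStreamReader As New StreamReader(response.GetResponseStream())
                    Return objStreamReader.ReadToEnd()
                End Using
            End Using
        Catch
            'any failure (network error, HTTP error status, etc.): try again
        End Try
    Next
    'the function returns "-1" as a string if it could not retrieve the web page in 7 tries
    Return "-1"
End Function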
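Because ReadPage creates a fresh request with no CookieContainer, no session cookies are sent, so even after a successful login the server would still serve the logged-out page. A possible variant, sketched under the assumption that a login request has already filled a CookieContainer, attaches that container to every page request. ReadPageWithCookies and the AuthenticatedFetch module are hypothetical names; the 7-try retry loop and the "-1" sentinel are kept from ReadPage.

```vbnet
Imports System.IO
Imports System.Net

Module AuthenticatedFetch
    ' Like ReadPage, but sends (and keeps updating) the cookies in the given container,
    ' so a session established by an earlier login request is reused.
    Function ReadPageWithCookies(url As String, cookies As CookieContainer) As String
        For attempt As Integer = 1 To 7
            Try
                Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
                request.CookieContainer = cookies
                Using response As WebResponse = request.GetResponse()
                    Using reader As New StreamReader(response.GetResponseStream())
                        Return reader.ReadToEnd()
                    End Using
                End Using
            Catch ex As WebException
                ' Network or HTTP failure: try again. Other exceptions (e.g. a malformed URL) propagate.
            End Try
        Next
        ' Same sentinel as ReadPage: the page could not be retrieved in 7 tries.
        Return "-1"
    End Function
End Module
```

A typical call passes the container from the login step, e.g. ReadPageWithCookies(membersUrl, cookies), where membersUrl is a placeholder for the members-only page. Catching only WebException (instead of everything) keeps genuine bugs such as a badly formed URL from being silently retried.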
Thanks!