Question Auto-login to website for screen scraping/data collection?

dmoney

Hey all,

I've written a few simple HTML parsers which scrape data from a URL or a list of URLs. Getting the data out of the HTML is no issue, but now I'm trying to scrape data from a site which requires a login before the data can be accessed. I've tried a couple of things, but no luck. I've done some research, but I'm not sure I'm finding the right information:

c# - How to programmatically log in to a website to screenscape? - Stack Overflow
Use HttpWebRequest to simulate hotmail login

Both of these are in C#, which is hardly different from VB.NET, but I need some guidance on how to log in to a website for data collection without using a WebBrowser.

I've been able to log in by using Document.GetElementById(), filling in the inputs and sending a click event, but that requires a WebBrowser, which not only slows things down considerably but also makes my program more complex than it needs to be.

Ideally, I could ping the URL for the login page, send the login data, and capture the cookies so that I can now get into the members-only area to collect data.
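
A minimal sketch of that idea, assuming the site uses an ordinary HTML form login: POST the form fields with an HttpWebRequest, and share one CookieContainer between the login request and the later page requests so the session cookie is sent back automatically. The URLs, the field names (username, password) and the helper names PostForm and GetPage are all hypothetical placeholders, not taken from any real site. Many sites also expect hidden form fields (for example an anti-forgery token or ASP.NET's __VIEWSTATE), which this sketch does not handle.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module LoginSketch
    ' One container shared by every request, so cookies set at login are sent back later.
    Private ReadOnly cookies As New CookieContainer()

    ' POSTs URL-encoded form data and returns the response body.
    Function PostForm(ByVal url As String, ByVal formData As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.Method = "POST"
        request.ContentType = "application/x-www-form-urlencoded"
        request.CookieContainer = cookies

        Dim body As Byte() = Encoding.UTF8.GetBytes(formData)
        request.ContentLength = body.Length
        Using requestStream As Stream = request.GetRequestStream()
            requestStream.Write(body, 0, body.Length)
        End Using

        Using response As WebResponse = request.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' GETs a page, sending whatever cookies the login response stored.
    Function GetPage(ByVal url As String) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        request.CookieContainer = cookies
        Using response As WebResponse = request.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        ' Hypothetical URLs and field names: replace with the ones from the site's login form.
        Dim formData As String = "username=" & Uri.EscapeDataString("myUser") &
                                 "&password=" & Uri.EscapeDataString("myPassword")
        PostForm("https://example.com/login", formData)
        Console.WriteLine(GetPage("https://example.com/members/data"))
    End Sub
End Module
```

The field names a real site expects can be read from the name attributes of the input elements in its login form, and the POST target from the form's action attribute.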

Here's the code I'm using to retrieve the HTML of any given URL:

VB.NET:
    'requires Imports System.IO and Imports System.Net at the top of the file
    Function ReadPage(ByVal url As String) As String
        Const maxAttempts As Integer = 7
        'try up to 7 times to retrieve the page before declaring it un-retrievable
        For attempt As Integer = 1 To maxAttempts
            Try
                'code which actually retrieves the HTML;
                'the Using blocks close the response and the reader even if reading fails
                Using response As WebResponse = WebRequest.Create(url).GetResponse()
                    Using reader As New StreamReader(response.GetResponseStream())
                        Return reader.ReadToEnd()
                    End Using
                End Using
            Catch ex As Exception
                'any failure counts as a failed attempt; loop around and try again
            End Try
        Next
        'the function returns "-1" as a string if it could not retrieve the web page in 7 tries
        Return "-1"
    End Function

Thanks!
 