Question: Please help! Multiple (Http)WebRequests

xGhost

I have done quite a bit of debugging on this one.
What I want to achieve:
1) Log in to a website and store the cookies in a container.
2) With those cookies, request many pages and parse them.
3) Make the above efficient and fast.
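
Steps 1 and 2 can be sketched roughly as follows. This is only an illustration: the site http://example.com/, the cookie name and value, and the module name are made-up placeholders for whatever the real login response sets, and no request is actually sent.

```vbnet
Imports System
Imports System.Net

Module SharedCookieSketch
    Sub Main()
        ' Hypothetical site and cookie, standing in for what the login response would set.
        Dim site As New Uri("http://example.com/")
        Dim cookies As New CookieContainer()
        cookies.Add(site, New Cookie("session", "abc123"))

        For page As Integer = 1 To 3
            Dim pageUri As New Uri(site, "page" & page)
            ' Every request gets the same container, so every request carries the login cookie.
            Dim request = DirectCast(WebRequest.Create(pageUri), HttpWebRequest)
            request.CookieContainer = cookies
            ' Show which Cookie header the container would supply for this page.
            Console.WriteLine("{0} -> Cookie: {1}", pageUri, cookies.GetCookieHeader(pageUri))
        Next
    End Sub
End Module
```

The point of the sketch is that the container is created once and assigned to each request, rather than rebuilt per request.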

Problem:
When I do 1 request at a time, it takes ages to request and parse x pages.

So what I want:
Multiple HttpWebRequests at the same time (or executing partly serial, partly parallel). I know that with sockets you can create an array and fire many requests at the same time (and quickly), but I'm not using sockets.

So my first thought was:
Create multiple threads, each making a request, with ThreadPool.QueueUserWorkItem
-> I got a lot of errors, and half of the time I saw in the HTML that I wasn't logged in
-> changed my synchronous (blocking) requests into asynchronous ones
-> many errors disappeared (once in a while I'm still not logged in).

Conclusion:
My results are 50% faster than running 1 request at a time sequentially in a normal loop. Here I began to think that only 2 requests are actually executed at the same time, even though more threads are started.

My second thought:
Hey, let's try the .NET 4.0 Parallel.For
-> Not many threads are actually started
-> gives the same performance as using the thread pool.

My conclusions are:
-> I think that just 2 requests are executed/started at the same time, 1 on CPU core 1, another on CPU core 2

This is nothing like the performance of sockets, even on 1 core.
So my question is: how can I request and parse multiple pages at the same time (simultaneously, or mixed parallel/sequential, with results like an array of sockets gives)? It can't be true that only 2 parallel or simultaneous connections can be made (as requests).
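
One thing worth checking here, as an assumption rather than a confirmed diagnosis: on the .NET Framework, HttpWebRequest limits a client application to 2 simultaneous connections per host by default (ServicePointManager.DefaultConnectionLimit), regardless of the number of CPU cores. A minimal sketch of raising that limit before any request is made; the module name and the value 8 are placeholders, not recommendations:

```vbnet
Imports System
Imports System.Net

Module ConnectionLimitSketch
    ' Hypothetical value for illustration; tune it for the target site.
    Private Const MaxConnectionsPerHost As Integer = 8

    Sub Main()
        ' On .NET Framework client applications this default is 2 per host.
        Console.WriteLine("Limit before: {0}", ServicePointManager.DefaultConnectionLimit)

        ' Set this before the first request to a host, because the limit is
        ' applied when the ServicePoint for that host is created.
        ServicePointManager.DefaultConnectionLimit = MaxConnectionsPerHost

        Console.WriteLine("Limit after: {0}", ServicePointManager.DefaultConnectionLimit)
    End Sub
End Module
```

If the limit is the cause, the number of requests in flight should follow this setting rather than the thread count.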

My structure for this is now:
login
parallel loop (executes a method in each iteration)


method:
VB.NET:
        Try
            Dim httpStateRequest As HttpState = New HttpState

            ' WebRequest.Create returns the base type; cast to set HTTP-specific properties
            httpStateRequest.httpRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
            httpStateRequest.httpRequest.CookieContainer = cookies
            httpStateRequest.httpRequest.KeepAlive = False

            ' Start the request; HttpResponseCallback runs when the response arrives
            httpStateRequest.httpRequest.BeginGetResponse(AddressOf HttpResponseCallback, httpStateRequest)
        Catch wex As WebException
            Console.WriteLine("Exception occurred on request: {0}", wex.Message)
        End Try
VB.NET:
    Private Sub HttpResponseCallback(ByVal ar As IAsyncResult)
        ' Callbacks run concurrently on pool threads, so guard the shared log string.
        ' logLock is an assumed shared field: Private ReadOnly logLock As New Object()
        SyncLock logLock
            running = running & ("running & I'm busy with the request on: " & Date.Now.ToString & " in thread: " & Thread.CurrentThread.ManagedThreadId) & vbCrLf
        End SyncLock
        Try
            Dim httpRequestState As HttpState = DirectCast(ar.AsyncState, HttpState)

            ' Complete the asynchronous request
            httpRequestState.httpResponse = httpRequestState.httpRequest.EndGetResponse(ar)
            ' Read the whole body. Using closes the reader and the response stream
            ' even if ReadToEnd throws, which releases the connection.
            Dim result2 As String
            Using httpResponseStreamReader As New StreamReader(httpRequestState.httpResponse.GetResponseStream())
                result2 = httpResponseStreamReader.ReadToEnd().Trim()
            End Using
            Dim doc2 As New HtmlAgilityPack.HtmlDocument()
            doc2.LoadHtml(result2)
            ' Local variable, so concurrent callbacks do not overwrite each other's result
            Dim rootNode = doc2.DocumentNode

            'do something with the result
        Catch ex As Exception
            Console.WriteLine("Exception: {0}", ex.Message)
        End Try
    End Sub

Parallel loop:
VB.NET:
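Parallel.For(begin, iterations, Sub(i)
                                    ' counter and strThreads are shared by all iterations, so update them safely.
                                    ' logLock is an assumed shared field: Private ReadOnly logLock As New Object()
                                    Dim n = Interlocked.Increment(counter)
                                    SyncLock logLock
                                        strThreads = strThreads & "I'm: " & n & " and in thread: " & Thread.CurrentThread.ManagedThreadId & vbCrLf
                                    End SyncLock
                                    executeAmethodWhichDoesAnAsyncRequest(params come here)
                                End Sub)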
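
One consequence of this structure, as far as I understand it: each iteration only starts an asynchronous request, so Parallel.For can return before the callbacks have finished parsing. A minimal sketch of waiting for all callbacks with a CountdownEvent (.NET 4.0); the pool work item is just a stand-in for the real request, and pageCount and the module name are made up:

```vbnet
Imports System
Imports System.Threading

Module WaitForCallbacksSketch
    Sub Main()
        Const pageCount As Integer = 5 ' hypothetical number of pages
        Dim completed As Integer = 0

        Using pending As New CountdownEvent(pageCount)
            For i As Integer = 1 To pageCount
                ' Stand-in for BeginGetResponse: the work finishes later on a pool thread.
                ThreadPool.QueueUserWorkItem(Sub(state)
                                                 Try
                                                     ' Parsing the response would happen here.
                                                     Interlocked.Increment(completed)
                                                 Finally
                                                     ' Signal even on failure so Wait cannot hang.
                                                     pending.Signal()
                                                 End Try
                                             End Sub)
            Next
            ' Block until every callback has signalled.
            pending.Wait()
        End Using

        Console.WriteLine("Callbacks completed: {0}", completed)
    End Sub
End Module
```

In the real code the Signal call would go in a Finally block at the end of HttpResponseCallback.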

A screenshot of the threads (start times): [image 2eki5qf.jpg]

What you see in the first multiline textbox are threads that were possibly put in a waiting state (for example, "I'm: 221 and in thread: 6" means page 221 is being requested in thread 6).
By the way, I have a dual-core 2.8 GHz CPU and a 15+ Mbps broadband connection.
 
I'm getting between 0.75 and 0.85 sec/page (2-4 connections) on a slow connection and about 0.5 to 0.6 sec/page (connecting + parsing) on a fast connection. These are good speeds. Many thanks again for the help. Learned a lot. This can be marked as answered :).