I have done quite a bit of debugging on this one.
What I want to achieve:
1) I log in to a website and store the cookies in a cookie container.
2) With those cookies I need to request many pages and parse them.
3) Make the above efficient and fast.
Problem:
When I do one request at a time, it takes ages to request and parse x pages.
So what I wanted/want:
Multiple HttpWebRequests at the same time (or running partly in series, partly in parallel). I know that with sockets you can create an array and fire off many requests at the same time, and quickly, but I'm not using sockets.
So my first thought was:
Create multiple threads, each making a request, with ThreadPool.QueueUserWorkItem
-> I got a lot of errors, and half of the time I saw in the HTML that I wasn't logged in
-> changed my (sequential) requests into asynchronous ones
-> many errors disappeared (it still happens once in a while that I'm not logged in).
Conclusion:
My results are 50% faster than running one request at a time in a plain sequential loop. At this point I began to think that only 2 requests are actually executed at the same time, even though more threads are started.
My second thought:
Hey, let's try the .NET 4.0 Parallel.For
-> Not many threads are actually started
-> gives the same performance as using the thread pool.
My conclusions are:
-> I think that just 2 requests are executed/started at the same time, one on CPU core 1 and another on CPU core 2.
This is nothing like the performance of sockets, even on one core.
So my question is: how can I request and parse multiple pages at the same time (simultaneously, or mixed parallel/sequential, with results like those an array of sockets gives)? It can't be true that only 2 parallel or simultaneous connections can be made (as requests).
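One setting that may be relevant to the "only 2 at a time" observation: on .NET Framework, HttpWebRequest connections are managed by ServicePointManager, and its DefaultConnectionLimit is 2 connections per host for client applications by default. Below is a minimal sketch of inspecting and raising that limit. The value 10 and the example.com URL are assumptions for illustration only, and whether this setting explains the behaviour described above has not been verified here.

```vbnet
Imports System
Imports System.Net

Module ConnectionLimitSketch
    Sub Main()
        ' Show the limit currently applied to newly created service points.
        Console.WriteLine("Default limit: {0}", ServicePointManager.DefaultConnectionLimit)

        ' Assumption: 10 is an arbitrary illustrative value, not a recommendation.
        ' It must be set before the first request to a host creates its ServicePoint.
        ServicePointManager.DefaultConnectionLimit = 10

        ' Hypothetical URL, used only to look up the ServicePoint; no request is sent.
        Dim sp As ServicePoint = ServicePointManager.FindServicePoint(New Uri("http://example.com/"))
        Console.WriteLine("Limit for this host: {0}", sp.ConnectionLimit)
    End Sub
End Module
```

The limit is applied per host, so it only matters when many requests go to the same site, as they do here.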
My structure on this is now:
login
parallel loop (executes a method inside the parallel loop)
method:
VB.NET:
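' Requires: Imports System.Net
Try
    Dim httpStateRequest As New HttpState()
    ' WebRequest.Create returns a WebRequest; the cookie and keep-alive settings need an HttpWebRequest
    httpStateRequest.httpRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
    ' Reuse the cookies stored at login
    httpStateRequest.httpRequest.CookieContainer = cookies
    ' KeepAlive = False closes the connection after each request instead of reusing it
    httpStateRequest.httpRequest.KeepAlive = False
    ' Start the asynchronous request; HttpResponseCallback runs when the response arrives
    Dim ar As IAsyncResult = httpStateRequest.httpRequest.BeginGetResponse(AddressOf HttpResponseCallback, httpStateRequest)
Catch wex As WebException
    Console.WriteLine("Exception occurred on request: {0}", wex.Message)
End Try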
VB.NET:
' Requires: Imports System.IO, System.Net, System.Threading
Private Sub HttpResponseCallback(ByVal ar As IAsyncResult)
    ' Diagnostic log (note: appending to a shared String from several threads is not thread-safe)
    running = running & ("running & I'm busy with the request on: " & Date.Now.ToString() & " in thread: " & Thread.CurrentThread.ManagedThreadId) & vbCrLf
    Try
        Dim httpRequestState As HttpState = DirectCast(ar.AsyncState, HttpState)
        ' Complete the asynchronous request
        httpRequestState.httpResponse = httpRequestState.httpRequest.EndGetResponse(ar)
        ' Read the whole response body; Using closes the reader and its stream even if reading fails
        Dim result2 As String
        Using httpResponseStreamReader As New StreamReader(httpRequestState.httpResponse.GetResponseStream())
            result2 = httpResponseStreamReader.ReadToEnd().Trim()
        End Using
        ' Parse the HTML with HtmlAgilityPack
        Dim doc2 As New HtmlAgilityPack.HtmlDocument()
        doc2.LoadHtml(result2)
        rootNode = doc2.DocumentNode
        'do something with the result
    Catch ex As Exception
        Console.WriteLine("Exception: {0}", ex.Message)
    End Try
End Sub
Parallel loop:
VB.NET:
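Parallel.For(begin, iterations, Sub(i)
    ' Interlocked.Increment keeps the shared counter correct when several threads run this body
    Dim n = Interlocked.Increment(counter)
    ' Diagnostic log; strThreads is still shared, so lines can be lost without a SyncLock
    strThreads = strThreads & "I'm: " & n & " and in thread: " & Thread.CurrentThread.ManagedThreadId & vbCrLf
    executeAmethodWhichDoesAnAsyncRequest(params come here)
End Sub)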
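Because BeginGetResponse does not wait for the response, the loop above can finish before the callbacks have run, which makes it hard to time the whole batch. The following is a minimal sketch of one way to wait for every callback, using CountdownEvent from .NET 4.0. FakeCallback, the 50 ms sleep and the page count of 20 are hypothetical stand-ins for the real HttpResponseCallback and request count, so the sketch runs without any network access.

```vbnet
Imports System
Imports System.Threading

Module WaitForCallbacksSketch
    ' Hypothetical stand-in for HttpResponseCallback: signals when one page is done.
    Public Sub FakeCallback(ByVal state As Object)
        Dim done As CountdownEvent = DirectCast(state, CountdownEvent)
        Try
            Thread.Sleep(50) ' pretend to download and parse a page
        Finally
            done.Signal() ' always signal, even if the work throws
        End Try
    End Sub

    Sub Main()
        Dim pageCount As Integer = 20 ' assumed number of pages
        Using done As New CountdownEvent(pageCount)
            For i As Integer = 1 To pageCount
                ' In the real code this would be the BeginGetResponse call.
                ThreadPool.QueueUserWorkItem(AddressOf FakeCallback, done)
            Next
            done.Wait() ' blocks until every callback has signalled
        End Using
        Console.WriteLine("All {0} callbacks finished", pageCount)
    End Sub
End Module
```

Signalling in a Finally block means a failed page still counts down, so the wait cannot hang on an exception.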
A screenshot of the threads (start order/time):
![2eki5qf.jpg](http://i46.tinypic.com/2eki5qf.jpg)
What you see in the first multiline textbox are threads that are possibly put in a waiting state (for example, "I'm: 221 and in thread: 6" means that page 221 is being requested in thread 6).
By the way, I have a dual-core 2.8 GHz CPU and a 15+ Mbps broadband connection.