Fastest way to download complete HTML string?

22-degrees

Hi, I have been using WebClient and DownloadString() to get the HTML from a site that I need data from. Halfway through parsing the downloaded HTML string, I extract several more URLs, download the HTML for each, and parse some particular data from those pages as well.

This midpoint download is what slows the entire process down, taking about 550 milliseconds just to download the HTML string. In the past I didn't mind, because the program performs a labour-intensive task that takes up to an hour to complete anyway, so adding a few more minutes was not a concern for me.

Now, however, I want to expand a particular section, which will mean a 40- to 60-fold increase in the number of URLs I need to download and parse midway through the initial parsing. That will not be feasible if I continue to use the current method of downloading HTML.

The HTML itself is very sloppy, so I need to download it in its entirety to parse the data I need. For simple integration with the current code structure, I also need to download the HTML at this midway point, when I meet the URLs in question, and parse the new HTML on the spot.

Is there a quicker way to download the full HTML from a web page, or am I looking at a limitation of my ISP/network here? I'm not experienced at all with web tasks in VB.NET, so I would appreciate it if anyone could point me in the right direction or provide a keyword or two that might help me in my quest to improve this process.
 
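Presumably you are calling DownloadString again on the same thread, which means that download must complete before the original parsing task can continue. The thing to do would be to use multi-threading in some form so that multiple downloads and parsing operations can occur simultaneously. Asynchronous functionality is built into the WebClient class already, so you should just use that. You can call DownloadStringAsync instead and then handle the DownloadStringCompleted event. You will need to check which thread the event is raised on in your case. As far as I know, WebClient raises it through the synchronization context that was current when DownloadStringAsync was called. In a console application, or when the call is made from a worker thread, the handler therefore runs on a thread-pool thread and you can parse there without blocking any other operation. In a Windows Forms or WPF application where the call is made from the UI thread, the handler runs on the UI thread, so heavy parsing should be handed off to a background thread.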
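
As a rough illustration of that suggestion, here is a minimal console sketch, not a drop-in solution. The URLs are placeholders, and ParsePage, OnDownloadCompleted, pending and allDone are hypothetical names invented for the example. It assumes the .NET Framework, where (as far as I know) only two simultaneous connections per host are allowed by default, hence the DefaultConnectionLimit line; the value 8 is an arbitrary choice. Each download gets its own WebClient because a single instance can run only one operation at a time.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Threading

Module AsyncDownloadSketch

    ' Number of downloads still in flight (hypothetical helper state).
    Private pending As Integer
    ' Signalled once every download has completed or failed.
    Private ReadOnly allDone As New ManualResetEvent(False)

    Sub Main()
        ' Placeholder URLs; substitute the links found during the first parse.
        Dim urls As New List(Of String) From {
            "http://example.com/page1",
            "http://example.com/page2"
        }
        If urls.Count = 0 Then Return

        ' Assumption: raise the default per-host connection limit so that
        ' more than two downloads can actually run side by side.
        ServicePointManager.DefaultConnectionLimit = 8

        pending = urls.Count

        For Each url As String In urls
            ' One WebClient per download: an instance cannot run
            ' two operations concurrently.
            Dim client As New WebClient()
            AddHandler client.DownloadStringCompleted, AddressOf OnDownloadCompleted
            ' Pass the URL as the user token so the handler knows which page finished.
            client.DownloadStringAsync(New Uri(url), url)
        Next

        ' Block the main thread until all handlers have run.
        allDone.WaitOne()
    End Sub

    Private Sub OnDownloadCompleted(sender As Object, e As DownloadStringCompletedEventArgs)
        Dim url As String = CStr(e.UserState)
        Try
            If e.Cancelled Then
                Console.WriteLine("Cancelled: " & url)
            ElseIf e.Error IsNot Nothing Then
                Console.WriteLine("Failed: " & url & " (" & e.Error.Message & ")")
            Else
                ParsePage(url, e.Result)
            End If
        Finally
            DirectCast(sender, WebClient).Dispose()
            ' The last handler to finish releases the main thread.
            If Interlocked.Decrement(pending) = 0 Then
                allDone.Set()
            End If
        End Try
    End Sub

    ' Stand-in for the real parsing code.
    Private Sub ParsePage(url As String, html As String)
        Console.WriteLine(url & ": " & html.Length & " characters")
    End Sub

End Module
```

In a console application like this one the handlers run on thread-pool threads, so several pages can be parsed at once. Anything ParsePage writes to shared state would therefore need to be synchronised, for example with SyncLock.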
 