Question help with web scraper

vampiro2004 · Jun 14, 2023

Hi,

Just wondering if someone could help with the code below. I am trying to create a web scraper for personal use.
It works to the point of pulling the text between the bookmark tags, loading it into the formatted text box, then going to the next page and grabbing some of the info from there.
The problem is that it moves on too quickly: it grabs maybe 2 of each item on the page, when there are 5 items it should grab before loading the next page. Also, is there a way to tell it how many pages to grab before stopping?

I did some programming in the past with VB6, but it has been years since then lol.

Any help will be very much appreciated.

VB.NET:

Dim request As HttpWebRequest = WebRequest.Create(txtscrape.Text)

' Get the response from the server
Dim response As HttpWebResponse = request.GetResponse()

' Read the HTML code from the response
Dim reader As New StreamReader(response.GetResponseStream())
Dim html As String = reader.ReadToEnd()

' Load the HTML code into an HtmlDocument
Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(html)
'doc.LoadHtml(html2)

Dim nodes As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@rel='bookmark']")
Dim text As String = ""
For Each node As HtmlAgilityPack.HtmlNode In nodes
    text += node.InnerText + vbCrLf
    End
Next

' Find the "next page" link and extract the href attribute
Dim nextLink As HtmlAgilityPack.HtmlNode = doc.DocumentNode.SelectSingleNode("//link[@rel='next']/a")
If nextLink IsNot Nothing Then
    ' Navigate to the next page
    Dim nextUrl As String = nextLink.GetAttributeValue("href", "")
    request = WebRequest.Create(nextUrl)
    response = request.GetResponse()
    reader = New StreamReader(response.GetResponseStream())
    html = reader.ReadToEnd()

    ' Extract the data from the page
End If
txtFormatted.Text = text

jdelano · Jun 16, 2023

I suspect the End statement inside the For Each loop is causing problems for you. On its own, End stops the whole program immediately. Did you mean to exit the For loop? If so, the statement you want is Exit For.
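
On the question of how many pages to grab: one option is to wrap the download-and-parse steps in a loop with a page counter. The following is only a rough sketch under some assumptions, not tested against the actual site. The names ScraperSketch, DownloadHtml, ScrapeBookmarks, maxPages and fetch are made up for illustration, and it assumes the site marks its next-page link with rel="next" on either an <a> or a <link> element (the "//link[@rel='next']/a" XPath in the post looks for an <a> inside a <link>, which is unlikely to match anything). It also assumes the next-page href is either absolute or relative to the current page.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net

Module ScraperSketch

    ' Downloads one page as text, using the same WebRequest approach as the post.
    Function DownloadHtml(url As String) As String
        Dim request As WebRequest = WebRequest.Create(url)
        Using response As WebResponse = request.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    ' Collects the text of every <a rel="bookmark"> link, following
    ' "next page" links until maxPages pages have been read.
    ' fetch is a hypothetical hook that turns a URL into HTML text.
    Function ScrapeBookmarks(startUrl As String, maxPages As Integer,
                             fetch As Func(Of String, String)) As List(Of String)
        Dim titles As New List(Of String)
        Dim url As String = startUrl
        Dim pagesRead As Integer = 0

        While Not String.IsNullOrEmpty(url) AndAlso pagesRead < maxPages
            Dim doc As New HtmlAgilityPack.HtmlDocument()
            doc.LoadHtml(fetch(url))
            pagesRead += 1

            ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
            Dim nodes As HtmlAgilityPack.HtmlNodeCollection =
                doc.DocumentNode.SelectNodes("//a[@rel='bookmark']")
            If nodes IsNot Nothing Then
                For Each node As HtmlAgilityPack.HtmlNode In nodes
                    titles.Add(WebUtility.HtmlDecode(node.InnerText).Trim())
                Next
            End If

            ' Assumption: the next page is marked with rel="next".
            Dim nextLink As HtmlAgilityPack.HtmlNode =
                doc.DocumentNode.SelectSingleNode("//a[@rel='next'] | //link[@rel='next']")
            If nextLink Is Nothing Then Exit While

            Dim href As String = WebUtility.HtmlDecode(nextLink.GetAttributeValue("href", ""))
            If href = "" Then Exit While

            ' Resolve a relative href against the page it came from.
            url = New Uri(New Uri(url), href).AbsoluteUri
        End While

        Return titles
    End Function

End Module
```

From the form, it could be called along the lines of `txtFormatted.Text = String.Join(Environment.NewLine, ScrapeBookmarks(txtscrape.Text, 5, AddressOf DownloadHtml))`, where 5 is the page limit. This will not by itself explain why only 2 of the 5 items come back; it may be worth checking whether all five links really carry rel="bookmark" in the raw HTML that the server returns.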
