vampiro2004
New member
- Joined
- Jun 14, 2023
- Messages
- 1
- Programming Experience
- 3-5
Hi,
Just wondering if someone could help with the code below. I am trying to create a web scraper for personal use.
It works to the point of pulling the text between the bookmark tags, loading it into the formatted text box, then going to the next page and grabbing some of the info from there.
The problem is that it moves on too quickly: it grabs maybe 2 of each item on the page before moving on, when there are 5 items it should grab before loading the next page. Also, is there a way to tell it how many pages to grab before stopping?
I have done programming in the past with VB6, but it has been years since then lol.
Any help will be very much appreciated.
VB.NET:
' Needs Imports System.IO and Imports System.Net, plus the HtmlAgilityPack NuGet package
Dim request As HttpWebRequest = CType(WebRequest.Create(txtscrape.Text), HttpWebRequest)
Dim html As String
' Get the response from the server and read the HTML code from it
Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
    Using reader As New StreamReader(response.GetResponseStream())
        html = reader.ReadToEnd()
    End Using
End Using
' Load the HTML code into an HtmlDocument
Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(html)
' Collect the text of every <a rel="bookmark"> link.
' SelectNodes returns Nothing (not an empty collection) when nothing matches.
Dim nodes As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("//a[@rel='bookmark']")
Dim text As String = ""
If nodes IsNot Nothing Then
    For Each node As HtmlAgilityPack.HtmlNode In nodes
        text &= node.InnerText & vbCrLf
    Next
End If
' Find the "next page" link and extract the href attribute.
' A <link> element cannot contain an <a>, so match whichever element carries rel="next" itself.
Dim nextLink As HtmlAgilityPack.HtmlNode = doc.DocumentNode.SelectSingleNode("//link[@rel='next'] | //a[@rel='next']")
If nextLink IsNot Nothing Then
    ' Navigate to the next page
    Dim nextUrl As String = nextLink.GetAttributeValue("href", "")
    request = CType(WebRequest.Create(nextUrl), HttpWebRequest)
    Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
        Using reader As New StreamReader(response.GetResponseStream())
            html = reader.ReadToEnd()
        End Using
    End Using
    ' Extract the data from the page
End If
txtFormatted.Text = text
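On the page-limit question, one possible shape is a loop that finishes every bookmark link on a page before following the next link, and stops once a set number of pages has been read. This is only a minimal sketch: it assumes the HtmlAgilityPack NuGet package is installed and that the site marks its next page with rel="next". The names `BookmarkScraper`, `ScrapeBookmarks` and `maxPages` are made up for illustration.

```vbnet
Imports System.Text

Module BookmarkScraper
    ' Hypothetical helper: collects the text of every <a rel="bookmark"> link,
    ' following rel="next" links until maxPages pages have been read.
    Function ScrapeBookmarks(startUrl As String, maxPages As Integer) As String
        Dim web As New HtmlAgilityPack.HtmlWeb()
        Dim result As New StringBuilder()
        Dim url As String = startUrl
        Dim pagesRead As Integer = 0

        While Not String.IsNullOrEmpty(url) AndAlso pagesRead < maxPages
            ' Download and parse the current page
            Dim doc As HtmlAgilityPack.HtmlDocument = web.Load(url)
            pagesRead += 1

            ' Process every matching link on this page before moving on
            Dim nodes As HtmlAgilityPack.HtmlNodeCollection =
                doc.DocumentNode.SelectNodes("//a[@rel='bookmark']")
            If nodes IsNot Nothing Then
                For Each node As HtmlAgilityPack.HtmlNode In nodes
                    result.AppendLine(node.InnerText.Trim())
                Next
            End If

            ' Assumption: the next page is advertised via rel="next" on a <link> or <a>
            Dim nextLink As HtmlAgilityPack.HtmlNode =
                doc.DocumentNode.SelectSingleNode("//link[@rel='next'] | //a[@rel='next']")
            If nextLink Is Nothing Then Exit While

            Dim href As String = nextLink.GetAttributeValue("href", "")
            If href = "" Then Exit While

            ' Resolve relative links against the current page's address
            url = New Uri(New Uri(url), href).AbsoluteUri
        End While

        Return result.ToString()
    End Function
End Module
```

From the form it could be called as `txtFormatted.Text = BookmarkScraper.ScrapeBookmarks(txtscrape.Text, 5)`, where 5 is the number of pages to read before stopping. The HtmlAgilityPack types are fully qualified because a WinForms project also has its own `HtmlDocument` class.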