UserId & Password With HtmlAgilityPack

Rivelyn

New member
Joined
Jan 8, 2015
Messages
2
Programming Experience
3-5
Hey All,

I have a simple scenario as follows:


  1. I've created some articles using the RoadKill Wiki engine on a shared hosting package. The wiki is not publicly viewable; an email address and password must be entered to view any of the articles.
  2. I am working on another small piece of software with a function that grabs excerpts from the wiki articles (some 'how-to' guides) and displays them in the software.

I chose the HtmlAgilityPack for this function because it is simple and does exactly what I need: it brings the wiki articles into the software as rendered HTML, instead of the Creole markup I get when I grab them directly from the SQL server.

Where I have hit a wall is passing the login credentials to the wiki. I have tried code snippets from a few dozen sources and have exhausted everything I can think of, with no luck at all. When I don't get an outright error, the closest I get is the wiki engine's login page coming back with the login and password fields already populated. To rule out a simple mistake elsewhere, I turned off the wiki's login requirement to make it publicly viewable, and with that change I got exactly what I was looking for.

I've tried passing a user ID and password through different overloads of HtmlWeb.Load() with no results. I have also tried a plain HttpWebRequest with a StreamReader to fetch the body of the wiki article and hand it to the HtmlAgilityPack for parsing, passing NetworkCredential in a few different ways, also with no results.

Here are my two bits of code. The first is the HtmlAgilityPack version using the plain HtmlWeb.Load(url) call with no credentials, because every attempt with the other overloads gives me an error saying the HtmlWeb.Load string is improperly formatted, with no detail about the syntax issue.

VB.NET:
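    Imports System.Text
    Imports HtmlAgilityPack

    ' Loads a wiki page and returns the inner HTML of its container div.
    Private Shared Function getWikiArticles(ByVal id As String) As String
        Dim wikiURL As String = "http://wiki.mysite.com/wiki/" & id
        Dim str As New StringBuilder

        Dim web As New HtmlWeb
        web.UserAgent = "MySite Query"

        ' No credentials are passed here, so this only works while the wiki is public.
        Dim htmldoc As HtmlDocument = web.Load(wikiURL)

        ' Collect the markup of every matching container div.
        For Each node As HtmlNode In htmldoc.DocumentNode.SelectNodes("//div[@id='container']")
            str.Append(node.InnerHtml)
        Next

        Return str.ToString()
    End Function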

Next is the HttpWebRequest version. It at least fills in the login and password fields, but it doesn't actually grab the article; all it returns is the login page markup.

VB.NET:
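    Imports System.IO
    Imports System.Net

    ' Requests a wiki page with a NetworkCredential and returns the response body.
    Public Shared Function getArticles() As String
        Dim urlString As String = "http://wiki.MySite.com/wiki/2"
        Dim Username As String = "login@home.net"
        Dim Password As String = "password"

        ' DirectCast keeps this compiling with Option Strict On.
        Dim gSearch As HttpWebRequest = DirectCast(WebRequest.Create(urlString), HttpWebRequest)
        gSearch.Timeout = 35000
        gSearch.Credentials = New NetworkCredential(Username, Password)

        ' Using blocks dispose the response and reader even if reading fails.
        Using gResp As HttpWebResponse = DirectCast(gSearch.GetResponse(), HttpWebResponse)
            Using gsr As New StreamReader(gResp.GetResponseStream())
                Return gsr.ReadToEnd()
            End Using
        End Using
    End Function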

Please, any help would be great. I am at a total loss at the moment.
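
One possible explanation, offered as an assumption rather than a confirmed diagnosis: NetworkCredential is meant for HTTP-level authentication (Basic, Digest, NTLM), while a wiki that answers with an HTML login page is probably using a form-based login backed by a session cookie. If that is the case here, a rough sketch of the approach would be to POST the login form first and then reuse the same CookieContainer for the article request. The login URL, the form field names ("email", "password"), and the names WikiLoginSketch and GetArticleHtml are all placeholders that would need to be checked against the wiki's actual login page.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text

Public Module WikiLoginSketch
    ' ASSUMPTION: placeholder login URL; check the action attribute of the real login form.
    Private Const LoginUrl As String = "http://wiki.mysite.com/user/login"

    Public Function GetArticleHtml(articleUrl As String, email As String, password As String) As String
        ' One cookie container shared by both requests, so any session cookie
        ' set during login is sent along with the article request.
        Dim cookies As New CookieContainer()

        ' Step 1: POST the login form.
        Dim login As HttpWebRequest = DirectCast(WebRequest.Create(LoginUrl), HttpWebRequest)
        login.Method = "POST"
        login.ContentType = "application/x-www-form-urlencoded"
        login.CookieContainer = cookies

        ' ASSUMPTION: "email" and "password" are hypothetical form field names.
        Dim body As Byte() = Encoding.UTF8.GetBytes(
            "email=" & Uri.EscapeDataString(email) &
            "&password=" & Uri.EscapeDataString(password))
        login.ContentLength = body.Length

        Using requestStream As Stream = login.GetRequestStream()
            requestStream.Write(body, 0, body.Length)
        End Using

        ' The login response body is not needed; dispose it to release the connection.
        Using loginResponse As WebResponse = login.GetResponse()
        End Using

        ' Step 2: GET the article using the cookies collected above.
        Dim article As HttpWebRequest = DirectCast(WebRequest.Create(articleUrl), HttpWebRequest)
        article.CookieContainer = cookies

        Using response As WebResponse = article.GetResponse()
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

The returned string could then be handed to HtmlDocument.LoadHtml for parsing. If the login form also carries a hidden anti-forgery token, this sketch would not be enough on its own: the login page would have to be fetched first and the token included in the POST body.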
 

Rivelyn

I've tried this overload of the HtmlAgilityPack HtmlWeb.Load method, and I get an "Object reference not set to an instance of an object" error when looping through the nodes of the page. So evidently the data I expect is not coming back from the page.

VB.NET:
    Imports System.Net
    Imports System.Text
    Imports HtmlAgilityPack

    ' Same as before, but using the Load overload that takes a proxy and credentials.
    Private Shared Function getWikiArticles(ByVal id As String) As String
        Dim wikiURL As String = "http://wiki.mywiki.com/wiki/" & id
        Dim str As New StringBuilder

        Dim web As New HtmlWeb
        web.UserAgent = "My Site Query"

        ' Note: the proxy address here is the wiki's own address.
        Dim address As Uri = New Uri("http://wiki.mywiki.com")
        Dim myProxy As New WebProxy(address)

        Dim htmldoc As HtmlDocument = web.Load(wikiURL, "POST", myProxy, New NetworkCredential("login", "password"))

        ' This loop is where the error occurs: SelectNodes returns Nothing when no node matches.
        For Each node As HtmlNode In htmldoc.DocumentNode.SelectNodes("//div[@id='container']")
            str.Append(node.InnerHtml)
        Next

        Return str.ToString()
    End Function
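
The Object reference error is consistent with SelectNodes returning Nothing, which the HtmlAgilityPack does when the XPath matches no nodes, for example when the response is a login page without the container div. Below is a minimal sketch of a guard around the loop. It only avoids the exception and does not address the login itself. The names ContainerExtractSketch and ExtractContainerHtml are hypothetical, and the function takes an HTML string so it can be tried without a network call.

```vbnet
Imports System.Text
Imports HtmlAgilityPack

Public Module ContainerExtractSketch
    ' Returns the combined inner HTML of every div with id "container",
    ' or an empty string when the page has no such div.
    Public Function ExtractContainerHtml(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@id='container']")
        If nodes Is Nothing Then
            Return String.Empty
        End If

        Dim sb As New StringBuilder()
        For Each node As HtmlNode In nodes
            sb.Append(node.InnerHtml)
        Next

        Return sb.ToString()
    End Function
End Module
```

An empty result then signals "no article markup came back", which can be logged or reported instead of surfacing as a NullReferenceException.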
 