Question Scraping web page: Protocol Violation

VentureFree

I've got some equipment that I can connect to via my web browser to get some diagnostic info. A friend wrote a Bash script on his Linux machine to scrape some of that data, and I'm trying to mimic that functionality in .Net. The problem is that I'm getting a Protocol Violation that neither my browser nor his wget command encounters.

I've tried getting the page via something like the following:
VB.NET:
Imports System.IO
Imports System.Net

Public Function GetPageText(ByVal Url As String) As String
    ' Set up the request to the server
    Dim myRequest As HttpWebRequest = DirectCast(WebRequest.Create(Url), HttpWebRequest)
    myRequest.Method = "GET"

    ' Read the response from the server; Using closes the response and reader
    Using myResponse As HttpWebResponse = DirectCast(myRequest.GetResponse(), HttpWebResponse)
        Using read As New StreamReader(myResponse.GetResponseStream())
            Return read.ReadToEnd()
        End Using
    End Using
End Function

I also tried using HtmlAgilityPack with something like this:
VB.NET:
Imports HtmlAgilityPack

Public Function GetPageHtml(ByVal Url As String) As HtmlDocument
    ' HtmlWeb downloads the page and parses it into a document
    Dim WebSite As New HtmlWeb()
    Return WebSite.Load(Url)
End Function
The exact error that I'm getting says this: "The server committed a protocol violation. Section=ResponseBody Detail=Response chunk format is invalid"

I took a quick look at the response headers for the page, and this is what they show:
VB.NET:
Server: Rapid Logic/1.1
Date: Mon Mar 23 11:08:01 1970 GMT
Content-Type: text/html
Transfer-Encoding: chunked

200 OK
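
For context, the "Response chunk format is invalid" detail refers to the chunked transfer encoding named in the Transfer-Encoding header above. In a well-formed chunked body, each chunk is a hexadecimal size line ending in CRLF, then that many bytes of data, then another CRLF, and a zero-size chunk ends the body. The following is only a minimal sketch of that format, not how .Net parses it internally; DecodeChunked and IndexOfCrLf are hypothetical helper names, and it assumes the whole body is already in memory and has no trailer headers.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module ChunkedSketch
    ' Hypothetical helper: decodes a chunked body that is already in memory.
    Public Function DecodeChunked(ByVal body As Byte()) As Byte()
        Dim output As New MemoryStream()
        Dim pos As Integer = 0

        While pos < body.Length
            ' Find the CRLF that ends the chunk-size line
            Dim lineEnd As Integer = IndexOfCrLf(body, pos)
            If lineEnd < 0 Then Throw New InvalidDataException("Missing CRLF after chunk size")

            ' The size is hexadecimal; drop any ";extension" that follows it
            Dim sizeLine As String = Encoding.ASCII.GetString(body, pos, lineEnd - pos)
            Dim sizeText As String = sizeLine.Split(";"c)(0).Trim()
            Dim size As Integer = Integer.Parse(sizeText, NumberStyles.HexNumber, CultureInfo.InvariantCulture)
            pos = lineEnd + 2

            ' A zero-size chunk marks the end of the body
            If size = 0 Then Exit While

            If pos + size > body.Length Then Throw New InvalidDataException("Chunk is shorter than its declared size")
            output.Write(body, pos, size)
            pos += size + 2 ' skip the chunk data and its trailing CRLF
        End While

        Return output.ToArray()
    End Function

    ' Returns the index of the next CR LF pair at or after start, or -1 if none
    Private Function IndexOfCrLf(ByVal data As Byte(), ByVal start As Integer) As Integer
        For i As Integer = start To data.Length - 2
            If data(i) = 13 AndAlso data(i + 1) = 10 Then Return i
        Next
        Return -1
    End Function

    Sub Main()
        ' One 5-byte chunk followed by the terminating zero-size chunk
        Dim sample As Byte() = Encoding.ASCII.GetBytes("5" & vbCrLf & "Hello" & vbCrLf & "0" & vbCrLf & vbCrLf)
        Console.WriteLine(Encoding.ASCII.GetString(DecodeChunked(sample))) ' prints Hello
    End Sub
End Module
```

A server that writes the wrong size, omits a CRLF, or skips the final zero-size chunk would fail a strict decoder like this one, which is the kind of problem the error message describes.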

I've tried several fixes that I found online, including setting "useUnsafeHeaderParsing" to true in my app.config and also at run time via reflection. I also tried both:
VB.NET:
myRequest.ProtocolVersion = HttpVersion.Version10
' and
myRequest.ProtocolVersion = New System.Version(1, 0)

None of these have worked for me. I just need the raw text from the web page and I can work on it from there. How do I get around this problem? Is it even possible in .Net, or will I have to shell out to some other tool to save a local copy of the page that I can then manipulate myself? I really would rather not have to do that.
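
One option that stays inside .Net is to bypass HttpWebRequest and read the raw bytes over a plain socket. This is a minimal sketch, untested against this device. GetRawResponse and StripHeaders are hypothetical helper names, and the host, port, and path arguments are placeholders. It assumes the device serves plain HTTP, closes the connection after responding, and sends ASCII-compatible text.

```vbnet
Imports System
Imports System.IO
Imports System.Net.Sockets
Imports System.Text
Imports Microsoft.VisualBasic

Module RawFetchSketch
    ' Hypothetical helper: sends a bare HTTP/1.0 GET and returns everything
    ' the server sends back (status line, headers, and body) as one string.
    Public Function GetRawResponse(ByVal host As String, ByVal port As Integer, ByVal path As String) As String
        Using client As New TcpClient(host, port)
            Using stream As NetworkStream = client.GetStream()
                ' Build the request by hand, one CRLF-terminated line at a time
                Dim request As New StringBuilder()
                request.Append("GET " & path & " HTTP/1.0" & vbCrLf)
                request.Append("Host: " & host & vbCrLf)
                request.Append("Connection: close" & vbCrLf)
                request.Append(vbCrLf)

                Dim requestBytes As Byte() = Encoding.ASCII.GetBytes(request.ToString())
                stream.Write(requestBytes, 0, requestBytes.Length)

                ' Read until the server closes the connection (assumed behavior)
                Using reader As New StreamReader(stream, Encoding.ASCII)
                    Return reader.ReadToEnd()
                End Using
            End Using
        End Using
    End Function

    ' Returns the text after the first blank line, i.e. the response body
    Public Function StripHeaders(ByVal raw As String) As String
        Dim idx As Integer = raw.IndexOf(vbCrLf & vbCrLf, StringComparison.Ordinal)
        If idx < 0 Then Return raw
        Return raw.Substring(idx + 4)
    End Function
End Module
```

If the device still sends a chunked body despite the HTTP/1.0 request, the returned text will include the chunk-size lines, but nothing in this path validates them, so the raw page text is at least available for manual cleanup.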
 