VentureFree
I've got some equipment that I can connect to via my web browser to get some diagnostic info. A friend wrote a Bash script on his Linux machine to scrape some of that data, and I'm trying to mimic that functionality in .NET. The problem is that I'm getting a protocol violation that neither my browser nor his wget command runs into.
The exact error that I'm getting says this: "The server committed a protocol violation. Section=ResponseBody Detail=Response chunk format is invalid"
I've tried getting the page via something like the following:
VB.NET:
Imports System.IO
Imports System.Net

Public Function GetPageText(ByVal Url As String) As String
    ' Set up the request to the server
    Dim myRequest As HttpWebRequest = DirectCast(WebRequest.Create(Url), HttpWebRequest)
    myRequest.Method = "GET"
    ' Read the response from the server; Using disposes the response and reader even if reading throws
    Using myResponse As HttpWebResponse = DirectCast(myRequest.GetResponse(), HttpWebResponse)
        Using read As New StreamReader(myResponse.GetResponseStream())
            Return read.ReadToEnd()
        End Using
    End Using
End Function
I also tried using HtmlAgilityPack with something like this:
VB.NET:
Imports HtmlAgilityPack

Public Function GetPageHtml(ByVal Url As String) As HtmlDocument
    ' HtmlWeb downloads the page and parses it in one step
    Dim WebSite As New HtmlWeb()
    Dim WebPage As HtmlDocument = WebSite.Load(Url)
    Return WebPage
End Function
I did a quick look at the response headers for the page, and this is what it is saying:
Code:
Server: Rapid Logic/1.1
Date: Mon Mar 23 11:08:01 1970 GMT
Content-Type: text/html
Transfer-Encoding: chunked
200 OK
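For context, the "Response chunk format is invalid" detail in the error refers to the framing announced by the Transfer-Encoding: chunked header: the body arrives as a series of chunks, each a hexadecimal size line ending in CRLF, then that many bytes of data, then another CRLF, with a zero-size chunk marking the end. The following is only a minimal sketch of a strict decoder for that format, to make the framing concrete. The names ChunkedDemo and DecodeChunked are made up for illustration, and this is not how HttpWebRequest is implemented internally.

```vbnet
Imports System
Imports System.Globalization
Imports System.IO
Imports System.Text

Public Module ChunkedDemo
    ' Decodes an HTTP/1.1 chunked body. Each chunk is
    ' "<hex size>[;extensions]CRLF<data>CRLF"; a chunk of size 0 ends the body.
    ' Throws InvalidDataException as soon as the input deviates from that framing.
    Public Function DecodeChunked(ByVal raw As Byte()) As Byte()
        Dim position As Integer = 0
        Using body As New MemoryStream()
            Do
                ' Find the CRLF that ends the chunk-size line.
                Dim lineEnd As Integer = IndexOfCrLf(raw, position)
                If lineEnd < 0 Then Throw New InvalidDataException("Chunk size line is not terminated by CRLF.")

                Dim sizeLine As String = Encoding.ASCII.GetString(raw, position, lineEnd - position)
                ' Optional chunk extensions after ';' are ignored.
                Dim sizeText As String = sizeLine.Split(";"c)(0).Trim()
                Dim size As Integer
                If Not Integer.TryParse(sizeText, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture, size) OrElse size < 0 Then
                    Throw New InvalidDataException("Chunk size is not a valid hexadecimal number: '" & sizeText & "'.")
                End If
                position = lineEnd + 2

                ' A zero-size chunk is the last one; trailers (if any) are ignored in this sketch.
                If size = 0 Then Exit Do

                If size > raw.Length - position - 2 Then Throw New InvalidDataException("Chunk data is shorter than its declared size.")
                body.Write(raw, position, size)
                position += size

                ' Every chunk's data must be followed by CRLF.
                If raw(position) <> 13 OrElse raw(position + 1) <> 10 Then Throw New InvalidDataException("Chunk data is not followed by CRLF.")
                position += 2
            Loop
            Return body.ToArray()
        End Using
    End Function

    ' Returns the index of the first CRLF at or after "start", or -1 if there is none.
    Private Function IndexOfCrLf(ByVal data As Byte(), ByVal start As Integer) As Integer
        For i As Integer = start To data.Length - 2
            If data(i) = 13 AndAlso data(i + 1) = 10 Then Return i
        Next
        Return -1
    End Function
End Module
```

As a usage example, decoding the bytes of "5", CRLF, "hello", CRLF, "0", CRLF, CRLF yields the five bytes of "hello". A body that, say, never sends the final zero-size chunk would be rejected by a strict decoder like this one; whether this particular device does something of that kind is a guess and has not been verified.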
I've tried several fixes that I found online, including setting "useUnsafeHeaderParsing" to true in my app.config and also at run time via reflection. I also tried both:
VB.NET:
myRequest.ProtocolVersion = HttpVersion.Version10
' and
myRequest.ProtocolVersion = New System.Version(1, 0)
None of these have worked for me. I just need the raw text from the web page, and I can work on it from there. How do I get around this problem? Is it even possible in .NET, or will I have to shell out to some other tool to save a local copy of the page that I can then process myself? I'd really rather not have to do that.
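One avenue that might be worth a try, offered as an untested sketch rather than a known fix: read the response stream in small pieces instead of calling ReadToEnd(), and keep whatever arrived before the read fails. This assumes the device sends usable page data before the point where the chunk framing goes wrong, which has not been confirmed here. The names LenientFetch, ReadWhatWeCan, and FetchLeniently are hypothetical, and the text is decoded as UTF-8 only because the headers shown do not name a charset.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Public Module LenientFetch
    ' Copies the stream until it ends or a read fails. Whatever was read before
    ' the failure is returned; the exception (if any) comes back in "failure".
    Public Function ReadWhatWeCan(ByVal source As Stream, ByRef failure As Exception) As String
        failure = Nothing
        Dim buffer(4095) As Byte
        Using collected As New MemoryStream()
            Try
                Dim count As Integer = source.Read(buffer, 0, buffer.Length)
                While count > 0
                    collected.Write(buffer, 0, count)
                    count = source.Read(buffer, 0, buffer.Length)
                End While
            Catch ex As IOException
                failure = ex
            Catch ex As WebException
                failure = ex
            End Try
            ' Assumption: the page is ASCII/UTF-8; the headers shown do not name a charset.
            Return Encoding.UTF8.GetString(collected.ToArray())
        End Using
    End Function

    ' Hypothetical wrapper showing how the helper could sit behind HttpWebRequest.
    ' Exceptions thrown by GetResponse() itself are not caught here.
    Public Function FetchLeniently(ByVal url As String, ByRef failure As Exception) As String
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        Using response As WebResponse = request.GetResponse()
            Using body As Stream = response.GetResponseStream()
                Return ReadWhatWeCan(body, failure)
            End Using
        End Using
    End Function
End Module
```

A caller would check failure afterwards: if it is Nothing the page was read cleanly, otherwise the returned text is only what arrived before the protocol violation and may be incomplete. Whether HttpWebRequest hands over any data before raising the error depends on where in the body the framing breaks, so this may return an empty string.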