Web page file size

Ford-p
Active member · Joined: Mar 1, 2007 · Messages: 30 · Programming Experience: Beginner
Hey,

I'm trying to get the size (KB) of web pages so I can add a percentage bar to my application.

The following ONLY works for binary as far as I can see:
VB.NET:
    Public Function Size(ByVal strURL As String) As String
        Dim objRequest As WebRequest = WebRequest.Create(strURL)
        ' ContentLength comes from the response's Content-Length header.
        Using objResponse As WebResponse = objRequest.GetResponse()
            Return objResponse.ContentLength.ToString()
        End Using
    End Function

Any help on making this work for text would be much appreciated,
Joe :)
 
It does work for text too; for any web request, be it a page, a file or an image, it will return the number of bytes of the response content.
 
Thanks for your reply.

The following code gets the size of an image just fine:
VB.NET:
        Dim objRequest As Net.WebRequest = Net.WebRequest.Create("http://www.google.co.uk/intl/en_uk/images/logo.gif")
        Dim objResponse As Net.WebResponse = objRequest.GetResponse

        MsgBox(objResponse.ContentLength.ToString)

But if I try to get the size of an HTML page with the following code, I always get -1:
VB.NET:
        Dim objRequest As Net.WebRequest = Net.WebRequest.Create("http://www.google.co.uk/intl/en/about.html")
        Dim objResponse As Net.WebResponse = objRequest.GetResponse

        MsgBox(objResponse.ContentLength.ToString)

Any ideas?
 
Google's response uses transfer-encoding "chunked" and chooses not to set the Content-Length header. I haven't checked the Content-Length response for many pages (I tested some), but it's the first I've seen doing this. It means they return chunks of the response stream as you go, and the only way of knowing the actual length in that case is to read the stream to the end.
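
For instance, a minimal sketch of that fallback, counting the bytes actually read (the function name is just illustrative):
VB.NET:
' Sketch: when Content-Length is -1 (e.g. chunked transfer), count bytes
' by reading the response stream to the end.
Public Function SizeByReading(ByVal strURL As String) As Long
    Dim objRequest As Net.WebRequest = Net.WebRequest.Create(strURL)
    Using objResponse As Net.WebResponse = objRequest.GetResponse()
        Using s As IO.Stream = objResponse.GetResponseStream()
            Dim buffer(4095) As Byte
            Dim total As Long = 0
            Dim read As Integer = s.Read(buffer, 0, buffer.Length)
            Do While read > 0
                total += read
                read = s.Read(buffer, 0, buffer.Length)
            Loop
            Return total
        End Using
    End Using
End Function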

Here is an alternative that gave me content lengths: set the Accept-Encoding header to gzip and/or deflate and a compatible user agent. When the response is compressed, it also has the Content-Length header, I see (2459 bytes gzipped):
VB.NET:
' UserAgent is a property of Net.HttpWebRequest, so create the request as one.
objRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0)"
objRequest.Headers.Add(Net.HttpRequestHeader.AcceptEncoding, "gzip,deflate")
If you need to read the content then you must also decompress it; the System.IO.Compression namespace has classes for this.
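
A sketch of that, building on the request above (it assumes the server actually returned gzip; a robust version would check the Content-Encoding response header first):
VB.NET:
' Sketch: read and decompress a gzip response (assumes gzip was returned;
' check objResponse.Headers("Content-Encoding") to be safe).
Using objResponse As Net.WebResponse = objRequest.GetResponse()
    Using gz As New IO.Compression.GZipStream(objResponse.GetResponseStream(), _
                                              IO.Compression.CompressionMode.Decompress)
        Using reader As New IO.StreamReader(gz)
            Dim html As String = reader.ReadToEnd()
        End Using
    End Using
End Using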
 
Thanks for the information JohnH. I am trying to make a simple application that displays the file size and tries to match a regex string.

I use the following code to download the HTML page:
VB.NET:
        Dim objWeb As New WebClient()
        Dim objStream As Stream = objWeb.OpenRead(uriURL)
        Dim objReader As New StreamReader(objStream, System.Text.Encoding.UTF7, False)

        Dim strReturn As String = objReader.ReadToEnd()
        objReader.Close()
It seems to work for both gzipped and normal pages.

The code you gave me displays a size, although it seems to be wrong. It shows Google.com as 1704 bytes when it is actually 4275 bytes. Have I missed something?
 
It's the compressed size when gzipped content is sent (1704 of 4275 bytes is a ratio of about 40%, which is plausible for gzipped HTML). Your code doesn't decompress, so if it's working, that's plain text you're getting.
 
Thanks,

Is there a way of getting or working out the decompressed size without downloading the whole file? It seems browser download dialogues show the full file size. Also, I had a look at IO.Compression but could not see any way of decompressing the downloaded HTML (I don't want it saved to a file).
 
Is there a way of getting or working out the decompressed size without downloading the whole file?
Not that I know of. You could compress a lot of web pages of common size and content to find a typical compression ratio, then take an educated guess.
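A rough sketch of that guessing approach (the sample file and the assumption that one ratio generalizes to other pages are both illustrative):
VB.NET:
' Illustrative sketch: compress sample markup in memory to estimate a
' typical gzip ratio, then divide a compressed Content-Length by it.
Dim sampleHtml As String = IO.File.ReadAllText("sample.html") ' any saved page
Dim raw() As Byte = System.Text.Encoding.UTF8.GetBytes(sampleHtml)
Dim ms As New IO.MemoryStream()
Using gz As New IO.Compression.GZipStream(ms, IO.Compression.CompressionMode.Compress)
    gz.Write(raw, 0, raw.Length)
End Using ' disposing the GZipStream flushes the last compressed block
Dim ratio As Double = ms.ToArray().Length / CDbl(raw.Length)
' e.g. estimated full size = compressed Content-Length / ratio (only a guess)
MsgBox("Estimated compression ratio: " & ratio.ToString("0.00"))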
It seems browser download dialogues show the full file size.
For what? If you mean the page, you should know a web page normally references many other files, like external scripts, stylesheets and images, that the browser has to download before it can render and display the page. The page-load progress covers all of those elements, and one small plain-text HTML file is hardly noticeable among them. It is also likely that the progress shows the number of items downloaded rather than exact byte counts for the web page's components.
Also, I had a look at IO.Compression but could not see any way of decompressing the downloaded HTML (I don't want it saved to a file).
IO.Compression is about streams, the same as the response stream. WebClient is a class that wraps the basic HttpWebRequest with standard methods and useful functionality, so it is the same thing underneath. Here is the same code using WebClient: it sets headers to request a gzip-compressed Google start page, then decompresses it:
VB.NET:
Dim Web As New Net.WebClient
' Ask for gzip-compressed content with a browser-like user agent.
Web.Headers(Net.HttpRequestHeader.AcceptEncoding) = "gzip"
Web.Headers(Net.HttpRequestHeader.UserAgent) = "Mozilla/4.0 (compatible; MSIE 7.0)"
Dim s As IO.Stream = Web.OpenRead("http://www.google.com")
' Content-Length here is the compressed size.
Dim compressedContentLength As Long = CLng(Web.ResponseHeaders(Net.HttpResponseHeader.ContentLength))
' Wrap the response stream so reads come back decompressed.
Dim comp As New IO.Compression.GZipStream(s, IO.Compression.CompressionMode.Decompress)
Dim sr As New IO.StreamReader(comp, True)
Dim responseText As String = sr.ReadToEnd()
sr.Close()
About streams: the compression stream is wrapped around the response stream, and the StreamReader reads from the compression stream. The base stream's content is compressed, GZipStream decompresses it, and StreamReader reads the plain text.
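
To report the decompressed size from that code, one option is to count the bytes of the text you read back (a sketch; UTF-8 is assumed here, use whatever encoding the page actually has):
VB.NET:
' Sketch: decompressed size = byte count of the text in its encoding
' (UTF-8 assumed; compare with compressedContentLength from above).
Dim decompressedSize As Long = System.Text.Encoding.UTF8.GetByteCount(responseText)
MsgBox("Compressed: " & compressedContentLength.ToString() & _
       " bytes, decompressed: " & decompressedSize.ToString() & " bytes")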
 