problem using GetString to fetch HTML

cmd_17
Member; joined Dec 7, 2008; 13 messages; programming experience: Beginner
Hi,

I have a function in my WinForms app that needs to retrieve the user's default user agent. To do this, I've created a web page (ASP.NET) that displays the user agent in a Label in the Page_Load. The problem that I'm having is that when I fetch the HTML from that page, the result includes all of the HTML except what should be displayed in the Label (i.e., the user agent).

Here's my code:

VB.NET:
Private Function GetUserAgent(ByVal theUrl As String) As String

    Dim proc As New Process
    proc.StartInfo.CreateNoWindow = True
    proc.StartInfo.WindowStyle = ProcessWindowStyle.Hidden
    proc.StartInfo.FileName = theUrl
    proc.Start()
    Dim client As New WebClient
    Dim dataBuffer As Byte() = client.DownloadData(theUrl)
    Dim result As String = Encoding.UTF8.GetString(dataBuffer)

    'The following lines are commented out b/c when I first tested the
    'application, no result was returned. Then I commented out these
    'lines and discovered that while the HTML was being returned,
    'the inner text of the <span> tag was blank.
    'Dim searchStringBegin As String = "<span id=""Label1"">"
    'Dim searchStringEnd As String = "</span>"
    'Dim startIndex As Integer = result.IndexOf(searchStringBegin, 0)
    'Dim endIndex As Integer = result.IndexOf(searchStringEnd, startIndex)
    'Dim len As Integer = endIndex - startIndex - searchStringBegin.Length
    'Return result.Substring(startIndex + searchStringBegin.Length, len)

    Return result

End Function

When that code is run, it returns the following result:

HTML:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title>
</head>
<body>
    <form name="form1" method="post" action="ua.aspx" id="form1">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUIOTEyNDUzNDYPZBYCAgMPZBYCAgEPFgIeBFRleHRlZGRFwgcdsZHQ3wZjwcDujr/taZqCZg==" />

    <p>Your user agent is: </p>
    </form>
</body>
</html>

However, when viewing the page's source in a browser, it contains the following code:

HTML:
<p>Your user agent is: <span id="Label1">(user agent string here)</span></p>

Any ideas why GetString is not picking up the Label contents? :confused:

Note: For reasons beyond my control, this app is coded using .NET 1.1. Otherwise, I'd be doing things a bit differently. ;)
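As a sketch (not the original poster's code), the commented-out extraction in GetUserAgent can be made defensive. As written, it would throw an ArgumentOutOfRangeException whenever the opening tag is missing, as it is in the output shown, because `IndexOf(searchStringEnd, startIndex)` is then called with a start index of -1. The module and function names below are hypothetical, the helper assumes the exact `<span id="Label1">` markup seen in the browser's page source, and it avoids VB 2005-only syntax so it should also compile against .NET 1.1.

```vbnet
Imports System

Module HtmlSpanHelper

    ' Returns the text between <span id="spanId"> and the next </span>,
    ' or Nothing if either tag is missing. Assumes id is the span's only
    ' attribute, as in the markup shown in this thread.
    Public Function ExtractSpanText(ByVal html As String, ByVal spanId As String) As String
        If html Is Nothing Then Return Nothing

        Dim openTag As String = "<span id=""" & spanId & """>"
        Dim closeTag As String = "</span>"

        ' Check the opening tag before searching from it: IndexOf(value, -1) throws
        Dim startIndex As Integer = html.IndexOf(openTag)
        If startIndex < 0 Then Return Nothing

        Dim textStart As Integer = startIndex + openTag.Length
        Dim endIndex As Integer = html.IndexOf(closeTag, textStart)
        If endIndex < 0 Then Return Nothing

        Return html.Substring(textStart, endIndex - textStart)
    End Function

    Sub Main()
        Dim page As String = "<p>Your user agent is: <span id=""Label1"">Mozilla/5.0</span></p>"
        Console.WriteLine(ExtractSpanText(page, "Label1")) ' prints Mozilla/5.0
        ' The page returned to WebClient has no span at all, so this prints an empty line
        Console.WriteLine(ExtractSpanText("<p>Your user agent is: </p>", "Label1"))
    End Sub

End Module
```

Returning Nothing instead of throwing lets the caller tell "no span rendered" apart from "span rendered but empty" (which returns an empty string).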
 
Hey, you could do this simply with a WebBrowser control:

VB.NET:
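Dim wbtus As New WebBrowser
wbtus.Navigate("http://www.yoursite.com/")
' Navigate is asynchronous, so DocumentText is only filled in after the DocumentCompleted event fires
TextBox1.Text = wbtus.DocumentText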

And as for the document title (for example, "Google UK"):

VB.NET:
TextBox2.Text = wbtus.DocumentTitle

And if you want a specific element, such as your user agent label:

VB.NET:
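' GetElementById returns Nothing if Label1 is not on the page (read it after DocumentCompleted)
Dim label As HtmlElement = wbtus.Document.GetElementById("Label1")
If Not label Is Nothing Then TextBox3.Text = label.InnerText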
 
Thanks for your reply, tonycrew. However, I am using version 1.1 of the .NET Framework (because I have to, not because I want to), and the WebBrowser control is not available in .NET 1.1. While I am aware that I could create WebBrowser-like functionality by using the WebBrowser ActiveX control available in VB6 (AxWebBrowser), this is not a solution because the whole point is to get the user's default user agent. That is, whatever they have set as their default browser, which may or may not be IE. Using a WebBrowser or an AxWebBrowser cannot accomplish this, which is why I have used Process.Start().

Believe me, I'd love to be using .NET 2.0 to get this done, but it's just not an option.

The real issue here is why the text of the Label control on my .aspx page returns nothing when accessed via the Windows Form app, but the text is visible in the source code (and on the page) when my .aspx page is accessed manually through a browser.
 
Just thought of something...

The user agent is blank because of the WebClient itself: it sends no User-Agent header unless you add one, and its request is entirely separate from the one Process.Start() triggers in the browser. The browser's request does carry the user agent, so the page it displays shows it, but there seems to be no way to read that page's code once it is open in the default browser. A WebBrowser (or AxWebBrowser) control would let me read the code through its Document property, but it would report IE's user agent rather than the default browser's.

Is there no way to get at the HTML code when Process.Start() is used to open the target web page?
 
Well, OK, I get where you're coming from, but can't you just create the WebBrowser with a simple statement:

VB.NET:
Dim wbub As New WebBrowser

I'm sure you could get by with a WebBrowser rather than doing it manually through Process, unless of course you do it as a secret process:

VB.NET:
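Dim sProc As New Process ' there is no Secret.Process type; presumably a hidden window is meant:
sProc.StartInfo.WindowStyle = ProcessWindowStyle.Hidden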

I don't use your version, but I'd still recommend this solution.
 
The WebClient makes a request just like any browser would; it is a highly lightweight browser, if you will. What you ask is not possible with your method. Obviously (only to me?) you would have to actually use the default browser so the page responds with the requesting browser's user agent, and then face the impossible task of automating every browser in the world to read the loaded document. As with many system settings, you will find the answer in the registry; one of the first search results says this setting is at:
HKCU\Software\Clients\StartMenuInternet
With VB you can access the registry through the Registry class (in the Microsoft.Win32 namespace).
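A minimal sketch of that registry lookup, using Microsoft.Win32.Registry. The key path is the one given above; the module and function names are hypothetical, and checking HKLM when HKCU has no entry is an assumption about where a machine-wide default is recorded. The key's default value names the registered browser entry (typically something like IEXPLORE.EXE or FIREFOX.EXE), not the browser's full user agent string.

```vbnet
Imports System
Imports Microsoft.Win32

Module DefaultBrowserLookup

    Const StartMenuKey As String = "Software\Clients\StartMenuInternet"

    ' Returns the (Default) value of StartMenuInternet, checking the current
    ' user first and then the machine-wide key (the HKLM fallback is an
    ' assumption). Returns Nothing if neither key has a non-empty value.
    Public Function GetStartMenuBrowser() As String
        Dim name As String = ReadDefaultValue(Registry.CurrentUser)
        If name Is Nothing Then name = ReadDefaultValue(Registry.LocalMachine)
        Return name
    End Function

    Private Function ReadDefaultValue(ByVal hive As RegistryKey) As String
        ' OpenSubKey returns Nothing when the key does not exist
        Dim key As RegistryKey = hive.OpenSubKey(StartMenuKey)
        If key Is Nothing Then Return Nothing
        Try
            ' An empty value name reads the key's (Default) value
            Dim value As Object = key.GetValue("")
            If value Is Nothing Then Return Nothing
            Dim text As String = value.ToString()
            If text.Length = 0 Then Return Nothing
            Return text
        Finally
            key.Close()
        End Try
    End Function

    Sub Main()
        Dim browser As String = GetStartMenuBrowser()
        If browser Is Nothing Then
            Console.WriteLine("No StartMenuInternet default found")
        Else
            Console.WriteLine("Start-menu browser: " & browser)
        End If
    End Sub

End Module
```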
 
Quote (tonycrew):
Well, OK, I get where you're coming from, but can't you just create the WebBrowser with a simple statement:

VB.NET:
Dim wbub As New WebBrowser

Yes, in VB 2005/2008 it would be that simple, but there's no WebBrowser control in .NET 1.1.

I'll keep working on this and post back if I find a solution that works with the .NET Framework 1.1. Thanks again for your assistance.
 
Quote:
Obviously (only to me?) you would have to actually use the default browser so the page responds with the requesting browser's user agent, and then face the impossible task of automating every browser in the world to read the loaded document. As with many system settings, you will find the answer in the registry; one of the first search results says this setting is at:
HKCU\Software\Clients\StartMenuInternet
With VB you can access the registry through the Registry class (in the Microsoft.Win32 namespace).

It's now obvious to me as well. :)

I looked into reading the registry, and it appears that I would only be able to get the name of the .exe associated with the default browser. What I'm really after is a complete user agent string such as:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729)

Is it possible with VB to determine what the user agent string would be?
 
Quote:
Each browser has its own user agent format; for example, you can read more about IE's here: Understanding User-Agent Strings

Exactly. There are tons of user agent strings, and they can reveal much more than which browser a person is using (including all versions of the .NET Framework that exist on a person's machine). Maybe I'm not being clear enough about what I want to do. What I need to get somehow is the user agent that would appear in the HTTP Headers IF the user were to have navigated to a website using their default browser. What I was trying to do with Process.Start() above is open a web page in the user's default browser and somehow determine the resulting user agent header. I now understand why this isn't possible, but nevertheless, it demonstrates the final result that I'm looking for.

To be a little more specific, let's say I have an ASP.NET page that displays the user agent upon Page_Load like so:

VB.NET:
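Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load
    ' Request.UserAgent is Nothing when the client sends no User-Agent header (e.g. a plain WebClient)
    If Not Request.UserAgent Is Nothing Then
        ' HtmlEncode so characters such as < in a user agent string cannot inject markup
        Label1.Text = Server.HtmlEncode(Request.UserAgent)
    Else
        Label1.Text = String.Empty
    End If
End Sub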

Now when this web page is opened up from within a Windows Forms application via Process.Start(), it should show the default user agent string. By "default" I mean whatever the user has designated as their default browser, which may or may not be some flavor of IE.

For this project, I cannot use a WebClient or WebBrowser (or AxWebBrowser) to navigate to a certain web page, because the user agent would then be either blank (WebClient) or IE's (the browser controls), which is unacceptable. I can use a WebClient or comparable means only after I have determined the true default user agent, which I can then pass as a parameter to the Navigate2 method, if I'm not mistaken. The challenge here is getting the correct user agent.


Quote (tonycrew):
do a secret process

@tonycrew: I don't quite understand what you wrote, but you are right, I need it to be a "secret process", so if you wouldn't mind elaborating on this, I'd appreciate it. :)
 
Quote (cmd_17):
There are tons of user agent strings,
There are only a few browsers commonly used on Windows, mostly IE and Firefox now. You already know which browser is the default, and you know everything there is to know about IE user agent strings (previous link); figure out Firefox and you have pretty much covered the bases. Expand as needed.
 
By "tons", I meant thousands. While there aren't thousands of browsers, there are many versions of each browser, and there are multiple versions of the .NET Framework, which would show in the user agent header if installed on the machine. Thus, there are thousands of combinations of these two elements alone. To see what I'm talking about, check out:
User Agent Database – 90689 User Agents

User agent strings are almost like snowflakes. :)

Anyway, I have found a solution to my user agent problem. I have the application opening a web page in the default browser, and this web page captures the user agent, along with a unique alphanumeric sequence in the query string, and stores them both in a database. Then the app requests a second web page that calls the database and returns the user agent associated with the alphanumeric key (provided in the query string of this second web page).
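A sketch of the client side of that approach follows; it is not the poster's actual code. The page URLs, the `key` query-string name and the module and function names are all hypothetical, and the server pages are assumed to behave as described above (the lookup page writing back only the stored user agent, or an empty body if nothing has been stored yet). A 32-digit hex GUID serves as the unique key, so no URL encoding is needed.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Net
Imports System.Text
Imports System.Threading

Module UserAgentProbe

    ' Hypothetical URLs: capture.aspx stores Request.UserAgent under the ?key=
    ' value, and lookup.aspx writes back only the user agent stored for that key.
    Const CaptureUrl As String = "http://www.example.com/capture.aspx"
    Const LookupUrl As String = "http://www.example.com/lookup.aspx"

    ' A GUID in "N" format is 32 hex digits, so it is safe in a query string as-is.
    Public Function BuildKeyedUrl(ByVal baseUrl As String, ByVal key As String) As String
        Return baseUrl & "?key=" & key
    End Function

    ' Opens the capture page in the default browser, then polls the lookup page.
    ' Returns the stored user agent, or Nothing if it never appears.
    Public Function GetDefaultBrowserUserAgent(ByVal attempts As Integer, ByVal delayMs As Integer) As String
        Dim key As String = Guid.NewGuid().ToString("N")

        ' On the .NET Framework, Process.Start with a URL shell-executes it,
        ' which opens the page in whatever browser is registered as the default
        Process.Start(BuildKeyedUrl(CaptureUrl, key))

        Dim client As New WebClient
        Try
            Dim i As Integer
            For i = 1 To attempts
                ' Give the browser time to load the page and the server time to store the row
                Thread.Sleep(delayMs)
                Dim data As Byte() = client.DownloadData(BuildKeyedUrl(LookupUrl, key))
                Dim body As String = Encoding.UTF8.GetString(data).Trim()
                If body.Length > 0 Then Return body
            Next
        Finally
            client.Dispose()
        End Try
        Return Nothing
    End Function

End Module
```

For example, `GetDefaultBrowserUserAgent(10, 1000)` checks about once a second for roughly ten seconds. Polling is used because the application has no direct way to be notified when the browser's request reaches the server.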
 
Quote (cmd_17):
User agent strings are almost like snowflakes.
Hmm, flaky. Here's another one:
VB.NET:
Dim web As New Net.WebClient
web.Headers.Add(Net.HttpRequestHeader.UserAgent, "You're missing the point.")
Your solution is convenient and simple though...
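The snippet above uses Net.HttpRequestHeader, an enum added in .NET 2.0, so it will not compile against .NET 1.1. A sketch of the same idea that adds the header by name instead follows; the module and function names are hypothetical, and decoding the response as UTF-8 is an assumption carried over from the original GetUserAgent function.

```vbnet
Imports System
Imports System.Net
Imports System.Text

Module UserAgentRequest

    ' Creates a WebClient that sends the given User-Agent header. The header is
    ' added by name because the HttpRequestHeader enum requires .NET 2.0 or later.
    Public Function CreateClient(ByVal userAgent As String) As WebClient
        Dim client As New WebClient
        client.Headers.Add("User-Agent", userAgent)
        Return client
    End Function

    ' Downloads url as UTF-8 text, sending the given User-Agent.
    Public Function DownloadWithUserAgent(ByVal url As String, ByVal userAgent As String) As String
        Dim client As WebClient = CreateClient(userAgent)
        Try
            Return Encoding.UTF8.GetString(client.DownloadData(url))
        Finally
            client.Dispose()
        End Try
    End Function

End Module
```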
 
Quote:
Hmm, flaky. Here's another one:
VB.NET:
Dim web As New Net.WebClient
web.Headers.Add(Net.HttpRequestHeader.UserAgent, "You're missing the point.")

With all due respect, I believe that you are missing the point, which is to enable the Windows application to mimic a real website visitor. In other words, when the application accesses a web page, it must appear as though the page was requested through a stand-alone browser, not a WebClient, WebBrowser, AxWebBrowser, etc., which by default would send either no user agent (WebClient) or Internet Explorer's (the browser controls). Further, it must appear as though the site is requested through a wide variety of user agents.

Yes, one may specify a particular user agent, as you noted above, and I intend to specify the user agent that would appear in the headers if the person were to have requested the same web page through his or her default browser. Determining the exact user agent via the solution that I presented saves me from having to manually compile and maintain a list of user agent strings and have the application select one based on current browser usage statistics, which vary from site to site (and month to month, year to year) and would be known only to the site owners, who have access to the logs.

Getting back to the point, what I do not want is for the target website to see a disproportionately large number of user agent strings that are either blank or one browser in particular whose frequency of use may not be in alignment with the site's typical usage data.
 