Question Reading From Web Page

Mog

I'm using VB.NET 2003 if that makes any difference...

I have been searching around for how to do this for a while now and I just can't seem to get anything to work (well except for one part... which I will detail further down).

What I am trying to do is read data from a table on a web page where a login is required.

Before I can even get to that data I have to log in to the website, which is where I run into my first problem... I can set up the request just fine and I've tried setting the credentials using System.Net.NetworkCredential("username","password"), but it never logs me in properly. I only ever get the login page back, with no sign of a successful or failed login at all.

As I've already said, I can read a page just fine, just not the page I want... This may be answered by the credentials part above, but do I request the page I want directly and let the login credentials bring me to it, or do I have to request the login page first and then send a second request for the page I want?

Once I have the page's HTML in a string variable, how do I get the data out of the specific table? The table is always in the same spot and always has the same header information, with changing information after that. I can deal with the data once I get it separated, I just don't know how to separate it easily.

In short, any code samples for logging into a web site where NetworkCredential doesn't work, or an explanation of how to make NetworkCredential work on any site, would be great. Also, a code sample of how to read each column of each row of an HTML table. With those 2 code samples I can probably work it out myself.

Feel free to ask questions for more details or point me in the proper direction for where to look for this stuff. Please don't get mad and say "search could find this all for you" as I have no idea what to search for and I'm getting pretty tired/frustrated with not being able to find what I want.
 
It can be done either with a WebBrowser control (visible or hidden) that you automate, or with at least two WebRequests, where you first go to the login page, perform the login and most likely pick up the correct cookies, then go on to the target page. About WebRequest and cookies see this thread: http://www.vbdotnetforums.com/net-sockets/28191-httpwebrequest-cookies.html
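A minimal sketch of the two-request approach, assuming VB 2005 or later (for Using) and a site that logs you in through a plain HTML form POST. The URLs, the form field names (username, password) and the credentials are placeholders, not anything from the real site; look at the login page's form markup to find the actual ones.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module LoginSketch
    ' Hypothetical addresses; replace with the real login and data pages.
    Private Const LoginUrl As String = "https://example.com/login"
    Private Const TargetUrl As String = "https://example.com/report"

    Sub Main()
        ' One CookieContainer shared by both requests carries the session cookie.
        Dim cookieJar As New CookieContainer()

        ' Request 1: POST the login form (field names are assumptions).
        Dim postData As Byte() = Encoding.UTF8.GetBytes( _
            "username=" & Uri.EscapeDataString("myUser") & _
            "&password=" & Uri.EscapeDataString("myPassword"))

        Dim loginRequest As HttpWebRequest = DirectCast(WebRequest.Create(LoginUrl), HttpWebRequest)
        loginRequest.Method = "POST"
        loginRequest.ContentType = "application/x-www-form-urlencoded"
        loginRequest.ContentLength = postData.Length
        loginRequest.CookieContainer = cookieJar

        Using body As Stream = loginRequest.GetRequestStream()
            body.Write(postData, 0, postData.Length)
        End Using

        ' Getting the response stores any cookies the server sets in cookieJar.
        Using loginResponse As WebResponse = loginRequest.GetResponse()
        End Using

        ' Request 2: GET the protected page, sending the same cookies back.
        Dim pageRequest As HttpWebRequest = DirectCast(WebRequest.Create(TargetUrl), HttpWebRequest)
        pageRequest.CookieContainer = cookieJar

        Dim html As String
        Using pageResponse As WebResponse = pageRequest.GetResponse()
            Using reader As New StreamReader(pageResponse.GetResponseStream())
                html = reader.ReadToEnd()
            End Using
        End Using

        Console.WriteLine("Downloaded " & html.Length & " characters")
    End Sub
End Module
```

If the second request still returns the login page, the form probably expects extra hidden fields or a different field name, so compare what the sketch posts with what a real browser sends.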

The WebBrowser control usually makes it easy to navigate the document object model (DOM) tree to get data from tables and such. Without it you can use basic string searching and regular expressions, or an HTML parsing library as mentioned in this post: http://www.vbdotnetforums.com/84994-post4.html
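For the regular expression route, something along these lines could work as a starting point. It is only a sketch: the sample markup is made up to stand in for the downloaded page, it takes the first table it finds, and it will not cope with nested tables or badly formed HTML (a parsing library is more robust for that).

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TableSketch
    Sub Main()
        ' Made-up sample markup standing in for the downloaded page.
        Dim html As String = _
            "<table><tr><th>Name</th><th>Qty</th></tr>" & _
            "<tr><td>Apples</td><td>3</td></tr>" & _
            "<tr><td>Pears</td><td>5</td></tr></table>"

        ' Singleline lets "." span line breaks inside a table, row or cell.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline

        ' Take the first table, then walk its rows and the cells in each row.
        Dim table As Match = Regex.Match(html, "<table\b[^>]*>(.*?)</table>", options)
        For Each row As Match In Regex.Matches(table.Groups(1).Value, "<tr\b[^>]*>(.*?)</tr>", options)
            For Each cell As Match In Regex.Matches(row.Groups(1).Value, "<t[dh]\b[^>]*>(.*?)</t[dh]>", options)
                ' Strip any tags nested inside the cell and keep the text.
                Dim cellText As String = Regex.Replace(cell.Groups(1).Value, "<[^>]+>", "").Trim()
                Console.Write(cellText & vbTab)
            Next
            Console.WriteLine()
        Next
    End Sub
End Module
```

Each row is printed on its own line with tab-separated cells; in a real program you would add the cell values to a DataTable row instead of writing them out.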

... I just noticed you're on 2003, WebBrowser is not available for you, but the older ActiveX COM control is (AxWebBrowser I think), which is in cases like this often used along with the mentioned MSHTML COM library. Btw, you should really upgrade, even the free VB 2008 Express should be better for you than 2003.
 
Well if it can all be done in one control easily with .NET 2008, then I will just upgrade to that.

I assume at some point the cookies in "CookieJar" get filled with the username and password?

Will need to look up this WebBrowser control, I'm quite interested now. Thanks.
 
The WebBrowser control (as of .Net 2.0) is a managed wrapper for the old AxWebBrowser + MSHTML COM libraries. It does the same and has the same basic capabilities, only wrapped in the much nicer strongly typed code style of the .Net library, instead of the "everything is Object or an interface and nothing seems connected" COM style where you can't do much without reading lots of cryptic old reference material. Like Internet Explorer, this control handles cookies automatically when browsing web pages.

For the WebRequest approach the CookieContainer is usually used for login pages. When you log in, the web application sets some special value there that it uses later to see that the request was previously authenticated. A plain user/pass should never be found in a cookie.
 
Does this WebBrowser control work with console apps? Figured I would keep the program simple and make it a console app since it's gonna be running automatically on a schedule. Guess I'll be digging through MSDN and whatnot.
 
"Does this webbrowser control work with console apps?"
More awkward, but can be done. I'd rather run it with a minimized forms app that closes itself.
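A rough sketch of such a self-closing form, assuming VB 2005 or later and a Windows Forms project. The URL is a placeholder, and the login step (filling in the form fields and submitting) is left out; this only shows loading a page, reading the first table through the DOM, and closing.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Diagnostics
Imports System.Windows.Forms

Public Class ScraperForm
    Inherits Form

    ' Hypothetical address of the page that holds the table.
    Private Const TargetUrl As String = "https://example.com/report"

    Private ReadOnly browser As New WebBrowser()

    Public Sub New()
        ' Keep the window out of the way; it only exists to host the control.
        WindowState = FormWindowState.Minimized
        ShowInTaskbar = False
        Controls.Add(browser)
        AddHandler browser.DocumentCompleted, AddressOf OnDocumentCompleted
    End Sub

    Protected Overrides Sub OnLoad(ByVal e As EventArgs)
        MyBase.OnLoad(e)
        browser.Navigate(TargetUrl)
    End Sub

    Private Sub OnDocumentCompleted(ByVal sender As Object, ByVal e As WebBrowserDocumentCompletedEventArgs)
        ' Frames raise this event too; wait until the whole page is done.
        If browser.ReadyState <> WebBrowserReadyState.Complete Then Return

        Dim tables As HtmlElementCollection = browser.Document.GetElementsByTagName("table")
        If tables.Count > 0 Then
            ' Walk every row of the first table and collect its cell text.
            For Each row As HtmlElement In tables(0).GetElementsByTagName("tr")
                Dim cells As New List(Of String)()
                For Each cell As HtmlElement In row.GetElementsByTagName("td")
                    cells.Add(cell.InnerText)
                Next
                Debug.WriteLine(String.Join(vbTab, cells.ToArray()))
            Next
        End If

        ' Work is done, so the form closes itself and the program exits.
        Close()
    End Sub

    <STAThread()> _
    Public Shared Sub Main()
        Application.Run(New ScraperForm())
    End Sub
End Class
```

The WebBrowser control needs a message loop and an STA thread, which Application.Run and the STAThread attribute provide here; that is the main reason a plain console app is more awkward.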
 
Mmk, was figuring I would do a self closing form if it could only be done with forms... Might do it that way anyways.

Gotta read up on webbrowser control and get VS 2008 before I ask more questions :p

EDIT: Ok, I think I've got all I need... it's logging in, moving to the page I want, then pulling the data from the table... just gotta fire that data into a datatable/dataset and I will be golden from there.

Thanks for the help on this JohnH, it was much appreciated.
 