How do I extract data from a webpage? (example page included)

shurb

I need to extract just the info on the page but what I am doing is not working as I had hoped.

If you look at this url:
http://arrestinquiryweb.co.mecklenb...est=&BrowseArrestDate=&BrowseLastDayArrests=X

I can get all the basic info. Where I run into problems is at the point of the line that reads "Charges for Arrest #" and the items that follow it.

1. I need to be able to pull the Arrest number from that line "Charges for Arrest #"
2. I also need the court case, arrest type, charge description, Bond Amount, Bond Type, and Arrest Process.

The catch is that if there is more than one court case, arrest type, etc., I need to add another line to my results file for each one.

So for example for this url:
http://arrestinquiryweb.co.mecklenb...est=&BrowseArrestDate=&BrowseLastDayArrests=X

I am trying to end up with the following line(s) in a text file (one line each, of course, but they are word-wrapped here):
BELK, HILLARY ANGELA,1372347,,303666,01/14/1982,W/F,07/10/2008,CMPD,1630 DILWORTH RD E CHARLOTTE NC 28203,0823235101,TRAFFIC DRIVING WHILE IMPAIRED,2500,SECURED
BELK, HILLARY ANGELA,1372347,,303666,01/14/1982,W/F,07/10/2008,CMPD,1630 DILWORTH RD E CHARLOTTE NC 28203,0823235201,TRAFFIC RECKLESS DRIVING - WILLFUL/WANTON DISREGARD,500,SECURED
BELK, HILLARY ANGELA,1372347,,303666,01/14/1982,W/F,07/10/2008,CMPD,1630 DILWORTH RD E CHARLOTTE NC 28203,0823235301,TRAFFIC FAILURE TO COMPLY WITH LICENSE RESTRICTIONS,500,SECURED


I can get everything to work except the portion at the end, which contains the court case, type, charge description, etc. The end goal is a file I can easily import into a DB. Unfortunately, I do not have access to the site's DB, or else this would be so simple :)
 
The table at index 8 contains the charges: the first row holds the headers, followed by one row per charge, and the InnerText of each row's TD elements contains the values. Here is an example of reading this from a WebBrowser control:
VB.NET:
' Collect every table on the loaded page; the charges table is at index 8.
Dim tables As HtmlElementCollection = Me.WebBrowser1.Document.GetElementsByTagName("table")
Dim charges As HtmlElement = tables(8)
Dim rows As HtmlElementCollection = charges.GetElementsByTagName("tr")
For r As Integer = 1 To rows.Count - 1 ' start at 1 to skip the header row
    Dim cells As HtmlElementCollection = rows(r).GetElementsByTagName("td")
    For Each td As HtmlElement In cells
        ' td.InnerText holds the value of this cell; read it here
    Next
Next
Person data is best read from the table at index 5, where you analyze the TD cells in the same way; one of them holds the Arrest #.
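Once the cell texts have been read, producing one output line per charge is mostly a matter of repeating the person fields in front of each charge row. The following is only a minimal sketch of that step, not tested against the actual page: `BuildLines` and the module name are hypothetical, the sample values are copied from the lines in the question, and which cells you keep (and in what order) is an assumption you would adjust to your own file layout.

```vbnet
Imports System
Imports System.Collections.Generic

Module ArrestLines
    ' Hypothetical helper: returns one comma-separated line per charge,
    ' each starting with the same person fields.
    Function BuildLines(personFields As String(),
                        chargeRows As IEnumerable(Of String())) As List(Of String)
        Dim prefix As String = String.Join(",", personFields)
        Dim lines As New List(Of String)
        For Each charge As String() In chargeRows
            ' Trim each cell and treat a missing value as an empty field.
            Dim cleaned As New List(Of String)
            For Each cell As String In charge
                cleaned.Add(If(cell, "").Trim())
            Next
            lines.Add(prefix & "," & String.Join(",", cleaned.ToArray()))
        Next
        Return lines
    End Function

    Sub Main()
        ' Sample values taken from the question; in practice these would come
        ' from the TD cells of tables 5 (person) and 8 (charges).
        Dim person As String() = {"BELK, HILLARY ANGELA", "1372347", "", "303666",
                                  "01/14/1982", "W/F", "07/10/2008", "CMPD",
                                  "1630 DILWORTH RD E CHARLOTTE NC 28203"}
        Dim charges As New List(Of String())
        charges.Add(New String() {"0823235101", "TRAFFIC DRIVING WHILE IMPAIRED", "2500", "SECURED"})
        charges.Add(New String() {"0823235201", "TRAFFIC RECKLESS DRIVING - WILLFUL/WANTON DISREGARD", "500", "SECURED"})

        For Each line As String In BuildLines(person, charges)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Inside the row loop above you would fill a String array from the `td.InnerText` values of each row and add it to the list, then write the result out, for example with `System.IO.File.WriteAllLines`. Note that a field which itself contains a comma, such as the name, will split into two columns on import unless you quote it or pick a different delimiter.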
 