Question XPath or any other way to solve this ...

Spelltox

Hello to all!

Sorry in advance for the long post.

I'm trying to get the value of a specific <div> from an HTML page.
This div has a class attribute of "itemPrice6", but the problem is that the same class appears a few more times on other <div>s that I'm not interested in.

The best thing for me would be to access the needed <div> with something like XPath, so I can target only the ones I need.

I've been trying everything I could think of this past week, but nothing seems to work. I'm an amateur programmer, so I might be making an obvious mistake...

My code:

VB.NET:
    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

        Dim web As New HtmlAgilityPack.HtmlWeb
        Dim doc As HtmlAgilityPack.HtmlDocument = web.Load("http://www.someurl.com/models.aspx?code=3010")
        Dim prices As HtmlAgilityPack.HtmlNodeCollection = doc.DocumentNode.SelectNodes("/div[@class='myItems']/div[@class='Inside']/div[@class='itemStart']/div[@class='itemPrice6']/span")
        For Each price As HtmlAgilityPack.HtmlNode In prices
            Console.WriteLine(price.InnerText)
        Next

    End Sub

(Part of) my HTML:

HTML:
<div class="myItems">
    
	<div class="Inside">
                
		<div class="itemStart">

			<div class="img">
				<a id="" class="cgrey" href="clientcard.aspx?siteid=2057">
					<img id="" src="http://www.villamarketers.com/community/wp-content/uploads/2011/05/free.gif" alt="item one of ten" style="border-width:0px;height:31px;width:88px;" />
				</a>
			</div>
			
			<div class="itemLink2">
				<div class="itemRate3">
					<img id="" src="http://www.villamarketers.com/community/wp-content/uploads/2011/05/free.gif" alt="Rating" style="border-width:0px;" />
					<a id="" class="pTrust" rel="nofollow" href="pTrust.aspx" target="_blank"><span></span></a>
				</div>
				<div class="clearLink" style="vertical-align: bottom;">
					<a id="" href="clientlog.aspx?sitemode=511">Find Ideas</a>
				</div>
			</div>
			
			<div class="trueDetails7">
				<a id="" class="fullDetails" rel="nofollow" onclick="" href="td.aspx?iid=917711127&stat=none" target="_blank">Printer Model x40</a><br />
				<a id="" class="iconLinkTip" href="javascript:void(0);" onclick="javascript:crivetip(showPrinter);" onmouseout="javascript:hidecrivetip()">More Info</a><br />
				<div>
					price:<span>393 $</span><br />
					Shipping:<span>Free</span><br />
					Delivery:<span>7 days</span> 
				</div>
			</div>
			
			<div class="itemPrice5" style="display:none;visibility:hidden;">
				   Final Price:<br /><span>359 $</span>
			</div>
			
			<div class="itemPrice6">
				Final Price:<br /><span>393 $</span>
			</div>
			
			<div class="itemDetails5">
				<a id="" class="button2" rel="nofollow" onclick="" href="mrl.aspx?iid=917711127&mode=cope" target="_blank">buy</a>
				<div class="myShop">
					<a href='/dnr.aspx?iid=917711127&mode=free' target='_blank' onclick="">Arnolds Shop</a>
				</div>
			</div>
			
		</div>
				
		<div class="horizontalRow"></div>
				
		again : div class="itemStart" ...
		again : div class="itemStart" ...
		again : div class="itemStart" ...

	</div>
	
</div>

Sorry for such a long post. I'm out of ideas, so any help would be greatly appreciated!
 
The XPath is correct for that sample document and outputs "393 $" when I run it. I loaded the document from the string you posted, so check for yourself whether your document actually gets loaded from the web URL. Also, I guess the sample is just an excerpt of a full HTML document. If so, an XPath that begins with "/div" only matches a <div> directly under the document root, so you have to either write the path from the real root or use a relative search, for example "//div[.....".
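To illustrate the point about starting from "//", here is a minimal sketch, not the poster's actual page. It assumes the HtmlAgilityPack reference is already in the project, and the inline HTML is a shortened, hypothetical copy of the posted markup wrapped in <html>/<body> to mimic a full document. It picks each "itemStart" block with "//" and then reads the name and the "itemPrice6" price relative to that block, so the hidden "itemPrice5" div is never touched.

```vbnet
Imports System
Imports HtmlAgilityPack

Module RelativeXPathSketch
    Sub Main()
        ' Shortened, hypothetical copy of the posted markup inside a full document.
        Dim html As String = "<html><body><div class=""myItems""><div class=""Inside"">" &
            "<div class=""itemStart"">" &
            "<div class=""trueDetails7""><a class=""fullDetails"" href=""#"">Printer Model x40</a></div>" &
            "<div class=""itemPrice5"" style=""display:none;"">Final Price:<br /><span>359 $</span></div>" &
            "<div class=""itemPrice6"">Final Price:<br /><span>393 $</span></div>" &
            "</div>" &
            "</div></div></body></html>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' "//" searches the whole document, so it does not matter how deeply
        ' the itemStart blocks are nested in the real page.
        Dim items As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='itemStart']")

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        If items Is Nothing Then
            Console.WriteLine("No itemStart blocks found.")
            Return
        End If

        For Each item As HtmlNode In items
            ' A leading "." keeps the search inside the current itemStart block.
            Dim name As HtmlNode = item.SelectSingleNode(".//a[@class='fullDetails']")
            Dim price As HtmlNode = item.SelectSingleNode("./div[@class='itemPrice6']/span")
            If name IsNot Nothing AndAlso price IsNot Nothing Then
                Console.WriteLine(name.InnerText.Trim() & ": " & price.InnerText.Trim())
            End If
        Next
    End Sub
End Module
```

The Nothing check matters for the original code too: if the XPath matches nothing, the For Each loop over the result throws a NullReferenceException instead of simply printing nothing.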
 
Second run

Thank you John,
I want to try that (maybe something is wrong in the full HTML document).

1. You wrote "I loaded the document from the string you posted". For learning purposes, what do I have to change in my code to load HTML from a string?

2. The full HTML page I'm dealing with is in "windows-1253" encoding, and when reading it I get some of the characters as "????? ????".
I've set
VB.NET:
web.AutoDetectEncoding = True
but that doesn't seem to change anything.
What do I need to change in my code to force the encoding I need?

Thanks in advance !
 
I created a HtmlAgilityPack.HtmlDocument and used its LoadHtml method.
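Roughly like this, as a minimal sketch: the string below is a hypothetical one-div cut of the posted HTML, and the only change from the original code is that HtmlWeb.Load is replaced by HtmlDocument.LoadHtml. Because the fragment has no surrounding <html>/<body>, the <div> sits directly under the document node, which is why an absolute path starting with "/div" matches here even though it may not match on the full page.

```vbnet
Imports System
Imports HtmlAgilityPack

Module LoadFromStringSketch
    Sub Main()
        ' Hypothetical fragment cut from the posted markup.
        Dim html As String = "<div class=""itemPrice6"">Final Price:<br /><span>393 $</span></div>"

        ' Parse the string directly; no HtmlWeb and no network access involved.
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' The div is a direct child of the document node in this fragment,
        ' so the absolute "/div[...]" path finds it.
        Dim price As HtmlNode = doc.DocumentNode.SelectSingleNode("/div[@class='itemPrice6']/span")

        If price Is Nothing Then
            Console.WriteLine("No match.")
        Else
            Console.WriteLine(price.InnerText)
        End If
    End Sub
End Module
```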

I was having problems with encoding detection in that library too, and LoadHtml was the solution I ended up with: I download the URL's content as a string beforehand and pass that to LoadHtml. That met my needs, and the documents loaded with the encoding declared in the various HTML documents I was handling. HtmlDocument has three properties and three methods with the word 'encoding' in their names (VB 2010 IntelliSense is really nice with partial matches!). At the time I could not get any of them to do what I wanted, but you could have a look anyway.
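One way to do the "download first, then LoadHtml" step while forcing the encoding is sketched below. This is an assumption about how to wire it up, not necessarily what I used back then: it fetches the raw bytes with System.Net.WebClient, decodes them explicitly as windows-1253, and only then hands the string to LoadHtml. The URL is the placeholder from the original post, and the XPath is the relative form of the one being discussed.

```vbnet
Imports System
Imports System.Net
Imports System.Text
Imports HtmlAgilityPack

Module ForcedEncodingSketch
    Sub Main()
        ' Placeholder URL taken from the original post.
        Dim url As String = "http://www.someurl.com/models.aspx?code=3010"

        ' Encoding chosen by hand instead of relying on auto-detection.
        ' Note: on .NET Core / .NET 5+ this code page is only available after
        ' Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
        Dim greek As Encoding = Encoding.GetEncoding("windows-1253")

        Dim html As String
        Using client As New WebClient()
            ' DownloadData returns the undecoded bytes, so nothing gets
            ' mangled before we apply the encoding ourselves.
            Dim raw As Byte() = client.DownloadData(url)
            html = greek.GetString(raw)
        End Using

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim prices As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//div[@class='itemPrice6']/span")
        If prices Is Nothing Then
            Console.WriteLine("No prices found.")
            Return
        End If

        For Each price As HtmlNode In prices
            Console.WriteLine(price.InnerText.Trim())
        Next
    End Sub
End Module
```

Recent versions of HtmlAgilityPack also expose an OverrideEncoding property on HtmlWeb, which may be worth trying if you would rather keep using web.Load. Also keep in mind that the console itself may print "?" for Greek characters it cannot display, so it is worth inspecting the string in the debugger before blaming the download.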
 