FunkiMunky
New member
- Joined
- Sep 1, 2008
- Messages
- 3
- Programming Experience
- Beginner
I am trying to read data from an HTML page. The section that contains the data is shown below.
As you can see, there are a number of div tags and, in particular, comments that read <!-- begin content --> and <!-- end content -->.
I want the hrefs and the link text for each href. I thought that some straight string manipulation might do the job, splitting the text into parts. I have also been wondering whether there might be a way to just get the ul HTML control directly.
Any help in the right direction would be appreciated.
<!-- begin content --> <div class="box">
<h2 class="title">Search results - [ <i>2997 businesses found </i>]</h2>
<div class="content"><ul class="search-data">
<li><a href="?q=node/593">124 Facilities</a></li>
<li><a href="?q=node/597">2-0-2 Media</a></li>
<li><a href="?q=node/199">2.35 Research PLC</a></li>
<li><a href="?q=node/598">24-6 Cine & TV Services</a></li>
<li><a href="?q=node/599">27 Records</a></li>
<li><a href="?q=node/3029">2b Media Services</a></li>
<li><a href="?q=node/600">3 Bear Animations</a></li>
<li><a href="?q=node/6420">3-D Revolution Productions</a></li>
<li><a href="?q=node/580">3-D Revolution Productions</a></li>
<li><a href="?q=node/287">365Digital</a></li>
<li><a href="?q=node/601">3D Creations</a></li>
<li><a href="?q=node/603">3D Imaging</a></li>
<li><a href="?q=node/605">3D Jamie</a></li>
<li><a href="?q=node/7571">3D Orangepanda Digital Media</a></li>
<li><a href="?q=node/607">3DD Entertainment Ltd</a></li>
<li><a href="?q=node/289">3Dlabs</a></li>
<li><a href="?q=node/5846">3DRequest™</a></li>
<li><a href="?q=node/591">3p Underground Media UK Ltd</a></li>
<li><a href="?q=node/608">3rd Eye Broadcast Group</a></li>
<li><a href="?q=node/609">3rd Wave Graphics</a></li>
<li><a href="?q=node/610">3Sixty Media</a></li>
<li><a href="?q=node/613">422 South (Bristol)</a></li>
<li><a href="?q=node/612">422 South (Manchester)</a></li>
<li><a href="?q=node/310">7 Star Web Services</a></li>
<li><a href="?q=node/614">750mph</a></li>
<li><a href="?q=node/7197">A Bright Gem</a></li>
<li><a href="?q=node/582">A Double M Productions Ltd</a></li>
<li><a href="?q=node/615">A M Visualisation Ltd</a></li>
<li><a href="?q=node/616">A Productions</a></li>
<li><a href="?q=node/618">A Works TV Ltd</a></li>
<li><a href="?q=node/619">A. J. Murray</a></li>
<li><a href="?q=node/620">A.D. Modelmaking</a></li>
<li><a href="?q=node/621">A1 Vox Ltd</a></li>
<li><a href="?q=node/622">AAA 3D Imaging</a></li>
<li><a href="?q=node/65">Aardman Animations Ltd</a></li>
<li><a href="?q=node/625">Aardvark Swift Recruitment Ltd</a></li>
<li><a href="?q=node/626">AB Facility Vehicles</a></li>
<li><a href="?q=node/627">Abacus Film Productions Ltd</a></li>
<li><a href="?q=node/628">Abbey Home Media Group</a></li>
<li><a href="?q=node/629">About-Face Media Productions</a></li>
<li><a href="?q=node/630">Absolute Post</a></li>
<li><a href="?q=node/631">Absolute Studios</a></li>
<li><a href="?q=node/632">Absolutely Productions</a></li>
<li><a href="?q=node/633">Abstract Images</a></li>
<li><a href="?q=node/634">Acacia Productions Ltd</a></li>
<li><a href="?q=node/558">Academy</a></li>
<li><a href="?q=node/635">Academy Billiards</a></li>
<li><a href="?q=node/636">AccessMocap</a></li>
<li><a href="?q=node/637">Account - 4</a></li>
<li><a href="?q=node/638">ACE Accounting Ltd</a></li>
</ul>
</div>
</div>
<div id="pager" class="container-inline"><div class="pager-first"> </div><div class="pager-previous"><div class="pager-first"> </div></div><div class="pager-list"><strong>1</strong> <div class="pager-next"><a href="?q=business/search_data&from=50">2</a></div> <div class="pager-next"><a href="?q=business/search_data&from=100">3</a></div> <div class="pager-next"><a href="?q=business/search_data&from=150">4</a></div> <div class="pager-next"><a href="?q=business/search_data&from=200">5</a></div> <div class="pager-next"><a href="?q=business/search_data&from=250">6</a></div> <div class="pager-next"><a href="?q=business/search_data&from=300">7</a></div> <div class="pager-next"><a href="?q=business/search_data&from=350">8</a></div> <div class="pager-next"><a href="?q=business/search_data&from=400">9</a></div> <div class="pager-list-dots-right">...</div></div><div class="pager-next"><a href="?q=business/search_data&from=50">next page</a></div><div class="pager-last"><a href="?q=business/search_data&from=2950">last page</a></div></div><!-- end content -->
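For illustration, here is a minimal sketch of the "get the ul directly" idea using an HTML parser rather than manual string splitting. It is written in Python only as an example, since the post does not say which language is in use, and it relies on the standard library's html.parser module. The class name SearchDataLinks and the shortened sample string are assumptions made up for this example; only the ul class name search-data comes from the markup above.

```python
from html.parser import HTMLParser


class SearchDataLinks(HTMLParser):
    """Collect (href, text) pairs for <a> tags inside <ul class="search-data">.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.links = []        # collected (href, link text) tuples
        self._in_list = False  # True while inside the search-data <ul>
        self._href = None      # href of the <a> currently being read
        self._text = []        # text fragments of the current <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "ul" and "search-data" in classes:
            self._in_list = True
        elif tag == "a" and self._in_list:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link we are tracking.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
        elif tag == "ul":
            self._in_list = False


# Shortened copy of the markup from the post, used as sample input.
sample = """<!-- begin content --> <div class="box">
<div class="content"><ul class="search-data">
<li><a href="?q=node/593">124 Facilities</a></li>
<li><a href="?q=node/598">24-6 Cine & TV Services</a></li>
</ul>
</div>
</div>
<div id="pager"><a href="?q=business/search_data&from=50">2</a></div><!-- end content -->"""

parser = SearchDataLinks()
parser.feed(sample)
parser.close()

for href, text in parser.links:
    print(href, text)
```

Because the parser only records links while it is inside the search-data list, the pager links further down are skipped without needing the begin/end content comments at all.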