I am trying to get the text from a scraped page using XPath, but I keep getting an error and have no idea why! Bear in mind I am a very new PHP user; this is for a university project that I've taken on, and it's proving to be very challenging :P but hey, it should be.
Here's the code:
<?php
$html = file_get_contents('http://www.amazon.co.uk/New-Apple-iPod-touch-Generation/dp/B0040GIZTI/ref=br_lf_m_1000333483_1_1_img?ie=UTF8&s=electronics&pf_rd_p=229345967&pf_rd_s=center-3&pf_rd_t=1401&pf_rd_i=1000333483&pf_rd_m=A3P5ROKL5A1OLE&pf_rd_r=1ZW9HJW2KN2C2MTRJH60');
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$in_stock = $xpath->query("/html/body/div[@id='divsinglecolumnminwidth']/form[@id='handleBuy']/table[3]/tbody/tr[3]/td/div/span");
?>
I get the following error...
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : head in Entity, line: 2664 in C:\xampp\htdocs\scraping\domxpath.php on line 19
About a hundred times!
Any help really appreciated! It must be really easy to fix :P
Put this line first in your code to stop the parse errors from being displayed; libxml then collects them internally instead of printing warnings. This is particularly helpful when your document is an HTML page and you don't know whether it is well-formed.
libxml_use_internal_errors(true);
https://www.php.net/manual/fr/function.libxml-use-internal-errors.php
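As a minimal sketch of how that setting is typically used, the snippet below parses a deliberately malformed string (a made-up sample, not the Amazon page), reads the collected errors with libxml_get_errors(), and then clears them with libxml_clear_errors(). The variable names are only illustrative.

```php
<?php
// Deliberately malformed markup standing in for a real scraped page (hypothetical sample).
$html = '<html><head><title>Demo</title></head></head>'
      . '<body><p>In stock.</p></span></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Inspect whatever the parser complained about, then empty the error buffer.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo 'line ', $error->line, ': ', trim($error->message), PHP_EOL;
}
libxml_clear_errors();

// Restore the earlier error-handling mode.
libxml_use_internal_errors($previous);

// The document is still usable even though the markup was not well-formed.
$paragraph = $dom->getElementsByTagName('p')->item(0);
echo $paragraph->textContent, PHP_EOL;
```

Clearing the buffer matters in long-running scripts, since the collected errors otherwise accumulate across every document you parse.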
$xpath = new DOMXpath($dom);
$expr = "/html/body/div[@id='divsinglecolumnminwidth']/form[@id='handleBuy']/table[3]/tr[3]/td/div/span";
$nodes = $xpath->query($expr); // returns DOMNodeList object
// check the length property first; item(0) is null when nothing matched
if ($nodes->length > 0) {
    echo $nodes->item(0)->nodeValue; // get first DOMNode object and its value
}
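Note that tbody is dropped from the expression: browsers add it to their DOM, but it is usually not present in the raw HTML that loadHTML() parses. You also need to add a statement for suppressing errors. I think that for performance reasons it's better to use an absolute XPath expression, but the relative //form[@id='handleBuy']/table[3]/tr[3]/td/div/span works as well and is more flexible.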
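To illustrate the relative form, here is a small self-contained sketch. The markup is a hypothetical stand-in (one table, one row) rather than Amazon's real page, so the indexes are table[1] and tr[1] instead of table[3] and tr[3], and $in_stock simply reuses the name from the question.

```php
<?php
// Small stand-in for the product page (hypothetical markup, not Amazon's real HTML).
$html = <<<HTML
<html><body>
<form id="handleBuy">
  <table><tr><td><div><span>In stock.</span></div></td></tr></table>
</form>
</body></html>
HTML;

// Keep parser warnings out of the output, as suggested above.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Relative expression: locate the form by id anywhere in the document,
// so changes to the wrappers around the form do not break the lookup.
$nodes = $xpath->query("//form[@id='handleBuy']/table[1]/tr[1]/td/div/span");

// query() returns false for an invalid expression and an empty list when nothing matches.
$in_stock = ($nodes !== false && $nodes->length > 0)
    ? trim($nodes->item(0)->nodeValue)
    : null;

echo $in_stock ?? 'no match', PHP_EOL;
```

If the lookup comes back empty on the real page, comparing the expression against the raw HTML source (not the browser's inspector view) is usually the quickest way to find the step that does not exist.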