
Removing cdata in simplehtmldom

Tags:

php

Hello, good day. I am trying to scrape an XML feed that was given to us. I am using Simple HTML DOM to parse it, but some of the content is wrapped in CDATA. How can I remove it?

<date>
<weekday>
<![CDATA[ Friday
]]> 
</weekday>
</date>


<?php
include('simple_html_dom.php');
include('phpQuery.php');

if (ini_get('allow_url_fopen')) {
    // Let simple_html_dom fetch the feed directly.
    $xml = file_get_html('http://www.link.com/url.xml');
} else {
    // Fall back to cURL when allow_url_fopen is disabled.
    $ch = curl_init('http://www.link.com/url.xml');
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $src = curl_exec($ch);
    $xml = str_get_html($src, false);
}
?>
<?php
foreach ($xml->find('weekday') as $e)
    echo $e->innertext . '<br>';
?>

I believe simplehtmldom removes CDATA by default, but for some reason that is not happening here.

Kindly tell me if you need any more information that would help solve this issue.

Thank you so much for your help.

cooldude asked Jun 30 '26 11:06



1 Answer

You can use another XML parser, such as SimpleXML, that converts the CDATA section into a plain string:

$innerText = '<![CDATA[ Friday
]]>';

$innerText = (string) simplexml_load_string("<x>$innerText</x>");
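
The same conversion can be wrapped in a small helper so that malformed fragments do not raise warnings. This is only a sketch: the function name cdataToText is hypothetical and not part of simple_html_dom or SimpleXML, and trimming the surrounding whitespace is an assumption about what the output should look like.

```php
<?php
// Hypothetical helper (not part of simple_html_dom or SimpleXML):
// turns a fragment that may contain CDATA sections into plain text.
function cdataToText(string $fragment): string
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $node = simplexml_load_string("<x>$fragment</x>");
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // If the wrapped fragment is not well-formed XML, fall back to the raw text.
    if ($node === false) {
        return trim($fragment);
    }

    // Casting the element to string yields its text with the CDATA unwrapped.
    return trim((string) $node);
}

echo cdataToText("<![CDATA[ Friday\n]]>"), "\n"; // prints "Friday"
```

Note that the string cast only returns the text directly inside the wrapper element, so a fragment containing nested tags would need different handling.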

Extended code example based on the OP's code:

# [...]
<?php 
foreach($xml->find('weekday') as $e)
{
    $innerText = $e->innertext;
    $innerText = (string) simplexml_load_string("<x>$innerText</x>");
    echo $innerText . '<br>';
}
?>

Usage instructions: locate the foreach loop in the original code and compare it with the new code. Only that foreach loop has been replaced.
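
Since the feed is XML, it may also be possible to skip simple_html_dom for this step and read the whole document with SimpleXML. The sketch below assumes the feed is well-formed and uses an inline sample in place of the real URL; the variable names are illustrative only.

```php
<?php
// Sample feed standing in for the real URL from the question (assumption).
$src = <<<XML
<date>
<weekday>
<![CDATA[ Friday
]]>
</weekday>
</date>
XML;

// LIBXML_NOCDATA asks libxml to merge CDATA sections into ordinary text nodes.
$feed = simplexml_load_string($src, 'SimpleXMLElement', LIBXML_NOCDATA);
if ($feed === false) {
    throw new RuntimeException('Could not parse the feed');
}

// Collect the trimmed text of every <weekday> element in the document.
$weekdays = [];
foreach ($feed->xpath('//weekday') as $weekday) {
    $weekdays[] = trim((string) $weekday);
}

echo implode('<br>', $weekdays), "\n"; // prints "Friday"
```

With the real feed, $src would be the string fetched via cURL (or file_get_contents) as in the question's code.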

hakre answered Jul 03 '26 02:07



