<?php
$data = <<<DATA
<div>
<p>سلام</p> // focus on this line
<p class="myclass">Remove this one</p>
<p>But keep this</p>
<div style="color: red">and this</div>
<div style="color: red">and <p>also</p> this</div>
<div style="color: red">and this <div style="color: red">too</div></div>
</div>
DATA;
$dom = new DOMDocument();
$dom->loadHTML(mb_convert_encoding($data, 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//*[@*]") as $node) {
    $parent = $node->parentNode;
    while ($node->hasChildNodes()) {
        $parent->insertBefore($node->lastChild, $node->nextSibling);
    }
    $parent->removeChild($node);
}
echo $dom->saveHTML();
As I've mentioned in the title of my question, the content of my website is Persian (not English), and the code above doesn't work for Persian characters.
Current output:
.
.
<p>&#1587;&#1604;&#1575;&#1605;</p>
.
.
Expected output:
.
.
<p>سلام</p>
.
.
What's wrong with it and how can I fix it?
Note: As you can see, I've also used mb_convert_encoding($data, 'HTML-ENTITIES', 'UTF-8') to handle the encoding (based on this answer), but it still doesn't work.
The Persian characters are being encoded as numeric character references. They will render correctly in a browser, or you can recover the original characters by decoding them with html_entity_decode(), e.g.:
echo html_entity_decode("&#1587;&#1604;&#1575;&#1605;");
outputs:
سلام
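To make the decoding step concrete, here is a minimal sketch that writes out the numeric character references for سلام and decodes them back. The explicit flags and the 'UTF-8' argument are additions for clarity rather than something the snippet above requires, and the variable names are only illustrative.

```php
<?php
// Numeric character references for the four letters of "سلام"
// (U+0633, U+0644, U+0627, U+0645 written in decimal).
$encoded = '&#1587;&#1604;&#1575;&#1605;';

// Decode the references back into UTF-8 characters.
// The flags and charset are spelled out here as an assumption, for clarity.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, PHP_EOL;
```

This only changes how the text is represented; both forms describe the same four characters.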
If you prefer the original characters in the output rather than numeric character references, you can change:
echo $dom->saveHTML();
to:
echo $dom->saveHTML($dom->documentElement);
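Passing a node makes saveHTML() serialize only that subtree, and the characters are written as-is rather than as numeric character references. The result is: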
<div>
<p>سلام</p>
Remove this one
<p>But keep this</p>
and this
and <p>also</p> this
and this too
</div>
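Putting the pieces together, the whole round trip might look like the sketch below. The function name unwrapAttributedElements is hypothetical, and mb_encode_numericentity() is swapped in for mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8') on the assumption that you want to avoid the deprecation notice that conversion raises as of PHP 8.2. It also assumes the root element itself carries no attributes.

```php
<?php
/**
 * Hypothetical helper: unwrap every element that carries at least one
 * attribute (keeping its children) and return the remaining markup.
 */
function unwrapAttributedElements(string $html): string
{
    // Assumption: encode non-ASCII characters as numeric entities so that
    // loadHTML() does not misread the UTF-8 bytes.
    $encoded = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $dom = new DOMDocument();
    $dom->loadHTML($encoded, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@*]') as $node) {
        $parent = $node->parentNode;
        // Move the children out, last one first, so their order is preserved.
        while ($node->hasChildNodes()) {
            $parent->insertBefore($node->lastChild, $node->nextSibling);
        }
        $parent->removeChild($node);
    }

    // Serialize the root element rather than the whole document.
    $out = $dom->saveHTML($dom->documentElement);
    return $out === false ? '' : $out;
}

$result = unwrapAttributedElements(
    '<div><p>سلام</p><p class="myclass">Remove this one</p><p>But keep this</p></div>'
);
echo $result, PHP_EOL;
```

If your PHP or libxml build still emits numeric character references here, running the result through html_entity_decode() should give the same readable text.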