
Regular Expression for Replacing Content Not Inside HTML Tags

Tags: regex, php

I've got a function that helps to interlink pages within my site by scanning blog entries, news, and other items for certain core keywords. It then replaces those keywords with a link to the corresponding page.

I'm running into a problem where some words are replaced with links when they should not be. For example, a few of my HTML tables have a summary attribute that contains a short description of the table content, so a tag might look like this:

<table width="500" cellspacing="0" cellpadding="4" border="0" summary="This table contains a list of all car parts in inventory along with their corresponding prices">
...
</table>

My function incorrectly replaces a keyword or phrase like "car parts" with a link. How can I structure my replacement regular expression so that it does NOT replace the keyword in cases like this, but DOES replace it when it appears within a paragraph or within a cell in an HTML table?

Thanks in advance for any help and guidance!

EDIT: Just to clarify, I'm using PHP to render my pages. I'm using a str_replace() before the content is output as HTML to the page. I want to be able to replace that with an ereg_replace() so that I replace the content only if it meets certain conditions (i.e. as explained above). Sorry if this caused any confusion!

Dexter asked Feb 02 '26 20:02


1 Answer

Don't use regexes to parse HTML. Use the PHP DOM:

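$DOM = new DOMDocument;
// Collect parser warnings instead of printing them for imperfect markup
libxml_use_internal_errors(true);
$DOM->loadHTML($str); // $str holds your HTML
libxml_clear_errors();

// Get all table cells
$cells = $DOM->getElementsByTagName('td');

// Do stuff to the cells

// Get all paragraphs
$paragraphs = $DOM->getElementsByTagName('p');

// Do stuff to the paragraphs

// Etc...

// Serialize the modified document back to HTML
$str = $DOM->saveHTML();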
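
One way the "do stuff" steps could be filled in is to walk the text nodes of the document and wrap keywords in links there, which leaves attribute values such as summary alone. The following is only a minimal sketch under some assumptions: the function name linkKeywords, the wrapper id kw-link-root, the keyword-to-URL array and the /car-parts URL are all hypothetical and not part of the original answer. It matches case-sensitively, like str_replace(), and uses preg_split() because the ereg functions were removed in PHP 7.

```php
<?php
// Hypothetical helper (not from the original answer): link keywords that
// appear in text nodes only, so attribute values are never touched.
// $links maps keyword => URL, e.g. ['car parts' => '/car-parts'].
function linkKeywords(string $html, array $links): string
{
    if (count($links) === 0) {
        return $html;
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    // Wrap the fragment so it can be extracted again afterwards; the meta
    // tag asks the parser to treat the input as UTF-8.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '<div id="kw-link-root">' . $html . '</div>'
    );
    libxml_clear_errors();

    // Longest keywords first, so "car parts" wins over a shorter "car".
    $keywords = array_keys($links);
    usort($keywords, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    $quoted = array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keywords);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/';

    $xpath = new DOMXPath($dom);
    $root = $xpath->query('//div[@id="kw-link-root"]')->item(0);
    // Text nodes only, skipping text already inside links, scripts or styles.
    $query = './/text()[not(ancestor::a) and not(ancestor::script)'
           . ' and not(ancestor::style)]';

    // Copy the result to an array because the tree is modified in the loop.
    foreach (iterator_to_array($xpath->query($query, $root)) as $textNode) {
        $parts = preg_split($pattern, $textNode->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no keyword in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i % 2 === 1) {
                // Odd indexes hold the captured keyword.
                $a = $dom->createElement('a');
                $a->setAttribute('href', $links[$part]);
                $a->appendChild($dom->createTextNode($part));
                $fragment->appendChild($a);
            } elseif ($part !== '') {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Return only the wrapper's children, without the html/body scaffolding.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

// Usage with made-up content and a made-up URL:
$html = '<table summary="A list of car parts"><tr><td>We sell car parts.</td></tr></table>'
      . '<p>More car parts here.</p>';
echo linkKeywords($html, ['car parts' => '/car-parts']);
```

With input like this, the phrase inside the summary attribute should stay as it is, while the occurrences in the table cell and the paragraph become links. Because the output is re-serialized by the DOM extension, whitespace and tag normalization may differ slightly from the input markup.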
Håvard S answered Feb 04 '26 09:02


