RegExp to ignore everything between <code> and <pre> tags

Question

I'm trying to parse some content (no DOM is available, nor a DOM parser such as jQuery or Cheerio) to replace some words and symbols (basically emoticons) with images, but I would like to ignore everything between <code></code> and <pre></pre>. This example works great at replacing all the emoticons, but it doesn't ignore code and pre tags: http://jsbin.com/odARehI/5/edit?js,console

If you run the script, you will see the first printout before the code tag and the second after it.

I would appreciate another set of eyes on that pattern. Thanks.

// see link for a list of the emoticons to parse
var pattern = /&gt;:\)|$[\w~]+$|$$:]?[od]/|[:;\|bBiIxX8\(\)$$][=\-"^:]?[)>$&|\w*@#?]?[)>$&|\w*@#?]/g;

I tried a few things, but none of them worked without messing up the original match.

For the don't-parse-HTML-with-regex police department: this is running server-side, and I do not have the luxury of a DOM parser at the moment.

Thank you.

UPDATE: for a RegExp solution that ignores <code> tags, see this neat solution, thanks to github/frissdiegurke, in this commit:

/(^|<\/code>)([^<]*|<(?!code>))*(<code>|$)/g
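A minimal sketch of how the pattern above might be applied, assuming it is meant to match each stretch of text lying outside <code>...</code> so that a replacement can run on those stretches only. The slash in </code> is escaped here so the pattern is a valid JavaScript literal. The helper name replaceOutsideCode and the 'SMILE' replacement are made up for illustration and are not taken from the linked commit.

```javascript
// Matches one stretch of text outside <code>...</code>, including the
// closing </code> before it and the opening <code> after it.
var outsideCode = /(^|<\/code>)([^<]*|<(?!code>))*(<code>|$)/g;

// Hypothetical helper: runs `replacer` on every stretch outside <code> blocks.
function replaceOutsideCode(html, replacer) {
    return html.replace(outsideCode, function (segment) {
        return replacer(segment);
    });
}

var input = 'Hello :) <code>:) Foo</code> :) Bar';
var output = replaceOutsideCode(input, function (segment) {
    // Placeholder replacement; swap in the real emoticon pattern here.
    return segment.replace(/:\)/g, 'SMILE');
});

console.log(output); // Hello SMILE <code>:) Foo</code> SMILE Bar
```

Note that this pattern only knows about <code>; <pre> blocks would still be processed.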

bluefeet · Accepted Answer

Without DOM parsing, you are going to have edge cases that fail, but this should work for you.

Given this HTML:

Hello :) <pre>Wassup :)</pre> Maybe :) <code>:) Foo</code> :) Bar

Use this code:

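var html = 'Hello :) <pre>Wassup :)</pre> Maybe :) <code>:) Foo</code> :) Bar';
var blocks = [];

// 1. Stash every <pre> and <code> block behind a placeholder.
//    [\s\S]*? is used instead of .*? so that a block may span several lines.
html = html.replace(/<pre>[\s\S]*?<\/pre>|<code>[\s\S]*?<\/code>/g, function (match) {
    blocks.push(match);
    return '__BLOCK__';
});

// 2. Replace the emoticons in whatever is left.
html = html.replace(/:\)/g, 'SMILE');

// 3. Put the stashed blocks back in their original order.
html = html.replace(/__BLOCK__/g, function () {
    return blocks.shift();
});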

Which produces:

Hello SMILE <pre>Wassup :)</pre> Maybe SMILE <code>:) Foo</code> SMILE Bar

Just adjust the /:\)/g replace to work however you need it.
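As a rough sketch of that adjustment, the single /:\)/g replace could become a lookup over several emoticons. Everything specific here is an assumption for illustration: the emoticons map, the image file names, and the function names are hypothetical, and [\s\S]*? is used in place of .*? so that blocks may span line breaks.

```javascript
// Hypothetical emoticon-to-image map; the file names are invented.
var emoticons = {
    ':)': 'smile.png',
    ':(': 'sad.png',
    ';)': 'wink.png'
};

// Escape regex metacharacters so each emoticon is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// One alternation built from the map keys, e.g. :\)|:\(|;\)
var emoticonPattern = new RegExp(
    Object.keys(emoticons).map(escapeRegExp).join('|'),
    'g'
);

function replaceEmoticons(html) {
    var blocks = [];

    // Stash <pre> and <code> blocks behind a placeholder.
    var stashed = html.replace(/<pre>[\s\S]*?<\/pre>|<code>[\s\S]*?<\/code>/g, function (match) {
        blocks.push(match);
        return '__BLOCK__';
    });

    // Swap each emoticon for an image tag.
    var replaced = stashed.replace(emoticonPattern, function (match) {
        return '<img src="' + emoticons[match] + '" alt="' + match + '">';
    });

    // Restore the stashed blocks in their original order.
    return replaced.replace(/__BLOCK__/g, function () {
        return blocks.shift();
    });
}

var result = replaceEmoticons('Hi :) <code>;) x</code> bye :(');
console.log(result);
// Hi <img src="smile.png" alt=":)"> <code>;) x</code> bye <img src="sad.png" alt=":(">
```

One caveat of the placeholder approach: if the input already contains the literal text __BLOCK__, the restore step will misalign, so a less guessable placeholder may be worth choosing.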

Tags: javascript, regex, nodebb

bentael
