Replace multiple spaces, tabs and newlines with one space, except in commented text

I need to replace multiple spaces, tabs and newlines in my HTML with a single space, except inside comments. For example, the following code:

<br/>    <br>

<!--
this   is a comment

-->
<br/>   <br/>

should become:

<br/><br><!--
this   is a comment

--><br/><br/>

Any ideas?

asked Dec 13 '25 at 23:12 by ro_jero



1 Answer

The new solution

After thinking a bit, I came up with the following pure-regex solution. Note that it deletes newlines, tabs and runs of two or more spaces instead of replacing them with one space, which is what your expected output shows:

$new_string = preg_replace('#(?(?!<!--.*?-->)(?: {2,}|[\r\n\t]+)|(<!--.*?-->))#s', '$1', $string);
echo $new_string;
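To try the one-liner without an online tool, here is a small self-contained PHP script. It assumes the input is the sample HTML from the question, stored in $string (the variable name used above); the result should be exactly the question's expected output.

```php
<?php
// Sample HTML from the question (assumed input), newlines written as \n.
$string = "<br/>    <br>\n\n<!--\nthis   is a comment\n\n-->\n<br/>   <br/>";

// Same pattern as above: outside comments, delete runs of 2+ spaces or of
// newlines/tabs; a comment is matched as group 1 and written back via $1.
$new_string = preg_replace('#(?(?!<!--.*?-->)(?: {2,}|[\r\n\t]+)|(<!--.*?-->))#s', '$1', $string);

echo $new_string, "\n";
```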

Explanation

(?                              # If
    (?!<!--.*?-->)              # No comment starts at this position
        (?: {2,}|[\r\n\t]+)     # Then match 2 or more spaces, or a run of newlines/tabs
    |                           # Else
        (<!--.*?-->)            # Match the comment and capture it (group #1)
)                               # End if

So basically, when no comment starts at the current position, the regex tries to match spaces/tabs/newlines. If it finds them, group 1 does not take part in the match, so $1 is replaced with an empty string, which deletes the whitespace. If a comment does start there, the whole comment is captured as group 1 and replaced by itself (lol), so the whitespace inside it is left alone.
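The key detail is what $1 turns into when group 1 did not take part in the match. A tiny, made-up example (the pattern #(a)|b# is only for illustration, not part of the answer) shows that preg_replace substitutes an empty string in that case:

```php
<?php
// Illustrative pattern: "a" is captured as group 1,
// "b" is matched without any capture.
$result = preg_replace('#(a)|b#', '[$1]', 'ab');

// For the "b" match group 1 is unset, so '[$1]' becomes '[]'.
echo $result, "\n"; // prints [a][]
```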



The old solution

This was my first strategy; the code requires PHP 5.3+ because it uses an anonymous function:

$new_string = preg_replace_callback('#(?(?!<!--).*?(?=<!--|$)|(<!--.*?-->))#s', function($m){
    if(!isset($m[1])){ // If group 1 does not exist (the comment)
        return preg_replace('#\s+#s', ' ', $m[0]); // Then replace with 1 space
    }
    return $m[0]; // Else return the matched string
}, $string);

echo $new_string; // Output
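A self-contained run of the callback version, again assuming the question's sample HTML as input. Each whitespace run outside the comment becomes a single space, while the comment is returned unchanged:

```php
<?php
// Sample HTML from the question (assumed input).
$string = "<br/>    <br>\n\n<!--\nthis   is a comment\n\n-->\n<br/>   <br/>";

$new_string = preg_replace_callback('#(?(?!<!--).*?(?=<!--|$)|(<!--.*?-->))#s', function ($m) {
    if (!isset($m[1])) {                          // group 1 (the comment) did not match
        return preg_replace('#\s+#s', ' ', $m[0]); // collapse each whitespace run to one space
    }
    return $m[0];                                 // a comment: return it unchanged
}, $string);

echo $new_string, "\n";
```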

Explaining the regex:

(?                      # If
    (?!<!--)            # Lookahead: there is no <!-- at this position
        .*?             # Then match anything (lazily) until ...
        (?=<!--|$)      # Lookahead: the next <!-- or the end of the string
    |                   # Else
        (<!--.*?-->)    # Match a comment and capture it as group #1
)                       # End if
# The s modifier makes . (dot) match newlines as well
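To see how the conditional splits the input, preg_match_all can list the pieces the pattern matches. The short input below is a made-up example: text outside comments comes out as its own piece (with group 1 empty), and the comment comes out as a separate piece captured in group 1.

```php
<?php
// Made-up sample: two spaces outside a comment, then a comment, then more text.
$sample = "a  b<!-- x -->c";

// Same pattern as the callback version; $m[0] holds each full match and
// $m[1] holds group 1 (an empty string where no comment was captured).
preg_match_all('#(?(?!<!--).*?(?=<!--|$)|(<!--.*?-->))#s', $sample, $m);

print_r($m);
```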


Note: what you are asking for and what you provided as expected output contradict each other a bit: you ask for one space, but your expected output removes the whitespace entirely. Anyway, if you want to remove the whitespace instead of replacing it with one space, just change '#\s+#s', ' ', $m[0] to '#\s+#s', '', $m[0] in the code.
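With that change (an empty replacement string instead of ' '), the callback version reproduces the question's expected output exactly. A self-contained check, assuming the same sample input:

```php
<?php
// Sample HTML from the question (assumed input).
$string = "<br/>    <br>\n\n<!--\nthis   is a comment\n\n-->\n<br/>   <br/>";

$new_string = preg_replace_callback('#(?(?!<!--).*?(?=<!--|$)|(<!--.*?-->))#s', function ($m) {
    if (!isset($m[1])) {                          // not a comment
        return preg_replace('#\s+#s', '', $m[0]); // remove the whitespace entirely
    }
    return $m[0];                                 // a comment: keep it as is
}, $string);

echo $new_string, "\n";
```

Be aware that removing all whitespace outside comments also removes single spaces inside tags, so <a href="x"> would become <ahref="x">. The pure-regex version only deletes runs of two or more spaces and newlines/tabs, so it leaves single spaces alone.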

answered Dec 16 '25 at 23:12 by HamZa



