
Regex: check to exclude strings that have many similar HTML tags

I'm trying to check whether this embedded Vimeo iframe:

<iframe src="https://player.vimeo.com/video/800711372?h=589188fdd4&title=0&byline=0&portrait=0" width="640" height="360" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen></iframe>

appears exactly once in a string. That means:

<iframe.....></iframe><iframe.....></iframe> (doesn't match)
<iframe.....></iframe> (match)

I used this pattern:

^(<iframe[^(src)]*?src\=\"https?\:\/\/player\.vimeo\.com\/video\/[^>]*?>)<\/iframe>$

and it works fine, but I don't think it's a very good approach. Is there any other way to achieve this? I did a little research, and people suggest using lookahead or negative lookahead.

Edit: Oh, the reason my regex works is in my code: I remove all newlines before applying the regex. So if we have:

<iframe.....></iframe>
<iframe.....></iframe>
<iframe.....></iframe> (multiple iframes, with the line breaks kept)

my regex will match all of them.

asked Dec 20 '25 14:12 by jpesa


1 Answer

I don't believe this is an HTML question.
Assume you want to check a string for a single occurrence of a sub-string.
How is that done? There are two good ways.

  1. Actively check before and after the occurrence of the sub-string.
    Regex engines don't give up trying to match. If you search for the sub-string and then
    follow up with a negative assertion that it doesn't exist downstream, the engine
    will just match the last occurrence, which satisfies the assertion.
    You therefore need a character-by-character check, both before and after the match,
    that no other occurrence of the sub-string exists.
    This is fairly slow.

  2. Passively match 2 occurrences of the sub-string. "Passively" means using an un-greedy .*?:
    match the first sub-string, then un-greedily skip ahead and OPTIONALLY match the second occurrence.
    The engine will try hard to match both occurrences. The second occurrence
    is within a capture group, which serves as a flag to examine after a successful match.
    If that group is not NULL, the regex found 2 or more sub-strings.
    If that group is NULL, there is a 100% assurance there is only a single sub-string.

Note that if the regex matched it found at least a single sub-string.
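
As a rough illustration of the first way (the active check), here is a minimal sketch in Python, which is an assumption since the question does not name a language. The tag pattern IFRAME is a deliberately simplified, hypothetical stand-in, not the fuller pattern shown in the example that follows, and the name has_exactly_one_iframe is made up for illustration. The (?:(?!<iframe\b)[\S\s])* pieces are the character-by-character check before and after the single occurrence.

```python
import re

# Hypothetical, simplified tag pattern used only for this sketch.
IFRAME = r'<iframe\s[^>]*>\s*</iframe>'

# Consume one character at a time, but only while no "<iframe" starts here.
NOT_IFRAME = r'(?:(?!<iframe\b)[\S\s])*'

# Exactly one iframe: no iframe before it, and no iframe after it.
EXACTLY_ONE = re.compile(NOT_IFRAME + IFRAME + NOT_IFRAME)


def has_exactly_one_iframe(text):
    """Return True only if the whole text contains a single iframe element."""
    return EXACTLY_ONE.fullmatch(text) is not None


if __name__ == "__main__":
    one = '<iframe src="https://player.vimeo.com/video/800711372"></iframe>'
    print(has_exactly_one_iframe(one))
    print(has_exactly_one_iframe(one + "\n" + one))
```

Because every character outside the tag goes through a lookahead, this tends to get slow on long input, which is the cost mentioned for this approach.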

Example:

(<iframe\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>\s*</iframe>)(?:(?:[\S\s]*?(<iframe\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>\s*</iframe>))|)

Failed, Group 2 is not NULL https://regex101.com/r/DmThDT/1
Passed, Group 2 is NULL https://regex101.com/r/393BPn/1
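
For anyone who wants to try the group-2 flag outside regex101, here is a small sketch in Python (an assumption; the question does not say which language is used). The pattern is copied from the example above; the function name classify and its return values are hypothetical. In Python an unmatched group is reported as None, which plays the role of NULL.

```python
import re

# Pattern copied from the example above; group 2 is the optional second iframe.
PATTERN = re.compile(
    r'''(<iframe\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>\s*</iframe>)(?:(?:[\S\s]*?(<iframe\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>\s*</iframe>))|)'''
)


def classify(text):
    """Return 'none', 'one' or 'many' depending on how many iframes were seen."""
    match = PATTERN.search(text)
    if match is None:
        return 'none'   # the regex did not match, so there is no iframe at all
    if match.group(2) is None:
        return 'one'    # group 2 is NULL: only a single iframe
    return 'many'       # group 2 is set: 2 or more iframes


if __name__ == "__main__":
    one = '<iframe src="https://player.vimeo.com/video/800711372" width="640"></iframe>'
    print(classify(one))
    print(classify(one + "\n" + one))
```

The nested quantifiers in this pattern may backtrack heavily on input where an opening <iframe is never closed, so treat this as a sketch rather than something tuned for arbitrary input.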

HTML should be parsed with some kind of HTML parser; however, I believe this question is not about that.
My attempt at matching HTML tags is thrown in, but the sub-string could be anything.

Overview

(                             # (1 start)
   <iframe \s+ 
   (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
   > \s* </iframe>
)                             # (1 end)
(?:
   (?:
      [\S\s]*? 
      (                             # (2 start)
         <iframe \s+ 
         (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
         > \s* </iframe>
      )                             # (2 end)
   )
 | 
)
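
The expanded layout above can be compiled as written in engines that have a free-spacing mode. A minimal sketch, assuming Python and its re.VERBOSE flag (the names VERBOSE_PATTERN and group2_flag are hypothetical):

```python
import re

# Same layout as the overview; re.VERBOSE ignores the whitespace and # comments.
VERBOSE_PATTERN = re.compile(r'''
(                             # (1 start)
   <iframe \s+
   (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
   > \s* </iframe>
)                             # (1 end)
(?:
   (?:
      [\S\s]*?
      (                             # (2 start)
         <iframe \s+
         (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
         > \s* </iframe>
      )                             # (2 end)
   )
 |
)
''', re.VERBOSE)


def group2_flag(text):
    """Return None if nothing matched, else True when group 2 captured a second iframe."""
    match = VERBOSE_PATTERN.search(text)
    if match is None:
        return None
    return match.group(2) is not None
```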
answered Dec 23 '25 05:12 by sln


