I need to get the content between two directives (embed and endembed) using a regex. My current pattern does this correctly: /(?<!\w)(\s*)@embed(\s*\(.*\))([\w\W]*?)@endembed/g
However, when the directives are nested, it does not match the blocks correctly (https://regex101.com/r/nL8gV5/2):
@extends('layouts/default')
@section('content')
<div class="row">
<div class="col-md-6">
@embed('components/box')
@section('title', 'Box title')
@section('content')
<h4>Haai</h4>
Box content
@stop
@endembed
</div>
<div class="col-md-6">
@embed('components/box')
@section('title', 'Box2 title')
@section('content')
@embed('components/timeline')
@section('items')
@stop
@endembed
@stop
@endembed
</div>
</div>
@stop
Desired output:
1:
@section('title', 'Box title')
@section('content')
<h4>Haai</h4>
Box content
@stop
2:
@section('title', 'Box2 title')
@section('content')
@embed('components/timeline')
@section('items')
@stop
@endembed
@stop
3:
@section('items')
@stop
I've tried various patterns but I can't seem to get it right. My understanding is that I should use the (?R) recursive token combined with a backreference, something more like this: https://regex101.com/r/nL8gV5/3. After spending several hours fiddling around, I still haven't got it working.
What am I doing wrong, and what is the correct pattern?
To capture the outer @embed blocks and the nested ones, use a recursive regex:
$pattern = '/@embed\s*\([^)]*\)((?>(?!@(?:end)?embed).|(?0))*)@endembed/s';
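At (?0) the whole pattern is pasted in, which is what lets it match nested blocks. See test at regex101. To reach the nested blocks as well, collect the matches and replace each block with its captured $1, repeating until nothing matches: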
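$res = array();
// Each pass matches the outermost remaining blocks, then unwraps them
// so the next pass can see the blocks nested one level deeper.
while (preg_match_all($pattern, $str, $out)) {
    $str = preg_replace($pattern, "$1", $str);  // replace each block with its body
    $res = array_merge($res, $out[1]);          // collect the captured bodies
}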
This gives you the outer blocks and every nested one, down to the innermost. Test at eval.in
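For anyone who wants to try the loop in isolation, here is a minimal self-contained sketch. The function name collectEmbedBodies and the one-line sample input are made up for illustration and are not part of the original answer; the pattern and the loop are the ones shown above.

```php
<?php
// Hypothetical helper (name is an assumption): returns the body of every
// @embed(...) ... @endembed block, outer levels first, then deeper levels.
function collectEmbedBodies(string $str): array
{
    $pattern = '/@embed\s*\([^)]*\)((?>(?!@(?:end)?embed).|(?0))*)@endembed/s';
    $res = [];
    // Each pass sees only the outermost remaining blocks.
    while (preg_match_all($pattern, $str, $out)) {
        $res = array_merge($res, $out[1]);         // keep the captured bodies
        $str = preg_replace($pattern, '$1', $str); // unwrap one nesting level
    }
    return $res;
}

// Made-up sample: block 'b' is nested inside block 'a', block 'c' is separate.
$input = "@embed('a')A1 @embed('b')B1@endembed A2@endembed @embed('c')C1@endembed";
$bodies = collectEmbedBodies($input);
print_r($bodies);
```

Note that the results come back level by level (all outermost bodies first, then the next level down), which happens to be the order asked for in the question.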
The basic recursive pattern without any capturing is as simple as this:
/@embed\b(?>(?!@(?:end)?embed\b).|(?0))*@endembed/s
@embed\b : the literal @embed followed by a \b word boundary
(?> : a non-capturing atomic group for the alternation
(?!@(?:end)?embed\b). : a character that does not start @embed or @endembed
|(?0) : OR paste the pattern from the start
)* : the whole thing any number of times
@endembed : the literal closing directive
The s (PCRE_DOTALL) flag makes the dot also match newlines.
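As a quick check of what the basic pattern matches on its own, here is a small sketch with a made-up one-line input (the variable names and the sample text are assumptions for illustration). Used once with preg_match_all, it returns only the outermost blocks, each with its nested blocks still inside.

```php
<?php
// The basic recursive pattern from above, without capturing groups.
$pattern = '/@embed\b(?>(?!@(?:end)?embed\b).|(?0))*@endembed/s';

// Made-up sample: 'b' is nested in 'a'; 'c' is a separate top-level block.
$text = "x @embed('a') 1 @embed('b') 2 @endembed 3 @endembed y @embed('c') 4 @endembed";

preg_match_all($pattern, $text, $m);

// $m[0] holds whole matches: one entry per outermost block.
print_r($m[0]);
```

The nested 'b' block is consumed by the recursion, so it does not show up as a separate match. That is why the answer above unwraps and re-matches in a loop.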
I came up with this recursive regex from an example I had (from this Stack Overflow answer):
(?=(@embed(?:(?>(?:(?!@embed|@endembed).)+)*|(?1))*@endembed))
Try it on regex101
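Here is a rough sketch of how this lookahead variant could be used from PHP. The delimiters, the s flag (so the dot also matches newlines) and the sample text are my additions and not part of the pattern as posted. Because the whole block is captured inside a zero-width lookahead, the search can match again at the position of a nested @embed, so outer and nested blocks should both come back from a single preg_match_all call.

```php
<?php
// The lookahead pattern from above; the '/' delimiters and the 's' flag are
// assumptions added here so that it runs in PHP and spans multiple lines.
$pattern = '/(?=(@embed(?:(?>(?:(?!@embed|@endembed).)+)*|(?1))*@endembed))/s';

// Made-up sample with one nested block.
$text = "@embed('a') 1 @embed('b') 2 @endembed 3 @endembed";

preg_match_all($pattern, $text, $m);

// $m[0] contains empty strings (the matches are zero-width);
// $m[1] contains the captured blocks, including the directives themselves.
print_r($m[1]);
```

Unlike the pattern in the other answer, group 1 here includes the @embed and @endembed directives, so they would still need to be stripped if only the inner content is wanted.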