Search for groups in a big text with regex

Tags:

regex

I need to find the groups in a big text, knowing:

  • The word that defines the start of a group
  • A word contained in the group
  • The word that defines the end of the group

The start word is: begin, the contained word is: 536916223, and the end word is: end

In the text at the bottom, I need to find 2 groups.

I have tried to use:

\bbegin.*(\n*.*)*536916223(\n*.*)*\bbegin

but when I try this regex on the site "http://regexr.com/", it responds with a timeout... and I think the regex is not very good :(

The text is:

begin active link
   export-version : 11
   actlink-order  : 2
   wk-conn-type   : 1
   schema-name    : HelpDesk
   actlink-mask   : 1
   actlink-control: 750000002
   enable         : 1
   action {
      set-field   : 0\536916222\101\4\1\1\
   }
   errhandler-name: 
end
begin active link
   export-version : 11
   actlink-order  : 2
   wk-conn-type   : 1
   schema-name    : HelpDesk
   actlink-mask   : 1
   actlink-control: 610000092
   enable         : 1
   permission     : 0
   action {
      id          : 536916223
      focus       : 0
      access-opt  : 1
      option      : 0
   }
   action {
      set-field   : 0\536916222\101\4\1\1\
   }
   errhandler-opt : 0
   errhandler-name: 
end
begin active link
   actlink-order  : 12
   wk-conn-type   : 1
   schema-name    : HelpDesk
   actlink-mask   : 2064
   enable         : 1
   permission     : 0
   action {
      id          : 536916223
      focus       : 0
      access-opt  : 1
      option      : 0
   }
   action {
      set-field   : 0\536916222\101\4\1\1\
   }
   errhandler-opt : 0
   errhandler-name: 
end

Can someone suggest an optimized regex for this task?

Regards, Vincenzo

asked Dec 05 '25 21:12

Vincenzo


1 Answer

Use an unrolled tempered greedy token:

/\bbegin.*(?:\n(?!begin|end(?:$|\n)).*)*\b536916223\b.*(?:\n(?!begin|end(?:$|\n)).*)*\nend/g

or a shorter version if we add the MULTILINE modifier:

/^begin.*(?:\n(?!begin|end$).*)*\b536916223\b.*(?:\n(?!begin|end$).*)*\nend$/gm

See the regex demo (a version with MULTILINE modifier)

Details:

  • \bbegin - the word begin preceded by a word boundary (another word boundary \b can be added after it for stricter matching)
  • .* - the rest of the line after begin
  • (?:\n(?!begin|end(?:$|\n)).*)* - the unrolled form of the tempered greedy token (?:(?!\n(?:begin|end(?:$|\n)))[\s\S])*: it matches any number of following lines, as long as none of them starts with begin or consists of end alone
  • \b536916223\b - the whole word 536916223
  • .* - the rest of the line after the number
  • (?:\n(?!begin|end(?:$|\n)).*)* - another unrolled tempered greedy token
  • \nend - the end word after a newline (a (?:$|\n) can be added after it for stricter matching)
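
To try the MULTILINE version outside a regex tester, here is a minimal Python sketch. The find_groups helper and the trimmed SAMPLE text are my own illustration, not part of the question; the pattern itself is the one above, with the number passed in as a parameter. Python's re module is assumed here, and other engines may treat $ and line endings slightly differently.

```python
import re

# Trimmed copy of the sample from the question: three groups, of which only
# the second and third contain 536916223 (the first only has 536916222).
SAMPLE = r"""begin active link
   actlink-control: 750000002
   action {
      set-field   : 0\536916222\101\4\1\1\
   }
end
begin active link
   actlink-control: 610000092
   action {
      id          : 536916223
   }
   action {
      set-field   : 0\536916222\101\4\1\1\
   }
end
begin active link
   actlink-order  : 12
   action {
      id          : 536916223
   }
end
"""


def find_groups(text, needle):
    """Return every begin...end block that contains `needle` as a whole word."""
    # Any number of following lines, none starting with "begin" and none
    # consisting of "end" alone (the unrolled tempered greedy token).
    inner = r"(?:\n(?!begin|end$).*)*"
    pattern = re.compile(
        r"^begin.*" + inner
        + r"\b" + re.escape(needle) + r"\b.*" + inner
        + r"\nend$",
        re.MULTILINE,
    )
    return [m.group(0) for m in pattern.finditer(text)]


groups = find_groups(SAMPLE, "536916223")
print(len(groups))  # 2
```

Because the lookahead stops each repetition at the next begin or end line, a match cannot spill from one block into the next, which is what made the original attempt backtrack so heavily.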

answered Dec 08 '25 14:12

Wiktor Stribiżew