 

AWK print all regex matches on every line

Tags:

awk

I have the following text input:

lorem <a> ipsum <b> dolor <c> sit amet,
consectetur <d> adipiscing elit <e>, sed 
do eiusmod <f> tempor
incididunt ut

As the text shows, the number of <?> tokens per line is not fixed: a token can appear zero or more times on the same line.

Using only awk, I need to output this:

<a> <b> <c>
<d> <e>
<f>

I tried this awk script:

awk '{
  match($0, /<[^>]+>/, a)          # fill array a with the match (GNU awk 3-argument form)
  for (i in a) {
    if (match(i, /^[0-9]+$/) != 0) # ignore non-numeric indices
      print a[i]
  }
}' somefile.txt

but this only outputs the first match on every line:

<a>
<d>
<f>

Is there some way of doing this with match() or any other built-in function?

aee asked Dec 06 '25 05:12


2 Answers

With GNU awk, you can use the built-in FPAT variable, which defines fields by what they contain rather than by what separates them:

awk -v FPAT='<[^>]*>' '
NF{
  val=""
  for(i=1;i<=NF;i++){
    val=(val?val OFS:"") $i
  }
  print val
}
'  Input_file
RavinderSingh13 answered Dec 08 '25 07:12


Assuming there are no stray angle brackets, use either < or > as a field separator and print every second field:

awk -F'[<>]' '{for (i=2; i <= NF; i += 2) {printf "<%s> ", $i}; print ""}' data
glenn jackman answered Dec 08 '25 08:12
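
The question also asks whether match() itself can do this. A minimal sketch, assuming a POSIX awk without GNU extensions: call match() in a loop and cut the line down to whatever follows each hit, using the RSTART and RLENGTH variables that match() sets. The function name extract_tags is made up for illustration and does not come from either answer.

```shell
# Hypothetical helper: print every <...> token of each input line,
# space-separated, skipping lines that contain none.
extract_tags() {
  awk '{
    rest = $0; out = ""
    while (match(rest, /<[^>]+>/)) {
      # RSTART/RLENGTH describe the leftmost match in rest
      out = (out == "" ? "" : out " ") substr(rest, RSTART, RLENGTH)
      rest = substr(rest, RSTART + RLENGTH)   # continue after this match
    }
    if (out != "") print out
  }' "$@"
}

# Sample input from the question
printf '%s\n' 'lorem <a> ipsum <b> dolor <c> sit amet,' \
  'consectetur <d> adipiscing elit <e>, sed ' \
  'do eiusmod <f> tempor' \
  'incididunt ut' | extract_tags
```

Unlike the field-separator approach, this does not rely on every < having a matching >, since only complete <...> tokens match the pattern.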


