I have the following text input:
lorem <a> ipsum <b> dolor <c> sit amet,
consectetur <d> adipiscing elit <e>, sed
do eiusmod <f> tempor
incididunt ut
As the sample shows, the number of <?> tokens per line is not fixed: a token can appear zero or more times on the same line.
Using only awk, I need to output this:
<a> <b> <c>
<d> <e>
<f>
I tried this awk script:
awk '{
match($0,/<[^>]+>/,a); # fill array a with the match (three-argument match() is gawk-only)
for (i in a) {
if (match(i, /^[0-9]+$/) != 0) # ignore non-numeric indices
print a[i]
}
}' somefile.txt
but this only outputs the first match on every line:
<a>
<d>
<f>
Is there some way of doing this with match() or any other built-in function?
With GNU awk you can use its built-in FPAT variable, which defines fields by what they contain rather than by what separates them. Try the following:
awk -v FPAT='<[^>]*>' '
NF{
val=""
for(i=1;i<=NF;i++){
val=(val?val OFS:"") $i
}
print val
}
' Input_file
Assuming there are no stray angle brackets, use either < or > as a field separator and print every second field:
awk -F'[<>]' 'NF > 2 {s = ""; for (i = 2; i <= NF; i += 2) s = s (i > 2 ? " " : "") "<" $i ">"; print s}' data
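To answer the match() part of the question directly: match() only reports the leftmost match, so one option is to call it in a loop and cut off the part of the line already consumed. This is a minimal sketch that relies only on the two-argument match(), RSTART and RLENGTH, so it should work in any POSIX awk and not just gawk. The shell function name extract_tags is a hypothetical wrapper used here for illustration only.

```shell
# extract_tags is a hypothetical helper name, not part of awk
extract_tags() {
    awk '{
        line = $0
        out = ""
        # match() reports only the leftmost match via RSTART and RLENGTH,
        # so keep matching against the remainder of the line
        while (match(line, /<[^>]+>/)) {
            out = out (out == "" ? "" : " ") substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
        # skip lines that contain no <...> token at all
        if (out != "") print out
    }' "$@"
}
```

Call it as extract_tags somefile.txt, or pipe text into it. Each input line with at least one token produces one output line with the tokens separated by single spaces.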