
How to print the row number and starting location of a pattern when multiple matches per row are present?

Tags: bash, split, awk

I want to use awk to match all the occurrences of a pattern within a large file. For each match, I would like to print the row number and the starting position of the pattern along the row (sort of xy coordinates). There are several occurrences of the pattern in each line. I found this somewhat related question.

So far, I managed to do it only for the first (leftmost) occurrence in each line. As an example:

echo xyzABCdefghiABCdefghiABCdef | awk 'match($0, /ABC/) {print NR, RSTART } ' 

The resulting output is:

1 4

But what I would expect is something like this:

1 4
1 13
1 22

I tried using split instead of match. I managed to identify all the occurrences, but RSTART is lost and printed as "0".

echo xyzABCdefghiABCdefghiABCdef | awk ' { split($0,t, /ABC/,m) ; for (i=1; i in m; i++) print (NR, RSTART) } '

Output:

1 0
1 0
1 0

Any advice would be appreciated. I am not limited to awk, but an awk solution would be preferred. Also, in my case the pattern to match would be a regex (/A.C/). Thank you.

asked Dec 05 '25 06:12 by RicGGG


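1 Answer

Another option, using GNU awk, is split with a regex separator.

In gawk, split takes four arguments: the string, the array that receives the pieces between the matches, the separator regex (fieldsep), and a seps array that receives the text each match actually consumed. The fourth argument is a gawk extension. Adding up the lengths of the pieces and the separators gives the starting position of every match.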

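echo xyzABCdefghiABCdefghiABCdef |
awk '{
  # a[] holds the text between matches; seps[] holds the matched strings (gawk only)
  n = split($0, a, /ABC/, seps); pos = 1
  for (i = 1; i < n; i++) {
    pos += length(a[i])      # skip the text before match i
    print NR, pos            # pos is now the 1-based start of match i
    pos += length(seps[i])   # skip the match itself
  }
}'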

Output

1 4
1 13
1 22
answered Dec 07 '25 02:12 by The fourth bird
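
For awk implementations without the four-argument split, the same coordinates can probably be obtained by calling match in a loop on the remainder of the line. This is only a sketch and is not part of the answer above. The function name find_all and the variable names re, s and off are made up for illustration. It assumes the pattern never matches the empty string and has no ^ anchor, since each search restarts on a shortened copy of the line. The regex is passed with -v, so any backslash in it would need to be doubled.

```shell
# find_all REGEX: read lines on stdin, print "row start" for every match.
# Hypothetical helper; relies only on match, substr, RSTART and RLENGTH.
find_all() {
  awk -v re="$1" '{
    s = $0      # part of the line not yet searched
    off = 0     # number of characters already removed from the front
    while (match(s, re)) {
      if (RLENGTH == 0) break           # guard against empty matches
      print NR, off + RSTART            # position in the original line
      off += RSTART + RLENGTH - 1       # characters consumed so far
      s = substr(s, RSTART + RLENGTH)   # continue after this match
    }
  }'
}

echo xyzABCdefghiABCdefghiABCdef | find_all 'A.C'
```

For the sample line this prints 1 4, 1 13 and 1 22, the same as the split version. Matches are non-overlapping: the search resumes right after the end of the previous match.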


