Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Faster solution to compare files in bash

Tags:

linux

bash

sed

awk

file1:

chr1    14361   14829   NR_024540_0_r_DDX11L1,WASH7P_468
chr1    14969   15038   NR_024540_1_r_WASH7P_69
chr1    15795   15947   NR_024540_2_r_WASH7P_152
chr1    16606   16765   NR_024540_3_r_WASH7P_15
chr1    16857   17055   NR_024540_4_r_WASH7P_198

and file2:

NR_024540 11

I need find match file2 in file1 and print whole file1 + second column of file2

So ouptut is:

  chr1  14361   14829   NR_024540_0_r_DDX11L1,WASH7P_468 11
chr1    14969   15038   NR_024540_1_r_WASH7P_69 11
chr1    15795   15947   NR_024540_2_r_WASH7P_152 11
chr1    16606   16765   NR_024540_3_r_WASH7P_15 11
chr1    16857   17055   NR_024540_4_r_WASH7P_198 11

My solution is very slow in bash:

#!/bin/bash

while read line; do

c=$(echo $line | awk '{print $1}')
d=$(echo $line | awk '{print $2}')

grep $c file1 | awk -v line="$d" -v OFS="\t" '{print $1,$2,$3,$4"_"line}' >> output


 done < file2

I am prefer FASTER any bash or awk solution. Output can be modified, but need keep all the informations (order of column can be different).

EDIT:

Right now it looks like fastest solution according @chepner:

#!/bin/bash

while read -r c d; do

grep $c file1 | awk -v line="$d" -v OFS="\t" '{print $1,$2,$3,$4"_"line}' 

done < file2 > output
like image 838
Geroge Avatar asked Nov 29 '25 14:11

Geroge


1 Answers

In a single Awk command,

awk 'FNR==NR{map[$1]=$2; next}{ for (i in map) if($0 ~ i){$(NF+1)=map[i]; print; next}}' file2 file1

chr1 14361 14829 NR_024540_0_r_DDX11L1,WASH7P_468 11
chr1 14969 15038 NR_024540_1_r_WASH7P_69 11
chr1 15795 15947 NR_024540_2_r_WASH7P_152 11
chr1 16606 16765 NR_024540_3_r_WASH7P_15 11
chr1 16857 17055 NR_024540_4_r_WASH7P_198 11

A more readable version in a multi-liner

FNR==NR {
    # map the values from 'file2' into the hash-map 'map'
    map[$1]=$2
    next
}
# On 'file1' do
{
    # Iterate through the array map
    for (i in map){
        # If there is a direct regex match on the line with the 
        # element from the hash-map, print it and append the 
        # hash-mapped value at last
        if($0 ~ i){
            $(NF+1)=map[i]
            print
            next
        }
    }
}
like image 85
Inian Avatar answered Dec 02 '25 03:12

Inian



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!