Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to take rows containing a pattern in certain column and move them to the rows than have few commonly columns in UNIX

Tags:

bash

unix

awk

I have a big tab file like this:

rs980   INPP5B  missense        I       749     NP_001284363.1
rs980   INPP5B  reference       L       749     NP_001284363.1
rs980   INPP5B  missense        F       749     NP_001284363.1
rs980   INPP5B  missense        I       913     NP_005531.2
rs980   INPP5B  reference       L       913     NP_005531.2
rs980   INPP5B  missense        P       913     NP_005531.2
rs1921  ISG15   reference       S       83      NP_005092.1
rs1921  ISG15   missense        N       83      NP_005092.1
rs1921  ISG15   missense        T       83      NP_005092.1
rs2607  HEATR1  missense        K       2133    NP_060542.4
rs2607  HEATR1  missense        Q       2133    NP_060542.4
rs2607  HEATR1  reference       E       2133    NP_060542.4

and I would like take all rows with 'reference' pattern and move one column of them right after the rows without reference pattern and moreover that have the commonly strings in first, fifth and sixth columns. So the output should be:

rs980   INPP5B  missense        I       749     NP_001284363.1     L
rs980   INPP5B  missense        F       749     NP_001284363.1     L
rs980   INPP5B  missense        I       913     NP_005531.2        L
rs980   INPP5B  missense        P       913     NP_005531.2        L
rs1921  ISG15   missense        N       83      NP_005092.1        S
rs1921  ISG15   missense        T       83      NP_005092.1        S
rs2607  HEATR1  missense        K       2133    NP_060542.4        E
rs2607  HEATR1  missense        Q       2133    NP_060542.4        E

And so on... I got it by an array with awk through two files in whose I split the references from the others but is necessary do it in the same file in bash. Are there any possibility to do this in this more simple way? Thanks in advance.

Yes @Inian. I perform two files, separating references and others. In this way I do by awk the following command something like

awk 'BEGIN {FS=OFS="\t"} NR==FNR {h[$2] = $1; next} {print $1,$2,$3,$4,$5,$6,h[$4] file_1 file_2 > output_file

In which join together the columns than must be commonly in both files and then I create an array by awk to relate these columns with the other column in references that I want to print in the output file. In summary one file is like:

INPP5B  missense        I       rs980;749;NP_001284363.1
INPP5B  missense        F       rs980;913NP_005531.2

and the references:

INPP5B  reference       L       rs980;749;NP_001284363.1

and so on and I apply

awk 'BEGIN {FS=OFS="\t"} NR==FNR {h[$4] = $3; next} {print $1,$2,$3,$4,h[$4]}' file_others file_reference > output_file

This is a brief summary of how I get it, but I need to do in the same original file @Inian. Thanks in advance

like image 378
Perceval Vellosillo Gonzalez Avatar asked Dec 06 '25 09:12

Perceval Vellosillo Gonzalez


1 Answers

Assuming "file" contains all the lines, including the references (your "big tab file")

awk -F '\t' -v OFS='\t' '
    {key = $1 FS $2 FS $5 FS $6} 
    NR == FNR {
        if ($3 == "reference") val[key] = $4
        next
    }
    $3 != "reference" {print $0, val[key]}
' file file

Here, we process the file twice, once to collect the references, and then to print the information.

Output:

rs980   INPP5B  missense        I       749     NP_001284363.1  L
rs980   INPP5B  missense        F       749     NP_001284363.1  L
rs980   INPP5B  missense        I       913     NP_005531.2     L
rs980   INPP5B  missense        P       913     NP_005531.2     L
rs1921  ISG15   missense        N       83      NP_005092.1     S
rs1921  ISG15   missense        T       83      NP_005092.1     S
rs2607  HEATR1  missense        K       2133    NP_060542.4     E
rs2607  HEATR1  missense        Q       2133    NP_060542.4     E
like image 90
glenn jackman Avatar answered Dec 09 '25 00:12

glenn jackman