I want to replace each individual word in 5 columns of a row with another word.
Here is my file. Columns are separated by tabs, and when a column holds multiple entries of the same kind, they are separated by commas, like "A_V,A_V,A_V,A_V".
g1 A_chrococcum_B3__ACG10_RS21915 A_chrococcum_NCIMB8003__Achr_RS24720 "A_salinestris__GCU53_RS00995,A_salinestris__GCU53_RS13820,A_salinestris__GCU53_RS25085,A_salinestris__GCU53_RS00050,A_salinestris__GCU53_RS24715" "A_vinelandii_CA__AVCA_RS25530,A_vinelandii_CA__AVCA_RS00340,A_vinelandii_CA__AVCA_RS07835,A_vinelandii_CA__AVCA_RS09930,A_vinelandii_CA__AVCA_RS10910,A_vinelandii_CA__AVCA_RS11470,A_vinelandii_CA__AVCA_RS15230,A_vinelandii_CA__AVCA_RS21030,A_vinelandii_CA__AVCA_RS13765,A_vinelandii_CA__AVCA_RS06150,A_vinelandii_CA__AVCA_RS20865" "A_vinelandii_DJ__AVIN_RS25600,A_vinelandii_DJ__AVIN_RS00380,A_vinelandii_DJ__AVIN_RS07870,A_vinelandii_DJ__AVIN_RS09960,A_vinelandii_DJ__AVIN_RS10940,A_vinelandii_DJ__AVIN_RS11500,A_vinelandii_DJ__AVIN_RS15260,A_vinelandii_DJ__AVIN_RS06190,A_vinelandii_DJ__AVIN_RS13795,A_vinelandii_DJ__AVIN_RS20895"
The first column holds the value I want to substitute for each individual entry in all the other columns.
I'm looking for output like the line below, where the first column is the replacement string. After it, COL-1 has only one entry, so it becomes a single g1; COL-2 also has one entry, so one g1; COL-3 has 5 entries, so five g1; COL-4 has 11 entries, so 11 g1; and so on.
g1 g1 g1 "g1,g1,g1,g1,g1" "g1,g1,g1,g1,g1,g1,g1,g1,g1,g1,g1" "g1,g1,g1,g1,g1,g1,g1,g1,g1,g1"
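To double-check the per-column counts described above, a short awk sketch can print, for each row, column 1 followed by the number of comma-separated entries in every other column. It assumes, as stated in the question, that columns are tab-separated and entries within a column are separated only by commas; the sample row is a shortened, hypothetical stand-in for the real data.

```shell
# Hypothetical shortened sample row: tab-separated columns, with
# comma-separated entries inside the quoted columns.
sample=$(printf 'g1\tA_x__1\tA_y__2\t"A_z__3,A_z__4,A_z__5"\t"A_w__6,A_w__7"\n')

# For each row, print column 1 and then, for columns 2..NF, the number
# of comma-separated entries (split() returns how many pieces it made).
counts=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = "\t" }
{
    line = $1
    for (i = 2; i <= NF; ++i)
        line = line " " split($i, parts, ",")
    print line
}')

echo "$counts"    # g1 1 1 3 2
```

On the full row from the question, the same program would report 1, 1, 5, 11 and 10 entries for the five columns after `g1`.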
I tried this on the first row only, planning to loop over the rest of the file afterwards, partly because I don't know how to do it for all the columns at once.
The command I tried:
grep -w "g1" f1 |
awk -F"\t" '{ gsub("A_.*,","g1",$4); print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6}'
It gave me results like this:
g1 A_chrococcum_B3__ACG10_RS21915 A_chrococcum_NCIMB8003__Achr_RS24720 "g1A_salinestris__GCU53_RS24715" "A_vinelandii_CA__AVCA_RS25530,A_vinelandii_CA__AVCA_RS00340,A_vinelandii_CA__AVCA_RS07835,A_vinelandii_CA__AVCA_RS09930,A_vinelandii_CA__AVCA_RS10910,A_vinelandii_CA__AVCA_RS11470,A_vinelandii_CA__AVCA_RS15230,A_vinelandii_CA__AVCA_RS21030,A_vinelandii_CA__AVCA_RS13765,A_vinelandii_CA__AVCA_RS06150,A_vinelandii_CA__AVCA_RS20865" "A_vinelandii_DJ__AVIN_RS25600,A_vinelandii_DJ__AVIN_RS00380,A_vinelandii_DJ__AVIN_RS07870,A_vinelandii_DJ__AVIN_RS09960,A_vinelandii_DJ__AVIN_RS10940,A_vinelandii_DJ__AVIN_RS11500,A_vinelandii_DJ__AVIN_RS15260,A_vinelandii_DJ__AVIN_RS06190,A_vinelandii_DJ__AVIN_RS13795,A_vinelandii_DJ__AVIN_RS20895"
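The attempt above fails because `.*` in `A_.*,` is greedy: it matches from the first `A_` all the way to the last comma in the field, so every entry except the final one collapses into a single `g1`, and the remaining text contains no comma for `gsub` to match again. A quick comparison on a shortened, hypothetical field shows the difference between that pattern and `[^",]+`, which matches one entry at a time (a maximal run of characters that are neither a comma nor a double quote):

```shell
# Hypothetical shortened field, quoted like column 4 in the question.
field='"A_s__1,A_s__2,A_s__3"'

# Greedy: A_.*, spans from the first A_ to the LAST comma in one match.
greedy=$(printf '%s\n' "$field" | awk '{ gsub(/A_.*,/, "g1"); print }')

# Per entry: [^",]+ matches each run that has no comma and no quote.
per_entry=$(printf '%s\n' "$field" | awk '{ gsub(/[^",]+/, "g1"); print }')

echo "$greedy"      # "g1A_s__3"
echo "$per_entry"   # "g1,g1,g1"
```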
How can I do this? My file has 677779 rows.
You can do it for all columns and all rows in a single awk pass:
awk 'BEGIN {FS=OFS="\t"} {for (i=2; i<=NF; ++i) gsub(/[^",]+/, $1, $i)} 1' file
Output:
g1 g1 g1 "g1,g1,g1,g1,g1" "g1,g1,g1,g1,g1,g1,g1,g1,g1,g1,g1" "g1,g1,g1,g1,g1,g1,g1,g1,g1,g1"
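Here is the same awk program spread out with comments, as a sketch of how it works. It reads the input once, so the `grep` plus per-row loop from the question is not needed even for 677779 rows. The sample row is a shortened, hypothetical stand-in piped in with `printf`; on the real data you would pass the file name instead.

```shell
# Hypothetical shortened input row in the question's layout (tab-separated).
result=$(printf 'g1\tA_x__1\tA_y__2\t"A_z__3,A_z__4,A_z__5"\n' | awk '
BEGIN { FS = OFS = "\t" }   # read and write tab-separated columns
{
    # Columns 2..NF: replace each entry (a run with no comma or quote)
    # by the value of column 1, leaving commas and quotes in place.
    for (i = 2; i <= NF; ++i)
        gsub(/[^",]+/, $1, $i)
}
1                           # always-true pattern: print the modified row
')

printf '%s\n' "$result"
```

One caveat: `gsub` treats `&` in the replacement string as "the matched text", so this assumes column 1 never contains `&` or backslashes, which holds for IDs like `g1`.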