
How to replace each individual occurrence of a word with another word using awk or sed?

Tags: shell, sed, awk

I want to replace each individual word in 5 columns of a row with another word.

Here is my file. Columns are separated by tabs; when a column holds multiple similar entries, they are separated by commas and the whole column is quoted, like "A_V,A_V,A_V,A_V".

g1      A_chrococcum_B3__ACG10_RS21915  A_chrococcum_NCIMB8003__Achr_RS24720    "A_salinestris__GCU53_RS00995,A_salinestris__GCU53_RS13820,A_salinestris__GCU53_RS25085,A_salinestris__GCU53_RS00050,A_salinestris__GCU53_RS24715"     "A_vinelandii_CA__AVCA_RS25530,A_vinelandii_CA__AVCA_RS00340,A_vinelandii_CA__AVCA_RS07835,A_vinelandii_CA__AVCA_RS09930,A_vinelandii_CA__AVCA_RS10910,A_vinelandii_CA__AVCA_RS11470,A_vinelandii_CA__AVCA_RS15230,A_vinelandii_CA__AVCA_RS21030,A_vinelandii_CA__AVCA_RS13765,A_vinelandii_CA__AVCA_RS06150,A_vinelandii_CA__AVCA_RS20865"   "A_vinelandii_DJ__AVIN_RS25600,A_vinelandii_DJ__AVIN_RS00380,A_vinelandii_DJ__AVIN_RS07870,A_vinelandii_DJ__AVIN_RS09960,A_vinelandii_DJ__AVIN_RS10940,A_vinelandii_DJ__AVIN_RS11500,A_vinelandii_DJ__AVIN_RS15260,A_vinelandii_DJ__AVIN_RS06190,A_vinelandii_DJ__AVIN_RS13795,A_vinelandii_DJ__AVIN_RS20895"

The first column holds the value with which I want to replace every individual entry in all the other columns.

I'm looking for output like this, where the first column is the replacement string. COL-1 (the column after the first) has only one entry, so it becomes a single g1; COL-2 also has only one entry, so a single g1; COL-3 has 5 entries, so five g1; COL-4 has 11 entries, so eleven g1; and so on.

g1      g1      g1      "g1,g1,g1,g1,g1"      "g1,g1,g1,g1,g1,g1,g1,g1,g1,g1,g1"      "g1,g1,g1,g1,g1,g1,g1,g1,g1,g1"

I tried to do this for the first row, thinking I could then loop over the rest of the file, partly because I don't know how to handle all the columns at once.

The command I tried:

 grep -w "g1" f1 |
 awk -F"\t" '{ gsub("A_.*,","g1",$4); print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6}'

It gave me results like this:

g1      A_chrococcum_B3__ACG10_RS21915  A_chrococcum_NCIMB8003__Achr_RS24720    "g1A_salinestris__GCU53_RS24715"        "A_vinelandii_CA__AVCA_RS25530,A_vinelandii_CA__AVCA_RS00340,A_vinelandii_CA__AVCA_RS07835,A_vinelandii_CA__AVCA_RS09930,A_vinelandii_CA__AVCA_RS10910,A_vinelandii_CA__AVCA_RS11470,A_vinelandii_CA__AVCA_RS15230,A_vinelandii_CA__AVCA_RS21030,A_vinelandii_CA__AVCA_RS13765,A_vinelandii_CA__AVCA_RS06150,A_vinelandii_CA__AVCA_RS20865"  "A_vinelandii_DJ__AVIN_RS25600,A_vinelandii_DJ__AVIN_RS00380,A_vinelandii_DJ__AVIN_RS07870,A_vinelandii_DJ__AVIN_RS09960,A_vinelandii_DJ__AVIN_RS10940,A_vinelandii_DJ__AVIN_RS11500,A_vinelandii_DJ__AVIN_RS15260,A_vinelandii_DJ__AVIN_RS06190,A_vinelandii_DJ__AVIN_RS13795,A_vinelandii_DJ__AVIN_RS20895"

How can I do this? My file has 677779 rows.

vibhu sharma asked Jan 23 '26 19:01



1 Answer

You may use this awk command:

awk 'BEGIN {FS=OFS="\t"} {for (i=2; i<=NF; ++i) gsub(/[^",]+/, $1, $i)} 1' file

Output:

g1 g1 g1 "g1,g1,g1,g1,g1" "g1,g1,g1,g1,g1,g1,g1,g1,g1,g1,g1" "g1,g1,g1,g1,g1,g1,g1,g1,g1,g1"
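
To see how the command works, here is a minimal sketch that runs the same awk program on a small two-row sample. The sample rows and the variable names `sample` and `result` are made up for illustration; only the awk program comes from the command above. Setting FS and OFS to a tab keeps the output tab-separated, and the regex `[^",]+` matches each run of characters that is neither a double quote nor a comma, so every entry is replaced on its own while the quotes and commas stay in place.

```shell
# Hypothetical two-row, tab-separated sample shaped like the file in the question.
sample=$(printf 'g1\tA_x__1\t"A_y__1,A_y__2"\ng2\tB_x__1\t"B_y__1,B_y__2,B_y__3"\n')

# In fields 2..NF, replace every run of characters that is neither
# a double quote nor a comma with the value of field 1.
# The trailing "1" is an always-true pattern that prints the modified row.
result=$(printf '%s\n' "$sample" |
  awk 'BEGIN { FS = OFS = "\t" }
       { for (i = 2; i <= NF; ++i) gsub(/[^",]+/, $1, $i) }
       1')

printf '%s\n' "$result"
```

The attempt in the question failed because `.*` is greedy: `A_.*,` matches from the first `A_` up to the last comma in the field, so everything except the final entry collapses into a single g1. Matching on "anything except a quote or a comma" avoids that. Since awk processes the file row by row in a single pass, no grep or shell loop per row is needed. One caveat: in a gsub replacement string, `&` stands for the matched text, so this would need adjusting if the first column could ever contain `&`.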
anubhava answered Jan 26 '26 00:01
