cat input
aaa paul peter
bbb john mike
ccc paul mike
bbb paul john
And my dictionary file dict:
cat dict
aaa OOO
bbb 111
ccc 222
I need to take the first column of input and, if it matches the first column of dict, replace it with the second column of dict. I can do this with sub and gsub, but my dict file has thousands of rows (with different letters).
Desired output:
000 paul peter
111 john mike
222 paul mike
111 paul john
Thank you for any help.
My current solution, with awk:
awk '{sub(/aaa/,"000",$1); sub(/bbb/,"111",$1); sub(/ccc/,"222",$1)}1' input
UPDATE:
If the first column of input has no match in dict, keep that word unchanged.
cat input
aaa paul peter
bbb john mike
ccc paul mike
bbb paul john
ddd paul peter
cat dict
aaa OOO
bbb 111
ccc 222
Desired output:
000 paul peter
111 john mike
222 paul mike
111 paul john
ddd paul peter
A more general approach, suggested by fedorqui in the comments, also handles keys in input that have no entry in dict:
awk 'FNR==NR {dict[$1]=$2; next} {$1=($1 in dict) ? dict[$1] : $1}1' dict input
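For readers new to the FNR==NR idiom, here is a commented sketch of the same lookup, equivalent for this single-space-separated data, run against the updated sample files. It assumes a POSIX shell and awk; the temporary directory and the file name output are only for the demo.

```shell
# Demo only: recreate the updated sample files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'aaa OOO' 'bbb 111' 'ccc 222' > "$dir/dict"
printf '%s\n' 'aaa paul peter' 'bbb john mike' 'ccc paul mike' \
              'bbb paul john' 'ddd paul peter' > "$dir/input"

awk '
  # FNR == NR holds only while reading the first file (dict),
  # assuming dict is not empty: store key -> replacement, skip on.
  FNR == NR { dict[$1] = $2; next }
  # For input lines: replace $1 only when it is a known key,
  # otherwise the line is left unchanged.
  ($1 in dict) { $1 = dict[$1] }
  # A true pattern with no action prints the (possibly modified) line.
  1
' "$dir/dict" "$dir/input" > "$dir/output"

cat "$dir/output"
```

This prints the five input lines in their original order, with aaa, bbb and ccc replaced and the ddd line untouched. Because dict is read into memory once, the cost stays a single hash lookup per input line no matter how many rows dict has.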
My original solution below only works when every key in input has a mapping in dict.
awk 'FNR==NR{hash[$2FS$3]=$1; next}{for (i in hash) if (hash[i]==$1) print $2, i}' input dict
OOO paul peter
111 john mike
111 paul john
222 paul mike
The idea is to build an array hash indexed by $2FS$3 with $1 as the value, i.e. hash["paul peter"]="aaa", etc. Once it is built, each line of the dict file is checked against the stored values, and whenever $1 of dict matches a value from input, the second column of dict is printed together with the names. The output is therefore grouped in the order of dict, not in the order of input.
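Because hash is indexed by the names, input lines that share the same second and third columns overwrite each other. A sketch of this dict-driven idea run against the updated input, where "paul peter" appears twice, shows the effect. It uses an exact string comparison and pipes the result through sort, since for (i in hash) visits keys in an unspecified order; the scratch files are only for the demo.

```shell
# Demo only: recreate the updated sample files in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'aaa OOO' 'bbb 111' 'ccc 222' > "$dir/dict"
printf '%s\n' 'aaa paul peter' 'bbb john mike' 'ccc paul mike' \
              'bbb paul john' 'ddd paul peter' > "$dir/input"

# First pass (input): hash["paul peter"] is set to "aaa" and then
# overwritten by "ddd", so the aaa row is lost.
# Second pass (dict): print the mapped value for every stored row
# whose value equals the dict key.
awk 'FNR == NR { hash[$2 FS $3] = $1; next }
     { for (i in hash) if (hash[i] == $1) print $2, i }' \
    "$dir/input" "$dir/dict" | LC_ALL=C sort > "$dir/collided"

cat "$dir/collided"
```

Only three lines come out: neither "OOO paul peter" nor "ddd paul peter" is printed, which is why the dict-first lookup above is the safer choice.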