I'm fairly new to Linux, so apologies in advance.
I have a file like this:
1 C foo C bar
2 C foo C bar
3 C foo C bar
4 H foo H bar
5 H foo H bar
6 O foo O bar
And I need to get it to be:
1 C01 foo C bar
2 C02 foo C bar
3 C03 foo C bar
4 H01 foo H bar
5 H02 foo H bar
6 O01 foo O bar
Unfortunately, the spacing between foo and C, as well as the spacing between C and bar, must be maintained.
I have tried it in a piecewise manner: I pull out the lines containing the different identifiers (C, H, and O) and place them in temp files. Then I attempt to number them by occurrence and splice the original file back together.
#!/bin/bash
sed -i -e "/ C /w temp1.txt" -e "//d" File.txt
sed -i -e "/ H /w temp2.txt" -e "//d" File.txt
sed -i -e "/ O /w temp3.txt" -e "//d" File.txt
awk -i '{print NR $2}' temp1.txt
awk -i '{print NR $2}' temp2.txt
awk -i '{print NR $2}' temp3.txt
cat temp1.txt >> File.txt
cat temp2.txt >> File.txt
cat temp3.txt >> File.txt
However, I am pretty sure my syntax is awful, as I am really only familiar with sed rather than awk.
Any help would be greatly appreciated, thank you.
Same solution, but preserving the initial field positions:
$ awk '{r=sprintf("%02d",++a[$2]); sub($2" ",$2r)}1' file
1 C01 foo C bar
2 C02 foo C bar
3 C03 foo C bar
4 H01 foo H bar
5 H02 foo H bar
6 O01 foo O bar
Note that this assumes the values in the first field don't overlap with the values in the second field, as in the sample shown. Otherwise you need a guard so that only the second field is changed. For the second field this is easily done by prefixing both the match and the replacement with a single space.
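To make that guard concrete, here is a minimal sketch. The sample input is made up (field 1 deliberately repeats the value of field 2), and it assumes the fields are separated by spaces rather than tabs, that field 2 contains no regex metacharacters, and that at least two spaces follow field 2, since sub consumes one of them.

```shell
# Hypothetical sample: field 1 has the same value as field 2.
input=$(printf 'C C   foo  C  bar\nC C   foo  C  bar\nH H   foo  H  bar')

# Unguarded: the first "C " on the line belongs to field 1, so the wrong field is changed.
unguarded=$(printf '%s\n' "$input" |
  awk '{r=sprintf("%02d",++a[$2]); sub($2" ",$2r)}1')

# Guarded: a leading space in the pattern means the match cannot start at field 1.
guarded=$(printf '%s\n' "$input" |
  awk '{r=sprintf("%02d",++a[$2]); sub(" "$2" "," "$2r)}1')

printf 'unguarded:\n%s\n' "$unguarded"
printf 'guarded:\n%s\n' "$guarded"
```

With the guard the first line becomes "C C01  foo  C  bar"; without it the counter ends up glued to field 1 ("C01C   foo  C  bar").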
EDIT: Here is a solution with GNU awk that preserves the actual spacing, provided your split supports 4 arguments. I found this after reading the man page, and I hope it will be helpful.
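awk '
{
  n=split($0,array," ",b)                            # b[] receives the separators (GNU awk only)
  array[2]=sprintf("%s%02d",array[2],++a[array[2]])  # append a per-value counter to field 2
  line=b[0]                                          # leading whitespace, if any
  for(i=1;i<=n;i++){
    line=(line array[i] b[i])                        # re-join each field with its original separator
  }
  print line
}' Input_file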
1 C01 foo C bar
2 C02 foo C bar
3 C03 foo C bar
4 H01 foo H bar
5 H02 foo H bar
6 O01 foo O bar
From the GNU awk man page, on the 4-argument form of split:
split(s, a [, r [, seps] ]) Split the string s into the array a and the separators array seps on the regular expression r, and return the number of fields. If r is omitted, FS is used instead. The arrays a and seps are cleared first. seps[i] is the field separator matched by r between a[i] and a[i+1]. If r is a single space, then leading whitespace in s goes into the extra array element seps[0] and trailing whitespace goes into the extra array element seps[n], where n is the return value of split(s, a, r, seps). Splitting behaves identically to field splitting, described above.
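If your awk does not have the 4-argument split, something along these lines might work instead. This is only a sketch and not part of the answer above: it uses match() and substr(), which any POSIX awk should provide, to find the end of the second field and insert the counter there without rebuilding the line. The sample input and the array name count are made up for illustration.

```shell
# Hypothetical sample with irregular spacing between the fields.
input=$(printf '1 C   foo  C   bar\n2 C   foo  C   bar\n3 H   foo  H   bar')

result=$(printf '%s\n' "$input" | awk '
{
    # Match optional leading blanks, field 1, blanks, field 2.
    if (match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+/)) {
        head = substr($0, 1, RLENGTH)      # everything up to the end of field 2
        tail = substr($0, RLENGTH + 1)     # the rest of the line, whitespace untouched
        $0 = head sprintf("%02d", ++count[$2]) tail
    }
    print                                  # lines with fewer than 2 fields pass through unchanged
}')

printf '%s\n' "$result"
```

The two digits are inserted right after the second field, so everything after it shifts right by two characters, but the runs of spaces between the later fields stay exactly as they were.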
1st solution: Try the following.
awk '{$2=sprintf("%s%02d",$2,++a[$2])} 1' Input_file
For the sample input above, the output will be as follows.
1 C01 foo C bar
2 C02 foo C bar
3 C03 foo C bar
4 H01 foo H bar
5 H02 foo H bar
6 O01 foo O bar
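One thing to be aware of with this form: assigning to a field makes awk rebuild the whole line, joining the fields with OFS (a single space by default), so runs of spaces in the input are not kept. A quick check on a made-up line:

```shell
# Hypothetical line with runs of spaces between the fields.
line='1 C   foo  C   bar'

# Assigning to $2 forces awk to rebuild $0 with single-space separators.
rebuilt=$(printf '%s\n' "$line" | awk '{$2=sprintf("%s%02d",$2,++a[$2])} 1')

printf '%s\n' "$rebuilt"   # prints: 1 C01 foo C bar
```

So this is fine when the spacing does not matter, but not when it has to be maintained.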
2nd solution: If you want the new value in both $2 and $4, do the following.
awk '{$2=$4=sprintf("%s%02d",$2,++a[$2])} 1' Input_file
1 C01 foo C01 bar
2 C02 foo C02 bar
3 C03 foo C03 bar
4 H01 foo H01 bar
5 H02 foo H02 bar
6 O01 foo O01 bar
3rd solution: If you want to append the value as a new column at the end of each line, do the following.
awk '{$(NF+1)=sprintf("%s%02d",$2,++a[$2])} 1' Input_file
1 C foo C bar C01
2 C foo C bar C02
3 C foo C bar C03
4 H foo H bar H01
5 H foo H bar H02
6 O foo O bar O01