I have this test file.
[root@localhost ~]# cat f.txt
"a aa"  MM  "bbb  b"
MM    MM
MM"b b "
[root@localhost ~]#
I want to replace every space character inside the quotes with an underscore, and only inside the quotes. Characters outside the quotes must not be touched. In other words, I want output like this:
"a_aa"  MM  "bbb__b"
MM    MM
MM"b_b_"
Can this be implemented using sed?
Thanks,
This is an entirely non-trivial question.
For the sample data, this replaces the first space inside each pair of quotes with an underscore:
$ sed 's/\("[^ "]*\) \([^"]*"\)/\1_\2/g' f.txt
"a_aa"  MM  "bbb_ b"
MM    MM
MM"b_b "
$
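For this example, where there are no more than two spaces inside any of the quotes, it is tempting to simply repeat the command, but that gives an incorrect result. On the second pass the pattern matches the stretch between the closing quote of "a_aa" and the opening quote of "bbb_ b", so a space outside the quotes is changed: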
$ sed -e 's/\("[^ "]*\) \([^"]*"\)/\1_\2/g' \
>     -e 's/\("[^ "]*\) \([^"]*"\)/\1_\2/g' f.txt
"a_aa"_ MM  "bbb_ b"
MM    MM
MM"b_b_"
$
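To make the failure concrete, here is a small sketch that feeds the first line as it looks after one pass back through the same pattern. The variable names are illustrative only, and the second command wraps the matched text in square brackets purely to show which span the regex picks.

```shell
# The first line of f.txt as it looks after the first pass.
line='"a_aa"  MM  "bbb_ b"'

# Second pass with the same substitution: the pattern has no notion of
# "inside" versus "outside" quotes, so it starts at the closing quote of "a_aa".
second_pass=$(printf '%s\n' "$line" | sed 's/\("[^ "]*\) \([^"]*"\)/\1_\2/g')
printf '%s\n' "$second_pass"

# Show the span the pattern matches by wrapping it in [ ].
matched=$(printf '%s\n' "$line" | sed 's/"[^ "]* [^"]*"/[&]/')
printf '%s\n' "$matched"
```

The bracketed span runs from one quoted string's closing quote to the next one's opening quote, which is why the underscore lands outside the quotes.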
If your version of sed supports 'extended regular expressions', then this works for the sample data:
$ sed -E \
>    -e 's/^(([^"]*("[^ "]*")?)*)("[^ "]*) ([^"]*")/\1\4_\5/' \
>    -e 's/^(([^"]*("[^ "]*")?)*)("[^ "]*) ([^"]*")/\1\4_\5/' \
>    -e 's/^(([^"]*("[^ "]*")?)*)("[^ "]*) ([^"]*")/\1\4_\5/' \
>    f.txt
"a_aa"  MM  "bbb__b"
MM    MM
MM"b_b_"
$
You have to repeat that ghastly regex for every space within double quotes - hence three times for the first line of data.
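The regex can be explained as:
^ anchors the match at the start of the line.
(([^"]*("[^ "]*")?)*) is capture \1: zero or more repetitions of a run of non-quote characters, optionally followed by a complete quoted string that contains no spaces. This skips over everything that needs no change.
("[^ "]*) is capture \4: an opening double quote followed by zero or more characters that are neither spaces nor double quotes.
The literal space after that is the one that gets replaced.
([^"]*") is capture \5: the rest of the quoted string, up to and including the closing double quote.
The replacement \1\4_\5 puts everything back with an underscore in place of that space.
Because of the start anchor, each substitution replaces only one space per line, so it has to be repeated once per space. But sed has a looping construct, so we can do it with: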
$ sed -E -e ':redo
>            s/^(([^"]*("[^ "]*")?)*)("[^ "]*) ([^"]*")/\1\4_\5/
>            t redo' f.txt
"a_aa"  MM  "bbb__b"
MM    MM
MM"b_b_"
$
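The :redo defines a label; the s/// command is as before; the t redo command jumps to the label if any substitution has been made since the last input line was read or since the last t branch was taken. The loop therefore ends as soon as a pass over the line makes no substitution.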
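As a sketch of how the loop might be packaged for reuse, the script can be wrapped in a shell function that filters standard input. The function name underscore_quoted is made up for this example, and it assumes a sed that accepts -E.

```shell
# Hypothetical helper: reads lines on stdin and replaces spaces inside
# double-quoted strings with underscores. Assumes sed accepts -E.
underscore_quoted() {
    sed -E -e ':redo
        s/^(([^"]*("[^ "]*")?)*)("[^ "]*) ([^"]*")/\1\4_\5/
        t redo'
}

# Feed it the three sample lines from f.txt.
printf '%s\n' '"a aa"  MM  "bbb  b"' 'MM    MM' 'MM"b b "' | underscore_quoted
```

Because the function reads standard input, it can also be used as `underscore_quoted < f.txt`.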
Given the discussion in the comments, there are a couple of points worth mentioning:
The -E option applies to sed on Mac OS X (tested 10.7.2).  The corresponding option for the GNU version of sed is -r (or --regexp-extended); recent GNU sed releases also accept -E.  The -E option is consistent with grep -E (which also uses extended regular expressions).  The 'classic Unix systems' do not support EREs with sed (Solaris 10, AIX 6, HP-UX 11).
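If a script has to run on systems with different sed implementations, one possible approach is to probe for the flag at run time. This is only a sketch: the variable names ere_flag and result are invented here, and the probe simply checks whether a trivial ERE substitution exits successfully.

```shell
# Probe which extended-regex flag the local sed accepts, if any.
if printf 'x\n' | sed -E 's/(x)/\1/' >/dev/null 2>&1; then
    ere_flag=-E
elif printf 'x\n' | sed -r 's/(x)/\1/' >/dev/null 2>&1; then
    ere_flag=-r
else
    ere_flag=''    # no ERE support detected; a BRE script is needed instead
fi

# Only run the ERE version of the loop when a flag was found.
if [ -n "$ere_flag" ]; then
    result=$(printf '%s\n' 'MM"b b "' | sed "$ere_flag" -e ':redo
        s/^(([^"]*("[^ "]*")?)*)("[^ "]*) ([^"]*")/\1\4_\5/
        t redo')
    printf '%s\n' "$result"
fi
```

On a sed with no ERE support, ere_flag stays empty and the ERE loop is skipped.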
You can replace the ? I used (which is the only character that forces the use of an ERE instead of a BRE) with *, and then deal with the parentheses (which require backslashes in front of them in a BRE to make them into capturing parentheses), leaving the script:
sed -e ':redo
        s/^\(\([^"]*\("[^ "]*"\)*\)*\)\("[^ "]*\) \([^"]*"\)/\1\4_\5/g
        t redo' f.txt
This produces the same output on the same input - I tried some slightly more complex patterns in the input:
"a aa"  MM  "bbb  b"
MM    MM
MM"b b "
"c c""d d""e  e" X " f "" g "
 "C C" "D D" "E  E" x " F " " G "
This gives the output:
"a_aa"  MM  "bbb__b"
MM    MM
MM"b_b_"
"c_c""d_d""e__e" X "_f_""_g_"
 "C_C" "D_D" "E__E" x "_F_" "_G_"
Even with BRE notation, sed supports the \{0,1\} notation to specify 0 or 1 occurrences of the previous RE term, so the ? version could be translated to a BRE using:
sed -e ':redo
        s/^\(\([^"]*\("[^ "]*"\)\{0,1\}\)*\)\("[^ "]*\) \([^"]*"\)/\1\4_\5/g
        t redo' f.txt
This produces the same output as the other alternatives.
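One way to check that claim locally is to run both BRE variants over the extended sample input and compare the results. The sketch below does that with plain shell string comparison; the variable names are illustrative, and the redundant g flag is dropped because the pattern is anchored at the start of the line.

```shell
# Extended sample input from above, one line per argument.
input=$(printf '%s\n' \
    '"a aa"  MM  "bbb  b"' \
    'MM    MM' \
    'MM"b b "' \
    '"c c""d d""e  e" X " f "" g "' \
    ' "C C" "D D" "E  E" x " F " " G "')

# BRE variant that uses * for the optional quoted string.
out_star=$(printf '%s\n' "$input" | sed -e ':redo
    s/^\(\([^"]*\("[^ "]*"\)*\)*\)\("[^ "]*\) \([^"]*"\)/\1\4_\5/
    t redo')

# BRE variant that uses the \{0,1\} interval instead.
out_interval=$(printf '%s\n' "$input" | sed -e ':redo
    s/^\(\([^"]*\("[^ "]*"\)\{0,1\}\)*\)\("[^ "]*\) \([^"]*"\)/\1\4_\5/
    t redo')

# Report whether the two variants produced identical output.
if [ "$out_star" = "$out_interval" ]; then
    echo "variants agree"
else
    echo "variants differ"
fi
```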