
How to split fields respecting only the left-most separator?

Tags: bash, awk

I am trying to reformat selected records in Bash, with AWK as the natural first pick for the job:

#!/bin/bash

process() {
    declare payload="$1"
    declare -a keysarr=("${@:2}")

    (
        IFS=$'|'
        awk 'BEGIN {FS="="; OFS="|"; ORS="~~~"} /^('"${keysarr[*]}"')/ {print $1,$2}'
    ) <<< "$payload"
}

declare sample
read -r -d '' sample <<'EOF'
Field1=all=1;is=2;one=3;field=4
Field2=nothing special
Field3=more of the same
Field4=not interested
EOF

process "$sample" Field1 Field2 Field3

The sample consists of only 4 records, with Field{1-4} being the "keys" and the rest of each line after the first = being the corresponding value, i.e. key->value:

Field1 -> all=1;is=2;one=3;field=4

This should be reformatted so that key and value are separated by | and records are separated by ~~~. The part that works is selecting only specific records based on their keys (i.e. Field{1-3} in the example).

What is not working is that with AWK's FS defined as =, the rest of the line is further split, which is unwanted. The above now returns:

Field1|all~~~Field2|nothing special~~~Field3|more of the same~~~

Desired output would be (the full first record):

Field1|all=1;is=2;one=3;field=4~~~Field2|nothing special~~~Field3|more of the same~~~
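To see the unwanted split concretely, here is a minimal sketch that prints every field awk produces for the first record when FS is =. The helper name show_fields is hypothetical and only serves the illustration:

```shell
# Hypothetical helper: print the field count and each field awk sees
# when the field separator is "=".
show_fields() {
    awk -F= '{
        printf "NF=%d", NF
        for (i = 1; i <= NF; i++) printf " [%s]", $i
        print ""
    }'
}

printf '%s\n' 'Field1=all=1;is=2;one=3;field=4' | show_fields
# NF=6 [Field1] [all] [1;is] [2;one] [3;field] [4]
```

Since the value itself contains = signs, print $1,$2 only ever emits the key and the text up to the second =.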

Is there any simple tweak possible or is AWK the wrong tool for this job?

NOTE: No gawk, e.g. cannot use FPAT.

Albert Camu asked Sep 02 '25 09:09


2 Answers

You may use this awk with the sub function and ORS:

awk -F= -v fld='Field1 Field2 Field3' -v ORS='~~~' '
   index(" " fld " ", " " $1 " ") {sub(/=/, "|"); print} END{printf "\n"}' file

Field1|all=1;is=2;one=3;field=4~~~Field2|nothing special~~~Field3|more of the same~~~

Here:

  • index(" " fld " ", " " $1 " ") looks up the first =-separated field ($1) in the command-line variable fld; both strings are wrapped in spaces so that only a full field name matches.
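The effect of the space padding is perhaps easiest to see side by side. The sketch below is only an illustration: the helper name compare_lookups and the extra input key Field (a prefix of the wanted names) are made up for the demo.

```shell
# Hypothetical helper: report which keys each lookup variant accepts.
compare_lookups() {
    awk -F= -v fld='Field1 Field2' '
        # Plain substring search: any key that occurs inside fld matches.
        index(fld, $1)                 { print "unpadded: " $1 }
        # Padded search: only a whole, space-delimited name matches.
        index(" " fld " ", " " $1 " ") { print "padded: " $1 }
    '
}

printf '%s\n' 'Field1=a=1' 'Field=b' 'Field2=c' | compare_lookups
# unpadded: Field1
# padded: Field1
# unpadded: Field
# unpadded: Field2
# padded: Field2
```

The key Field is accepted by the unpadded lookup because it is a substring of Field1, while the padded lookup rejects it.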
anubhava answered Sep 05 '25 00:09


With any POSIX awk:

$ awk -v desired='Field1=Field2=Field3' '
  BEGIN {FS = "="; ORS="~~~"; split(desired, a); for(i in a) b[a[i]] = 1}
  $1 in b {sub(FS, "|"); print}' <<!
Field1=all=1;is=2;one=3;field=4
Field2=nothing special
Field3=more of the same
Field4=not interested
!
Field1|all=1;is=2;one=3;field=4~~~Field2|nothing special~~~Field3|more of the same~~~

The BEGIN block sets the Field Separator (FS) to =, the Output Record Separator (ORS) to ~~~, and computes an array b in which the keys are the desired FieldN values. The $1 in b filter selects the relevant input lines. sub substitutes the first FS with a |.
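As a sketch of how this could be folded back into the process function from the question, the keys can be joined with = in the shell and handed to awk through -v. This is one possible arrangement, not necessarily how the asker would structure it; the variable names desired and want are illustrative, and keys containing backslashes would be altered by -v escape processing.

```shell
process() {
    payload=$1
    shift
    # Join the remaining arguments (the wanted keys) with "=".
    desired=$(IFS='='; printf '%s' "$*")
    printf '%s\n' "$payload" |
    awk -v desired="$desired" '
        BEGIN {
            FS = "="; ORS = "~~~"
            # Build a lookup table of the wanted keys.
            n = split(desired, a, "=")
            for (i = 1; i <= n; i++) want[a[i]] = 1
        }
        # Replace only the first "=" so the value keeps its own "=" signs.
        $1 in want { sub(/=/, "|"); print }
    '
}

sample='Field1=all=1;is=2;one=3;field=4
Field2=nothing special
Field3=more of the same
Field4=not interested'

process "$sample" Field1 Field2 Field3
# Field1|all=1;is=2;one=3;field=4~~~Field2|nothing special~~~Field3|more of the same~~~
```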

Note that the desired fields are passed separated by =. This makes sense because = is apparently your input field separator, so it is probably never found inside the desired fields. A bonus side effect is that split(desired, a) uses FS as its separator by default too. Moreover, the desired fields can contain anything, including spaces or tabs (e.g., Field1=Field1 Field2=Field2), as long as they do not contain = signs.
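A small sketch of the point about spaces, using made-up keys (Full Name, Home Dir, Shell) and a hypothetical wrapper select_keys that takes the =-separated key list as its only argument:

```shell
# Hypothetical wrapper: $1 is the "="-separated list of wanted keys.
select_keys() {
    awk -v desired="$1" '
        BEGIN {
            FS = "="; ORS = "~~~"
            # split() without a third argument uses the current FS, i.e. "=".
            n = split(desired, a)
            for (i = 1; i <= n; i++) b[a[i]] = 1
        }
        $1 in b { sub(/=/, "|"); print }
    '
}

printf '%s\n' 'Full Name=a=b' 'Home Dir=/tmp' 'Shell=sh' |
    select_keys 'Full Name=Home Dir'
# Full Name|a=b~~~Home Dir|/tmp~~~
```

Because the list is split on = rather than on whitespace, the space inside each key is preserved and compared literally against $1.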

Renaud Pacalet answered Sep 04 '25 22:09