Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Unix Parse Varying Named Value into seperate rows

We are getting a varying length input file as mentioned below. The text is varying length.

Input file:

ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

The text here has named value pair as the content and it's of varying length. Please note that the name in the text column can contain a semi colon. We are trying to parse the input but we are not able handle it via AWK or BASH

Desired Output:

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

The below snipped of code works for ID=2, but doesn't for ID=1

echo "2|name1=value1;name2=value2;name6=;name7=value7;name8=value8" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;done
cat tmp
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
echo "1|name1=value1;name3;name4=value2;name5=value5" | while IFS="|"; read id text;do dsc=`echo $text|tr ';' '\n'`;echo "$dsc" >tmp;sed -i "s/^/${id}\|/g" tmp;done
cat tmp
1|name1=value1
1|name3
1|name4=value2
1|name5=value5

Any help is greatly appreciated.

like image 781
seanarcher7 Avatar asked Nov 30 '25 08:11

seanarcher7


2 Answers

Could you please try following, written and tested with shown samples in GNU awk with new version of it. Since OP's awk version is old so if anyone having old version of awk then try changing it to awk --re-interval

awk '
BEGIN{
  FS=OFS="|"
}
FNR==1{ next }
{
  first=$1
  while(match($0,/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
    print first,substr($0,RSTART,RLENGTH)
    $0=substr($0,RSTART+RLENGTH)
  }
}'  Input_file

Output will be as follows.

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8

Explanation: Adding detailed explanation for above(following is for explanation purposes only).

awk '                                        ##Starting awk program from here.
BEGIN{                                       ##Starting BEGIN section from here.
  FS=OFS="|"                                 ##Setting FS and OFS wiht | here.
}
FNR==1{ next }                               ##If line is first line then go next, do not print anything.
{
  first=$1                                   ##Creating first and setting as first field here.
  while(match($0,/(name[0-9]+;?){1,}=(value[0-9]+)?/)){
##Running while loop which has match which has a regex of matching name and value all mentioned permutations and combinations.
    print first,substr($0,RSTART,RLENGTH)    ##Printing first and sub string(currently matched one)
    $0=substr($0,RSTART+RLENGTH)             ##Saving rest of the line into current line.
  }
}' Input_file                                ##Mentioning Input_file name here.
like image 81
RavinderSingh13 Avatar answered Dec 02 '25 01:12

RavinderSingh13


Sample data:

$ cat name.dat
ID|Text
1|name1=value1;name3;name4=value2;name5=value5
2|name1=value1;name2=value2;name6=;name7=value7;name8=value8

One awk solution:

awk -F"[|;]" '                                                           # use "|" and ";" as input field delimiters
FNR==1 { next }                                                          # skip header line
       { pfx=$1 "|"                                                      # set output prefix to field 1 + "|"
         printpfx=1                                                      # set flag to print prefix

         for ( i=2 ; i<=NF ; i++ )                                       # for fields 2 to NF
             {
               if ( printpfx)     { printf "%s",   pfx  ; printpfx=0 }   # if print flag == 1 then print prefix and clear flag
               if ( $(i)  ~ /=/ ) { printf "%s\n", $(i) ; printpfx=1 }   # if current field contains "=" then print it, end this line of output, reset print flag == 1
               if ( $(i) !~ /=/ ) { printf "%s;",  $(i) }                # if current field does not contain "=" then print it and include a ";" suffix
             }
       }
' name.dat

The above generates:

1|name1=value1
1|name3;name4=value2
1|name5=value5
2|name1=value1
2|name2=value2
2|name6=
2|name7=value7
2|name8=value8
like image 33
markp-fuso Avatar answered Dec 01 '25 23:12

markp-fuso