Split a line into multiple lines by splitting a specific field

Tags: bash, split, sed, awk

I have multiple lines like:

"390";"902";"from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"

What I need is:

"390";"902";"from 4670000 to 4679999";"something1";"something2";"20.09.04"
"390";"902";"from 4680000 to 4689999";"something1";"something2";"20.09.04"
"390";"902";"from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999";"something3";"something4";"09.09.04"
"390";"903";"from 9170000 to 9179999";"something3";"something4";"09.09.04"

As you can see, I need to split the third field into one line per from/to range (note that the comma separating the ranges is sometimes followed by a space).

Ideally, I need resulting output:

"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"

I've already found out that I can split the field with awk, but I'm not sure how to copy the rest of the line:

awk -F\, '{                       
  for (i = 0; ++i <= NF;)
    print i, $i
  }' <<<'from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999'
1 from 4670000 to 4679999
2  from 4680000 to 4689999
3  from 9960000 to 9969999

Sorry, this is my first question here; feel free to point out how I should correct it so it can be fully answered.

Thanks!

asked Dec 06 '25 03:12 by stackexch


2 Answers

For an input of:

"390";"902";"from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"

This code

#!/usr/bin/awk -f

BEGIN {
    FS = ";"
}

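{
    t = $3
    gsub(/"/, "", t)              # strip the surrounding quotes from field 3
    n = split(t, a, /, */)        # split on a comma followed by optional spaces
    for (i = 1; i <= n; ++i) {
        # reprint the other fields unchanged around each range
        print $1 ";" $2 ";\"" a[i] "\";" $4 ";" $5 ";" $6
    }
}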

Would give

"390";"902";"from 4670000 to 4679999";"something1";"something2";"20.09.04"
"390";"902";"from 4680000 to 4689999";"something1";"something2";"20.09.04"
"390";"902";"from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999";"something3";"something4";"09.09.04"
"390";"903";"from 9170000 to 9179999";"something3";"something4";"09.09.04"

Condensed form (I don't think it could really be called a true "one-liner"):

awk -F ";" -- '{ t = $3; gsub(/"/, "", t); n = split(t, a, /, */); for (i = 1; i <= n; ++i) print $1 ";" $2 ";\"" a[i] "\";" $4 ";" $5 ";" $6 }'

And this code

#!/usr/bin/awk -f

BEGIN {
    FS = ";"
}

{
    t = $3
    gsub(/"|from /, "", t)            # strip the quotes and every "from "
    n = split(t, a, /, *| to /)       # a[] alternates start, end, start, end, ...
    for (i = 1; i <= n; i += 2) {
        print $1 ";" $2 ";\"" a[i] "\";\"" a[i + 1] "\";" $4 ";" $5 ";" $6
    }
}

Would give

"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"

Condensed form:

awk -F ";" -- '{ t = $3; gsub(/"|from /, "", t); n = split(t, a, /, *| to /); for (i = 1; i <= n; i += 2) print $1 ";" $2 ";\"" a[i] "\";\"" a[i + 1] "\";" $4 ";" $5 ";" $6 }'

Script is tested with gawk, nawk and mawk.
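
The scripts above print fields $4 to $6 by name, so they assume exactly six fields per line. As a rough sketch only, assuming no field contains a literal ";" inside its quotes, the trailing fields can be collected in a loop so that lines with more (or fewer) trailing fields are handled too. The wrapper name expand_ranges is hypothetical and not part of the original answer:

```shell
# expand_ranges: hypothetical wrapper; reads the files given as arguments, or stdin.
expand_ranges() {
  awk -F ';' '{
    t = $3
    gsub(/"|from /, "", t)             # strip the quotes and every "from "
    n = split(t, a, /, *| to /)        # a[] alternates start, end, start, end, ...
    rest = ""
    for (j = 4; j <= NF; ++j)          # collect every trailing field as-is
      rest = rest ";" $j
    for (i = 1; i + 1 <= n; i += 2)    # one output line per start/end pair
      print $1 ";" $2 ";\"" a[i] "\";\"" a[i + 1] "\"" rest
  }' "$@"
}
```

Calling expand_ranges file on the sample input should produce the same five lines as shown above.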

answered Dec 09 '25 14:12 by konsolebox


awk one-liner (GNU awk, because \s is a gawk extension):

awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++){$3=a[i];print}}' file

outputs:

kent$  cat f
"390";"902";"from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"

kent$  awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++){$3=a[i];print}}' f
"390";"902";"from 4670000 to 4679999";"something1";"something2";"20.09.04"
"390";"902";"from 4680000 to 4689999";"something1";"something2";"20.09.04"
"390";"902";"from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999";"something3";"something4";"09.09.04"
"390";"903";"from 9170000 to 9179999";"something3";"something4";"09.09.04"

EDIT

If you want the from...to ranges to be parsed too, it is still an awk one-liner:

awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++)
{$3=a[i];sub(/\s*to\s*/,"\";\"",$3);sub(/\s*from\s*/,"",$3);print}}' file

Test with the same input file:

kent$  awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++){$3=a[i];sub(/\s*to\s*/,"\";\"",$3);sub(/\s*from\s*/,"",$3);print}}' f                              
"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"

answered Dec 09 '25 15:12 by Kent
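
Since \s is specific to GNU awk, the one-liners in this answer may not behave the same way under mawk or other awks. A minimal sketch of the same idea without \s, assuming the only whitespace around the separators is spaces or tabs, uses the bracket expression [ \t] instead. The wrapper name split_ranges is hypothetical and only there to make the command easy to reuse:

```shell
# split_ranges: hypothetical wrapper; reads the files given as arguments, or stdin.
split_ranges() {
  awk -F'";"' -v OFS='";"' '{
    n = split($3, a, /,[ \t]*/)          # one array element per "from X to Y" range
    for (i = 1; i <= n; i++) {
      r = a[i]
      sub(/^[ \t]*from[ \t]+/, "", r)    # drop the leading "from"
      sub(/[ \t]+to[ \t]+/, OFS, r)      # turn " to " into the field separator
      $3 = r                             # assigning a field rebuilds $0 with OFS
      print
    }
  }' "$@"
}
```

Working on a copy (r) rather than on $3 directly keeps each substitution from rebuilding the record more often than needed; the output for the sample file should match the listing above.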


