
In bash, how to transform a multimap<K,V> into a map of <K, {V1,V2}>

Tags:

bash

mapreduce

I am processing output from a file in bash and need to group values by their keys.

For example, I have the

13,47099
13,54024
13,1
13,39956
13,0
17,126223
17,52782
17,4
17,62617
17,0
23,1022724
23,79958
23,80590
23,230
23,1
23,118224
23,0
23,1049
42,72470
42,80185
42,2
42,89199
42,0
54,70344
54,72824
54,1
54,62969
54,1

in a file, and I want to group all the values for each key onto a single line, as in

13,47099,54024,1,39956,0
17,126223,52782,4,62617,0
23,1022724,79958,80590,230,1,118224,0,1049
42,72470,80185,2,89199,0
54,70344,72824,1,62969,1

There are about 10,000 entries in my input file. How do I transform this data in the shell?

Anoop asked Nov 30 '25 11:11


2 Answers

awk to the rescue!

This assumes that lines sharing a key are contiguous (adjacent) in the input...

$ awk -F, 'p!=$1 {if(a) print a; a=p=$1}
                 {a=a FS $2}
           END   {print a}' file

13,47099,54024,1,39956,0
17,126223,52782,4,62617,0
23,1022724,79958,80590,230,1,118224,0,1049
42,72470,80185,2,89199,0
54,70344,72824,1,62969,1
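
If lines with the same key are not guaranteed to be adjacent, the one-pass command above would print the same key more than once. Here is a minimal sketch of one alternative. It assumes the whole input fits in memory, which should hold for roughly 10,000 entries. It collects the values per key in an awk array and prints the keys in first-seen order. The names group_by_key, vals, order and n are illustrative and not part of the original answer.

```shell
# group_by_key: hypothetical helper, not from the original answer.
# Reads "key,value" lines from the given files (or stdin) and prints
# one "key,v1,v2,..." line per key, in order of first appearance.
group_by_key() {
  awk -F, '
    # First time this key is seen: remember its position and start its line.
    !($1 in vals) { order[++n] = $1; vals[$1] = $1 }
    # Every line: append the separator and the value to the line for its key.
    { vals[$1] = vals[$1] FS $2 }
    # At end of input: print the collected lines in first-seen key order.
    END { for (i = 1; i <= n; i++) print vals[order[i]] }
  ' "$@"
}

# Example with interleaved keys.
printf '13,47099\n17,126223\n13,54024\n17,0\n' | group_by_key
```

Another option would be to sort the input by key first (for example with GNU sort -s -t, -k1,1n, where -s keeps the values in their original order within each key) and pipe the result into the original command. That option prints the keys in sorted order rather than input order.
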

karakfa answered Dec 03 '25 07:12


Here is a breakdown of what @karakfa's code is doing, for us awk beginners. I've written this based on a toy dataset file:

1,X
1,Y
3,Z
  • p!=$1: check whether the pattern p!=$1 is true
    • tests whether variable p differs from the first field of the current (first) line of the file (1 in this case)
    • since p is unset at this point (awk treats it as empty, or 0 in a numeric comparison), it is not equal to 1, so p!=$1 is true and the action in braces runs
  • if(a) print a: print a if it holds a non-empty value
    • since a is unset at this point, the print a command is not executed
  • a=p=$1: set variables a and p to the value of the first field of the current (first) line (1 in this case)
  • a=a FS $2: this action has no pattern, so it runs for every line; it appends the field separator FS (a comma here) and the second field of the current (first) line to a (giving 1,X in this case)
  • END: since we haven't reached the end of the file yet, the END action is skipped
  • move to the next (second) line of the file and rerun the awk code on that line

  • p!=$1: check whether the pattern p!=$1 is true
    • since p is 1 and the first field of the current (second) line is 1, p!=$1 is false and the action in braces is skipped
  • a=a FS $2: append the field separator and the second field of the current (second) line to a (giving 1,X,Y in this case)
  • END: since we haven't reached the end of the file yet, the END action is skipped
  • move to the next (third) line of the file and rerun the awk code on that line

  • p!=$1: check whether the pattern p!=$1 is true
    • since p is 1 and $1 of the third line is 3, p!=$1 is true and the action in braces runs
  • if(a) print a: print a if it holds a non-empty value
    • since a is 1,X,Y at this point, 1,X,Y is printed to the output
  • a=p=$1: set variables a and p to the value of the first field of the current (third) line (3 in this case)
  • a=a FS $2: append the field separator and the second field of the current (third) line to a (giving 3,Z in this case)
  • END {print a}: since there are no more lines to read, the END action now runs
    • print a: print the last group a (3,Z in this case)

The resulting output is

1,X,Y
3,Z
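
One way to check this walkthrough is to add a trace to the command so that it prints p and a after each input line. The sketch below is @karakfa's program with an extra printf. The trace_group name and the "# " prefix on the trace lines are my own additions for illustration.

```shell
# trace_group: hypothetical helper that wraps the original awk program
# and prints the values of p and a after each input line is processed.
trace_group() {
  awk -F, '
    p != $1 { if (a) print a; a = p = $1 }   # new key: flush previous group, start a new one
            { a = a FS $2                    # every line: append ",value" to the current group
              printf "# line %d: p=%s a=%s\n", NR, p, a }
    END     { print a }                      # flush the last group
  ' "$@"
}

# Run it on the toy dataset from the walkthrough.
printf '1,X\n1,Y\n3,Z\n' | trace_group
```

The lines starting with # show the state after each input line. The remaining lines are the normal output (1,X,Y and 3,Z).
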

Please let me know if there are any errors in this description.

Josh answered Dec 03 '25 09:12


