
Join two files including unmatched lines in Shell

File1.log

207.46.13.90  37556
157.55.39.51  34268
40.77.167.109 21824
157.55.39.253 19683

File2.log

207.46.13.90  62343
157.55.39.51  58451
157.55.39.200 37675
40.77.167.109 21824

The expected Output.log is below:

207.46.13.90    37556   62343
157.55.39.51    34268   58451
157.55.39.200   -----   37675
40.77.167.109   21824   21824
157.55.39.253   19683   -----

I tried the 'join' command below, but it skips the unmatched lines:

join --nocheck-order File1.log File2.log

It outputs the following (not as expected):

207.46.13.90  37556 62343
157.55.39.51  34268 58451
40.77.167.109 21824 21824

Could someone please help with the proper command for the desired output? Thanks in advance.

Arivazhagan asked Sep 20 '25 11:09


2 Answers

Could you please try the following:

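awk '
FNR==NR{                 # first file read (File2.log): store value by IP
  a[$1]=$2
  next
}
($1 in a){               # IP present in both files
  print $0,a[$1]
  b[$1]                  # mark this IP as matched
  next
}
{                        # IP only in File1.log
  print $1,$2,"-----"
}
END{                     # IPs only in File2.log (for-in order is unspecified)
  for(i in a){
    if(!(i in b)){
      print i,"-----",a[i]
    }
  }
}
' File2.log File1.log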

Output will be as follows.

207.46.13.90  37556 62343
157.55.39.51  34268 58451
40.77.167.109 21824 21824
157.55.39.253 19683 -----
157.55.39.200 ----- 37675
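
If the output has to follow the order shown in the question (File2.log order first, then the lines found only in File1.log), a variation along these lines might work. It is only a sketch: the function name join_keep_order is made up for illustration, and the column widths in the printf format (15 and 7) are an assumption chosen to match the spacing of the expected Output.log.

```shell
# Hypothetical helper: full outer join on column 1, keeping input order.
# $1 = first file (its values go in column 2), $2 = second file (column 3).
join_keep_order() {
  awk '
    FNR == NR {              # first file: remember value and original order
      val[$1] = $2
      order[++n] = $1
      next
    }
    {                        # second file: print rows in its own order
      printf "%-15s %-7s %s\n", $1, (($1 in val) ? val[$1] : "-----"), $2
      seen[$1]               # mark key as present in the second file
    }
    END {                    # keys that only the first file had, in file order
      for (i = 1; i <= n; i++)
        if (!(order[i] in seen))
          printf "%-15s %-7s %s\n", order[i], val[order[i]], "-----"
    }
  ' "$1" "$2"
}

# Usage: join_keep_order File1.log File2.log
```

Storing the keys in a numbered array avoids relying on the for-in traversal order, which awk does not define. Like any FNR==NR script, it assumes the first file is not empty.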
RavinderSingh13 answered Sep 22 '25 03:09


The following is enough if you don't care about the order of the output lines:

join -a1 -a2 -e----- -oauto <(sort file1.log) <(sort file2.log) |
column -t -s' ' -o'   '

with recreation of the input files:

cat <<EOF >file1.log
207.46.13.90  37556
157.55.39.51  34268
40.77.167.109 21824
157.55.39.253 19683
EOF
cat <<EOF >file2.log
207.46.13.90  62343
157.55.39.51  58451
157.55.39.200 37675
40.77.167.109 21824
EOF

outputs:

157.55.39.200   -----   37675
157.55.39.253   19683   -----
157.55.39.51    34268   58451
207.46.13.90    37556   62343
40.77.167.109   21824   21824

join by default joins on the first column. The -a1 -a2 options make it print the unmatched lines from both inputs. The -e----- option fills missing fields with dashes. The -oauto option determines the output format from the columns of the inputs. Because we want to sort on the first column, we don't need to specify -k1 to sort, but sort -s -k1 could speed things up. To match the expected output, I also piped to column.
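
As far as I know, -o auto is a GNU extension. Where it is not available, the same full join can be spelled out with an explicit POSIX -o list. The sketch below assumes two-column inputs, uses temporary files instead of process substitution so it also runs in plain sh, and forces the C locale so sort and join agree on ordering. The function name full_join is made up for illustration.

```shell
# Hypothetical helper: full outer join of two two-column files on column 1.
full_join() {
  tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
  # join needs both inputs sorted on the join field, in the same collation.
  LC_ALL=C sort -k1,1 "$1" >"$tmp1"
  LC_ALL=C sort -k1,1 "$2" >"$tmp2"
  # -o 0,1.2,2.2 : join field, field 2 of file 1, field 2 of file 2
  # -e '-----'   : text printed for a field missing on one side
  LC_ALL=C join -a 1 -a 2 -e '-----' -o 0,1.2,2.2 "$tmp1" "$tmp2"
  rm -f "$tmp1" "$tmp2"
}

# Usage: full_join file1.log file2.log
```

The output is separated by single spaces; pipe it through column as above if aligned columns are wanted.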

You can sort the output by ports by piping it to, for example, sort -rnk2,3.

KamilCuk answered Sep 22 '25 02:09