Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compare 2 files having random numbers in non sequential order?

Tags:

python

awk

There are 2 files named compare 1.txt and compare2.txt having random numbers in non-sequential order

cat compare1.txt

57
11
13
3
889
014
91

cat compare2.txt

003
889
13
14
57
12
90

Aim

  1. Output list of all the numbers which are present in compare1 but not in compare 2 and vice versa

  2. If any number has zero in its prefix, ignore zeros while comparing ( basically the absolute value of number must be different to be treated as a mismatch ) Example - 3 should be considered matching with 003 and 014 should be considered matching with 14, 008 with 8 etc

Note - It is not necessary that matching must necessarily happen on the same line. A number present in the first line in compare1 should be considered matched even if that same number is present on other than the first line in compare2

Expected output

90
91
12
11

PS ( I don't necessarily need this exact order in expected output, just these 4 numbers in any order would do )

What I tried?

Obviously I didn't have hopes of getting the second condition correct, I tried only fulfilling the first condition but couldn't get correct results. I had tried these commands

grep -Fxv -f compare1.txt compare2.txt && grep -Fxv -f compare2.txt compare1.txt
cat compare1.txt compare2.txt | sort |uniq

Edit - A Python solution is also fine

like image 454
Sachin Avatar asked Jun 21 '20 06:06

Sachin


Video Answer


2 Answers

Could you please try following, written and tested with shown samples in GNU awk.

awk '
{
  $0=$0+0
}
FNR==NR{
  a[$0]
  next
}
($0 in a){
  b[$0]
  next
}
{ print }
END{
  for(j in a){
    if(!(j in b)){ print j }
  }
}
'  compare1.txt compare2.txt

Explanation: Adding detailed explanation for above.

awk '                                ##Starting awk program from here.
{
  $0=$0+0                            ##Adding 0 will remove extra zeros from current line,considering that your file doesn't have float values.
}
FNR==NR{                             ##Checking condition FNR==NR which will be TRUE when 1st Input_file is being read.
  a[$0]                              ##Creating array a with index of current line here.
  next                               ##next will skip all further statements from here.
}
($0 in a){                           ##Checking condition if current line is present in a then do following.
  b[$0]                              ##Creating array b with index of current line.
  next                               ##next will skip all further statements from here.
}
{ print }                                   ##will print current line from 2nd Input_file here.
END{                                 ##Starting END block of this code from here.
  for(j in a){                       ##Traversing through array a here.
    if(!(j in b)){ print j }         ##Checking condition if current index value is NOT present in b then print that index.
  }
}
'  compare1.txt compare2.txt         ##Mentioning Input_file names here.
like image 178
RavinderSingh13 Avatar answered Oct 06 '22 19:10

RavinderSingh13


Here's how to do what you want just using awk:

$ awk '{$0+=0} NR==FNR{a[$0];next} !($0 in a)' compare1.txt compare2.txt
12
90

$ awk '{$0+=0} NR==FNR{a[$0];next} !($0 in a)' compare2.txt compare1.txt
11
91

but this is the job that comm exists to do so here's how you could use that to get all differences and common lines at once. In the following output col1 is compare1.txt only, col2 is compare2.txt only, col3 is common between both files:

$ comm <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort)
11
    12
        13
        14
        3
        57
        889
    90
91

or to get each result individually:

$ comm -23 <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort)
11
91

$ comm -13 <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort)
12
90

$ comm -12 <(awk '{print $0+0}' compare1.txt | sort) <(awk '{print $0+0}' compare2.txt | sort)
13
14
3
57
889
like image 6
Ed Morton Avatar answered Oct 06 '22 20:10

Ed Morton