
Bash - Check if line in one file exists in another file

Tags: linux, bash, unix

I want to check whether the first-column value of each line in one file appears in a line of another file. For instance, given the following files:

a.txt:

0000_01_000000049E 7821069312
0000_01_000000049F 7886800896
0000_01_00000004A1 8302987264
0000_01_00000004A2 8469055488
0000_01_00000004A3 8040450048
0000_01_00000004A5 8250165248
0000_01_00000004A6 8116242432
0000_01_00000004A7 8260126720
0000_01_00000004A9 6420892672
0000_01_00000004AA 1076364288
0000_01_00000004AB 7822870528
0000_01_00000004AE 4297589760
0000_01_00000004AF 2360320

b.txt:

0000_01_000000049E,000000,0000_02_00000002AA,7821070336,1451596986,L3,0,0
0000_01_000000049F,000001,0000_02_00000002AA,7886801920,1451623534,L3,0,0
0000_01_00000004A0,000002,0000_02_00000002AA,6888983552,1451051126,L3,0,0
0000_01_00000004A1,000003,0000_02_00000002AA,8302988288,1451618939,L3,0,0
0000_01_00000004A2,000004,0000_02_00000002AA,8469056512,1451605811,L3,0,0
0000_01_00000004A3,000005,0000_02_00000002AA,8040451072,1452180174,L3,0,0
0000_01_00000004A4,000006,0000_02_00000002AA,8569819136,1451541232,L3,0,0
0000_01_00000004A5,000007,0000_02_00000002AA,8250166272,1452181606,L3,0,0
0000_01_00000004A6,000008,0000_02_00000002AA,8116243456,1452013786,L3,0,0
0000_01_00000004A7,000009,0000_02_00000002AA,8260127744,1451420337,L3,0,0
0000_01_00000004A8,000010,0000_02_00000002AA,8454605824,1451542793,L3,0,0
0000_01_00000004A9,000011,0000_02_00000002AA,7543657472,1451568105,L3,0,0
0000_01_00000004AA,000012,0000_02_00000002AA,7654181888,1451494089,L3,0,0
0000_01_00000004AB,000013,0000_02_00000002AA,7822871552,1451590252,L3,0,0
0000_01_00000004AC,000014,0000_02_00000002AA,5295639552,1450925203,L3,0,0
0000_01_00000004AD,000015,0000_02_00000002AA,7793807360,1451470796,L3,0,0
0000_01_00000004AE,000016,0000_02_00000002AA,8330842112,1451591997,L3,0,0
0000_01_00000004AF,000017,0000_02_00000002AA,29039368192,1452093213,L3,0,0

I would like to return the values of the second column in "b.txt" for every line whose first-column value also appears in the first column of "a.txt" (a sort of inner join). If the result were written to an output file "c.txt", I would expect the following:

c.txt:

000000
000001
000003
000004
000005
000007
000008
000009
000011
000012
000013
000016
000017

Note that these values from the second column of "b.txt" are absent from the output, because their first-column keys do not appear in "a.txt":

000002
000006
000010
000014
000015

I have tried looking all over the place but couldn't quite find anything concrete which could help with a solution. Appreciate the help.

Thanks!

asked Oct 24 '25 09:10 by dabadie


2 Answers

I would suggest using awk, as others have. However, the task can also be solved with GNU coreutils alone:

join -1 1 -2 1 <(tr ',' ' ' < b.txt | sort) <(sort a.txt) | cut -d' ' -f2

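which can be significantly shortened to the following, provided both files are already sorted on the first column (as they are in the question):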

join -o 2.2 a.txt <(tr ',' ' ' < b.txt)

Thanks Benjamin W.! Nice one!
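
The short form relies on both inputs already being sorted on the join field. When that is not guaranteed, a minimal sketch along the following lines may help. It is written for plain POSIX sh, so it feeds the converted "b.txt" to join on standard input (`-`) instead of using process substitution. The scratch directory, the trimmed sample data, and the `result` variable are assumptions made for illustration; they are not part of the question.

```shell
#!/bin/sh
# Hypothetical scratch directory with trimmed, deliberately unsorted sample data.
tmp=$(mktemp -d)
printf '%s\n' \
  '0000_01_00000004A1 8302987264' \
  '0000_01_000000049E 7821069312' > "$tmp/a.txt"
printf '%s\n' \
  '0000_01_00000004A1,000003,0000_02_00000002AA' \
  '0000_01_00000004A0,000002,0000_02_00000002AA' \
  '0000_01_000000049E,000000,0000_02_00000002AA' > "$tmp/b.txt"

# Sort a.txt under the same locale that join will run with.
LC_ALL=C sort "$tmp/a.txt" > "$tmp/a.sorted"

# Convert commas to spaces, sort, and join against a.sorted.
# "-" makes join read its second file from standard input;
# "-o 2.2" prints only the second field of that second file.
result=$(tr ',' ' ' < "$tmp/b.txt" | LC_ALL=C sort |
  LC_ALL=C join -o 2.2 "$tmp/a.sorted" -)

printf '%s\n' "$result"
rm -rf "$tmp"
```

Pinning `LC_ALL=C` on both sort and join keeps the collation order consistent between the two tools, which join depends on. The output comes out in key order, not in the original order of "b.txt".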

answered Oct 26 '25 00:10 by hek2mgl


You can use awk:

awk -F '[, ]' 'FNR==NR{col1[$1]; next} $1 in col1{print $2}' a.txt b.txt

Output:

000000
000001
000003
000004
000005
000007
000008
000009
000011
000012
000013
000016
000017
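
For readers unfamiliar with the `FNR==NR` idiom, here is a commented sketch of the same approach that also writes the result to "c.txt", as the question asks. The scratch directory, the trimmed sample data, and the names `seen` and `result` are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory with trimmed sample data.
tmp=$(mktemp -d)
printf '%s\n' \
  '0000_01_00000004A1 8302987264' \
  '0000_01_000000049E 7821069312' > "$tmp/a.txt"
printf '%s\n' \
  '0000_01_00000004A1,000003,0000_02_00000002AA' \
  '0000_01_00000004A0,000002,0000_02_00000002AA' \
  '0000_01_000000049E,000000,0000_02_00000002AA' > "$tmp/b.txt"

# -F "[, ]" splits fields on either a comma or a space,
# so the same program can parse both files.
awk -F '[, ]' '
  # While reading the first file, FNR (per-file line number) equals
  # NR (overall line number): remember each first-column key.
  FNR == NR { seen[$1]; next }
  # For the second file: print column 2 when the key was remembered.
  $1 in seen { print $2 }
' "$tmp/a.txt" "$tmp/b.txt" > "$tmp/c.txt"

result=$(cat "$tmp/c.txt")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Unlike join, this does not need sorted input, and matches are printed in the order they occur in "b.txt".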

answered Oct 25 '25 23:10 by anubhava


