Bash: join by numeric column

Question

If I want to use join on my Ubuntu, I need to first sort both files lexicographically (according to join --help), and only then join them:

tail -n +2 meta/201508_1 | sort -k 1b,1 > meta.txt
tail -n +2 keywords/copy | sort -k 1b,1 > keywords.txt
join meta.txt keywords.txt -1 1 -2 1 -t $'	'

(I also remove the header from both of them using tail)

But instead of sorting files lexicographically, I would like to sort them numerically: the first column in both files is an ID.

tail -n +2 meta/201508_1 | sort -k1 -n > meta.txt
tail -n +2 keywords/copy.txt | sort -k1 -n > keywords.txt

And then join. But for join these files look unsorted:

join: meta.txt:10: is not sorted: 1023  301000  en
join: keywords.txt:2: is not sorted: 10 keyword1

If I add --nocheck-order to join, it doesn't join properly - it outputs just one line.

How do I join two files on their numerical ID in bash?

Sample (columns are tab-separated):

file 1

id  volume lang
1   10  en
2   20  en
5   30  en
6   40  en
10  50  en

file 2

id  keyword
4   kw1
2   kw2
10  kw3
1   kw4
3   kw5

desired output

1   kw4 10  en
2   kw2 20  en
10  kw3 50  en

Johannes Weiss · Accepted Answer

Both of these work. The first one (sort -b is recommended on the Mac)

join <(sed 1d file1 | sort -b) <(sed 1d file2 | sort -b) | sort -n

the Linux man page recommends sort -k 1b,1

join <(sed 1d file1 | sort -k 1b,1) <(sed 1d file2 | sort -k 1b,1) | sort -n

In any case, you need to sort them lexicographically to join them. At the end you can still sort the result numerically.

Bash: join by numeric column

Tags:

linux

bash

shell

join

unix

Alexey Grigorev

1 Answers

Johannes Weiss

Recent Activity

Donate For Us

Bash: join by numeric column

Tags:

linux

bash

shell

join

unix

Alexey Grigorev

1 Answers

Johannes Weiss

Related questions

Recent Activity

Donate For Us