
Hadoop mapreduce programming

Tags:

hadoop

How do I get sorted output using Hadoop MapReduce programming?

Is there any way to get the final key-value pairs in sorted order (either by key or by value)?

Any pointers on this would be greatly appreciated.

Thank You R

MRK asked Nov 23 '25 04:11


2 Answers

By default, MapReduce sorts the map output records by key before they reach the reducers, so each reducer receives its keys in sorted order.

However, you may get more out of downloading the latest Hadoop release and looking through the examples that ship with it, which include several sort examples.

If you need to change the sort order, this is how it is determined.

The sort order for keys is controlled by a RawComparator, which is found as follows:

  1. If the property mapred.output.key.comparator.class is set, an instance of that class is used. (The setOutputKeyComparatorClass() method on JobConf is a convenient way to set this property.)

  2. Otherwise, keys must be a subclass of WritableComparable, and the registered comparator for the key class is used.

  3. If there is no registered comparator, then a RawComparator is used that deserializes the byte streams being compared into objects and delegates to the WritableComparable’s compareTo() method.

These rules reinforce why it’s important to register optimized versions of RawComparators for your own custom Writable classes, and also that it’s straightforward to override the sort order by setting your own comparator.
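To make the difference between the fallback in rule 3 and an optimized raw comparator more concrete, here is a minimal plain-Java sketch. It does not use the Hadoop classes: `RawComparatorSketch` and all of its methods are hypothetical names made up for illustration, and the key is assumed to be an int serialized as 4 big-endian bytes. The fallback path builds objects before comparing, the raw path compares straight from the byte arrays, and a reversed comparator shows how the sort order could be overridden.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the idea behind a raw comparator; not a Hadoop class.
class RawComparatorSketch {
    // Serialize an int key as 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Fallback path (rule 3): deserialize both keys into objects, then call compareTo().
    static int compareByDeserializing(byte[] b1, byte[] b2) {
        Integer k1 = ByteBuffer.wrap(b1).getInt();
        Integer k2 = ByteBuffer.wrap(b2).getInt();
        return k1.compareTo(k2);
    }

    // Decode a big-endian int at the given offset without allocating anything.
    static int readInt(byte[] b, int start) {
        return ((b[start] & 0xff) << 24)
             | ((b[start + 1] & 0xff) << 16)
             | ((b[start + 2] & 0xff) << 8)
             |  (b[start + 3] & 0xff);
    }

    // Optimized path: compare the serialized keys in place, with no object creation.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Overriding the sort order: swap the arguments to sort keys in descending order.
    static int compareRawDescending(byte[] b1, int s1, byte[] b2, int s2) {
        return compareRaw(b2, s2, b1, s1);
    }

    public static void main(String[] args) {
        byte[] a = serialize(-5);
        byte[] b = serialize(42);
        System.out.println("deserializing: " + compareByDeserializing(a, b));
        System.out.println("raw:           " + compareRaw(a, 0, b, 0));
        System.out.println("raw, reversed: " + compareRawDescending(a, 0, b, 0));
    }
}
```

Both paths give the same ordering; the raw one simply skips the object creation, which matters when the comparison runs for every pair of records during the sort. In a real job the equivalent logic would live in a comparator class set through the property from rule 1.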

azec-pdx answered Nov 24 '25 21:11


"Hadoop: The Definitive Guide" 2nd edition describes global sort in chapter 8 with code samples.
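For context on what "global sort" means here: with more than one reducer, each reducer's keys are sorted, but the output files taken together are not, unless keys are routed to reducers by range. As far as I recall, the book does this with Hadoop's TotalOrderPartitioner; the sketch below does not use Hadoop at all and only simulates the idea in plain Java. `GlobalSortSketch`, its methods, and the split point are hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation of partition-then-sort; not Hadoop code.
class GlobalSortSketch {
    // Hash-style partitioning: the target reducer is unrelated to key order.
    static int hashPartition(int key, int numReducers) {
        return (Integer.hashCode(key) & Integer.MAX_VALUE) % numReducers;
    }

    // Range partitioning: ascending split points divide the key space,
    // so every key in reducer i is smaller than every key in reducer i + 1.
    static int rangePartition(int key, int[] splitPoints) {
        int p = 0;
        while (p < splitPoints.length && key >= splitPoints[p]) {
            p++;
        }
        return p;
    }

    // Route keys to reducers, sort each partition, then concatenate the
    // partitions in reducer order. A null splitPoints selects hash partitioning;
    // otherwise numReducers must equal splitPoints.length + 1.
    static List<Integer> run(List<Integer> keys, int numReducers, int[] splitPoints) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : keys) {
            int p = (splitPoints == null)
                    ? hashPartition(key, numReducers)
                    : rangePartition(key, splitPoints);
            partitions.get(p).add(key);
        }
        List<Integer> output = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
            output.addAll(partition);
        }
        return output;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(7, 2, 9, 4, 1, 8);
        System.out.println("hash:  " + run(keys, 2, null));
        System.out.println("range: " + run(keys, 2, new int[] {5}));
    }
}
```

With hash partitioning each half of the output is sorted on its own, while range partitioning yields one fully sorted sequence. The hard part in practice is picking split points that give each reducer a similar share of the data, which is usually done by sampling the input.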

Alexander Verbitsky answered Nov 24 '25 19:11


