I am using Hadoop 0.20.2 with the old API. I'm trying to send chunks of data to my mappers instead of one line at a time (each record in my data spans multiple lines). I've tried using NLineInputFormat to set how many lines are read at once, but the mapper still receives only 1 line at a time. I'm fairly sure the code is right. Is there any reason why this would fail to work?
For your reference,
JobConf conf = new JobConf(WordCount.class);
conf.setInt("mapred.line.input.format.linespermap", 2);
conf.setInputFormat(NLineInputFormat.class);
Basically, I'm using the sample code from http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Example%3A+WordCount+v1.0, only changing the TextInputFormat.
Thanks in advance
NLineInputFormat is designed to ensure that all mappers receive the same number of input records (except possibly the last split of each file).
So by setting the property to 2, each mapper should receive at most 2 input key/value pairs over its lifetime, not 2 input lines in a single call to map() (which is what I think you are looking for). The map() method is still called once per line.
You should be able to confirm this by looking at the "Map input records" counter for each map task, which should report 2 for most of your mappers.
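To make the difference concrete, here is a minimal plain-Java sketch of the behaviour described above. It does not use any Hadoop classes; the class name NLineSplitDemo and the helpers makeSplits and mapCallsPerTask are hypothetical names made up for illustration. It only models the idea that the input is cut into splits of N lines, one split per map task, while map() is still invoked once per line within each split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation only: no Hadoop classes are involved.
public class NLineSplitDemo {

    // Hypothetical helper: cut the input into splits of at most n lines,
    // one split per map task.
    static List<List<String>> makeSplits(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            int end = Math.min(i + n, lines.size());
            splits.add(new ArrayList<>(lines.subList(i, end)));
        }
        return splits;
    }

    // Hypothetical helper: count how many times map() would be called
    // in each task. Every line in a split triggers its own call.
    static List<Integer> mapCallsPerTask(List<List<String>> splits) {
        List<Integer> calls = new ArrayList<>();
        for (List<String> split : splits) {
            int count = 0;
            for (String line : split) {
                count++; // stands in for one map(key, value) call
            }
            calls.add(count);
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        List<List<String>> splits = makeSplits(lines, 2);
        System.out.println(splits);                  // [[a, b], [c, d], [e]]
        System.out.println(mapCallsPerTask(splits)); // [2, 2, 1]
    }
}
```

With five input lines and N = 2, the simulation yields three tasks that see 2, 2 and 1 records, which matches what the "Map input records" counter would be expected to show. If you need both lines of a chunk together, one option is to buffer values inside the mapper across calls and process them once the chunk is complete.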