
NLineInputFormat has no effect

I am using Hadoop 0.20.2 with the old API. I'm trying to send chunks of data to the mappers instead of one line at a time (each piece of data spans multiple lines). I've attempted to use NLineInputFormat to set how many lines to get at once, but the mapper is still receiving only 1 line at a time. I'm pretty sure that I have the right code. Are there any reasons why this would fail to work?

For your reference,

JobConf conf = new JobConf(WordCount.class);

conf.setInt("mapred.line.input.format.linespermap", 2);

conf.setInputFormat(NLineInputFormat.class);

Basically, I'm using the sample code from http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Example%3A+WordCount+v1.0, only changing the TextInputFormat.

Thanks in advance

cliffycheng asked Mar 08 '26 01:03


1 Answer

NLineInputFormat is designed to ensure that all mappers receive the same number of input records (except the mapper handling the final split of each file, which may receive fewer).

So by setting the property to 2, each mapper should receive at most 2 input key/value pairs in total, one per call to map(), not 2 input lines at a time (which is what I think you are looking for).

You should be able to confirm this by looking at the "Map input records" counter for each map task, which should report 2 for most of your mappers.
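
To illustrate the distinction, here is a minimal plain-Java simulation of the behaviour described in this answer. It does not use any Hadoop classes; the class and method names (NLineSplitSketch, makeSplits, runMapTask) are hypothetical and exist only for this sketch. It assumes the lines-per-map setting simply groups consecutive lines into splits, and that each map task then sees its split one line per map() call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation only: no Hadoop classes are involved, and every
// name in this class is hypothetical.
class NLineSplitSketch {

    // Group the input lines into splits of at most linesPerMap lines each,
    // mimicking what mapred.line.input.format.linespermap is assumed to control.
    static List<List<String>> makeSplits(List<String> lines, int linesPerMap) {
        if (linesPerMap <= 0) {
            throw new IllegalArgumentException("linesPerMap must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerMap) {
            int end = Math.min(start + linesPerMap, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    // Simulate one map task: the stand-in for map() runs once per line,
    // so it never sees more than one line per call. The return value plays
    // the role of the "Map input records" counter for that task.
    static int runMapTask(List<String> split, List<String> seenByMap) {
        int mapInputRecords = 0;
        for (String line : split) {
            seenByMap.add(line); // stand-in for map(key, value, output, reporter)
            mapInputRecords++;
        }
        return mapInputRecords;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        for (List<String> split : makeSplits(lines, 2)) {
            List<String> seen = new ArrayList<>();
            int records = runMapTask(split, seen);
            System.out.println("map task: " + records + " record(s), one per map() call: " + seen);
        }
    }
}
```

With five input lines and 2 lines per map, the sketch produces three simulated map tasks: two with 2 records each and a final one with 1 record.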

Chris White answered Mar 10 '26 16:03