I am running a MapReduce program, but I get similar output whether I run it with only a mapper or with both a mapper and a reducer.
After the output shown below, the job never completes; it hangs at that point.
I don't understand why the reducer starts before the mappers have reached 100%. What might be causing this?
Output:
Map 10% Reduce 0%
Map 19% Reduce 0%
Map 21% Reduce 0%
Map 39% Reduce 0%
Map 49% Reduce 0%
Map 63% Reduce 0% 
Map 67% Reduce 0% 
Map 68% Reduce 0% 
Map 68% Reduce 22%
Map 69% Reduce 22%
Here is the mapper code:
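import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class EntityCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  // Buffers input lines across map() calls until they form one complete record.
  static String total_record="";
  @Override
  protected void map(LongWritable baseAddress, Text line, Context context)
        throws IOException, InterruptedException {
    Text entity=new Text();
    IntWritable one=new IntWritable(1);
    // Append the current input line to the buffer and re-split the whole buffer.
    total_record=total_record.concat(line.toString());
    String[] fields=total_record.split("::");
    // Emit (entity, 1) and reset the buffer only when it splits into exactly 24 fields.
    if(fields.length==24)
    {
        entity.set(fields[22].trim());
        context.write(entity,one);
        total_record="";
    }
  }
}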
The reduce phase can be started as soon as there is enough data for it to work on, for example once two nodes have completed their map tasks.
Starting reducers early has a downside: they "hog" reduce slots while only copying data and waiting for mappers to finish, so another job that starts later and would actually use those reduce slots cannot get them. You can customize when the reducers start up by changing the default value of mapred. reduce.
Reduce: The reduce function itself cannot run while a mapper is still in progress. Worker nodes process each group of <key,value> pairs in parallel and produce <key,value> pairs as output. All the map output values that have the same key are assigned to a single reducer, which then aggregates the values for that key.
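The grouping described above can be imitated in plain Java without a cluster. This is only a rough sketch: GroupByKeySketch and its methods are hypothetical names, not Hadoop classes, and the sample keys are made up for illustration. It assumes each mapper emits (key, 1), as the mapper in the question does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for what the framework does between map and reduce.
final class GroupByKeySketch {
    private GroupByKeySketch() {}

    // Collects a value of 1 per emitted key, so that all values for the
    // same key end up in one list (i.e. at a single reducer).
    static Map<String, List<Integer>> shuffle(List<String> mappedKeys) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String key : mappedKeys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Aggregates the values of each key by summing them, like a counting reducer.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            totals.put(entry.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up keys standing in for the entity names a mapper would emit.
        List<String> mapped = Arrays.asList("acme", "globex", "acme");
        System.out.println(reduce(shuffle(mapped))); // prints {acme=2, globex=1}
    }
}
```

A sorted map is used here only so that the printed order is predictable; it is not meant to model how Hadoop partitions keys across reducers.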
The reduce phase has 3 steps: shuffle, sort, reduce. Shuffle is where the reducer collects the data from each mapper. This can happen while mappers are still generating data, since it is only a data transfer. Sort and reduce, on the other hand, can only start once all the mappers are done. You can tell which step MapReduce is in by looking at the reducer completion percentage: 0-33% means it is doing shuffle, 34-66% is sort, and 67-100% is reduce. This is why your reducers will sometimes seem "stuck" at 33%: they are waiting for the mappers to finish.
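As a quick way to read a progress line, the percentage ranges above can be written as a small lookup. This is a minimal sketch: ReducePhase and phaseFor are hypothetical names, not part of Hadoop, and the boundaries are simply the ones quoted in the answer.

```java
// Hypothetical helper, not a Hadoop API: maps a reducer completion
// percentage to the step it corresponds to in the ranges given above.
final class ReducePhase {
    private ReducePhase() {}

    static String phaseFor(int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]: " + percent);
        }
        if (percent <= 33) {
            return "shuffle"; // copying map output; mappers may still be running
        }
        if (percent <= 66) {
            return "sort";    // only reached after all mappers are done
        }
        return "reduce";      // the reduce function is running
    }

    public static void main(String[] args) {
        // The last progress line in the question shows "Reduce 22%".
        System.out.println(phaseFor(22)); // prints shuffle
    }
}
```

By this reading, the "Reduce 22%" in the question's output falls in the shuffle range, meaning the reducers are only copying map output at that point.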