I'm trying to set the number of map tasks to run in a Hadoop 0.20 environment.
I am using the old API.
Here are the options I've tried so far:
    conf.set("mapred.tasktracker.map.tasks.maximum", "5");
    conf.set("mapred.map.tasks", "10");
    conf.set("mapred.map.tasksperslot", "5");
    conf.set("mapred.tasktracker.map", "5");
    conf.set("mapred.map.parallel.copies", "5");
With all of those set, the number of map tasks running in parallel remains 2.
What are the proper options to set to get the number of mappers running in parallel up to 5?
In TaskTracker.java, the number of map slots is read from the tasktracker's own configuration, with a default of 2:
    maxCurrentMapTasks = conf.getInt("mapred.tasktracker.map.tasks.maximum", 2);
So setting the property on the client side is of no use. You need to set it in the tasktracker's configuration file. According to "Hadoop: The Definitive Guide":
Be aware that some properties have no effect when set in the client configuration. For example, if in your job submission you set mapred.tasktracker.map.tasks.maximum with the expectation that it would change the number of task slots for the tasktrackers running your job, then you would be disappointed, since this property is only honored if set in the tasktracker's mapred-site.xml file. In general, you can tell the component where a property should be set by its name, so the fact that mapred.tasktracker.map.tasks.maximum starts with mapred.tasktracker gives you a clue that it can be set only for the tasktracker daemon. This is not a hard and fast rule, however, so in some cases you may need to resort to trial and error, or even reading the source.
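As a rough sketch of what the server-side setting might look like, the fragment below would go into mapred-site.xml on each tasktracker node. The value 5 is taken from the question; the surrounding configuration element is the usual wrapper for Hadoop site files, and any other properties already in your file are assumed to stay as they are.

```xml
<!-- mapred-site.xml on each tasktracker node (sketch, not a complete file) -->
<configuration>
  <property>
    <!-- Maximum number of map tasks this tasktracker runs at the same time -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>5</value>
  </property>
</configuration>
```

Since the tasktracker reads this value from its own configuration, the daemon most likely has to be restarted before the new slot count takes effect.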