I am processing large amounts of data using Hadoop MapReduce. The problem is that, occasionally, a corrupt file causes a map task to throw a Java heap space error or something similar.
It would be nice, if possible, to simply discard whatever that map task was doing, kill it, and let the job carry on, never mind the lost data. I don't want the whole M/R job to fail because of that.
Is this possible in Hadoop, and if so, how?
You can modify the mapred.max.map.failures.percent parameter (renamed mapreduce.map.failures.maxpercent in newer Hadoop releases). Its default value is 0, meaning a single failed map task fails the whole job; raising it allows up to that percentage of map tasks to fail without failing the job.
You can set this parameter in mapred-site.xml (it will then apply to all jobs), or on a per-job basis (probably safer), as in the example below.
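
Here is a minimal sketch of the per-job approach, assuming the new (org.apache.hadoop.mapreduce) API. The exact property key depends on your Hadoop version, so the sketch sets both the old and the new name, which is harmless; the class name and the 5% threshold are just placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TolerantJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Allow up to 5% of map tasks to fail without failing the job.
            // Older releases use mapred.max.map.failures.percent,
            // newer ones use mapreduce.map.failures.maxpercent.
            conf.setInt("mapred.max.map.failures.percent", 5);
            conf.setInt("mapreduce.map.failures.maxpercent", 5);

            Job job = Job.getInstance(conf, "tolerant-job");
            job.setJarByClass(TolerantJob.class);
            // configure your mapper, reducer, and formats as usual
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

For the cluster-wide option, put the same property (with the same value) into mapred-site.xml instead.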