 

Hadoop: Can you silently discard a failed map task?

I am processing large amounts of data using Hadoop MapReduce. The problem is that, occasionally, a corrupt file causes a map task to throw a Java heap space error or something similar.

It would be nice, if possible, to simply discard whatever that map task was doing, kill it, and let the job continue, never mind the lost data. I don't want the whole MapReduce job to fail because of one bad file.

Is this possible in Hadoop, and if so, how?

asked Nov 28 '25 by miljanm

1 Answer

You can modify the mapred.max.map.failures.percent parameter (named mapreduce.map.failures.maxpercent in newer MapReduce releases). The default value is 0, meaning a single failed map task fails the job; increasing it allows up to that percentage of map tasks to fail without failing the whole job.

You can set this parameter in mapred-site.xml (which applies it to all jobs on the cluster), or on a job-by-job basis in the job configuration (probably safer); see the sketch below.
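
A minimal sketch of the per-job approach, assuming the newer property name mapreduce.map.failures.maxpercent (older clusters use mapred.max.map.failures.percent) and a placeholder job name and input/output paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class TolerantJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Allow up to 10% of map tasks to fail without failing the whole job.
            // On older clusters, set "mapred.max.map.failures.percent" instead.
            conf.setInt("mapreduce.map.failures.maxpercent", 10);

            Job job = Job.getInstance(conf, "tolerant job");
            job.setJarByClass(TolerantJob.class);
            // job.setMapperClass(...), job.setReducerClass(...), etc. as usual

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

With this setting, a map task that keeps failing on a corrupt file will eventually be marked as failed, but as long as the overall failure rate stays under the threshold, the rest of the job completes normally.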

answered Nov 30 '25 by Donald Miner

