In the JobTracker web UI, I see a column called "Failed/Killed Task Attempts".
I would like to know the distinction between them. My guess is that "failed" attempts belong to tasks that ultimately failed even after some retries (so no recovery happened at all?), while "killed" attempts were terminated (due to a timeout and so on) but might still be retried. Is that right?
There are a few reasons why Hadoop can kill tasks on its own initiative:
a) The task does not report progress within the timeout (the default is 10 minutes).
b) The FairScheduler or CapacityScheduler needs the slot for some other pool (FairScheduler) or queue (CapacityScheduler).
c) Speculative execution makes the task's results unnecessary, because the same task has already completed somewhere else.
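To make reason (a) concrete, here is a minimal toy model of a progress-timeout check. It is not Hadoop code: the `Attempt` class, the `timed_out` function, and the attempt ID are hypothetical names made up for illustration, and the only value taken from the answer above is the 10-minute default.

```python
from dataclasses import dataclass

# 10 minutes in milliseconds, the default timeout mentioned in (a).
DEFAULT_TIMEOUT_MS = 10 * 60 * 1000


@dataclass
class Attempt:
    """Hypothetical stand-in for a task attempt (not a Hadoop class)."""
    attempt_id: str
    last_progress_ms: int  # time of the most recent progress report


def timed_out(attempt, now_ms, timeout_ms=DEFAULT_TIMEOUT_MS):
    """Return True if the attempt has been silent for longer than timeout_ms."""
    return now_ms - attempt.last_progress_ms > timeout_ms


# Illustrative attempt that last reported progress at time 0.
attempt = Attempt("attempt_0001_m_000000_0", last_progress_ms=0)
print(timed_out(attempt, now_ms=5 * 60 * 1000))   # 5 minutes of silence
print(timed_out(attempt, now_ms=11 * 60 * 1000))  # 11 minutes of silence
```

In this sketch only the time since the last progress report matters, so an attempt that keeps reporting progress is never flagged, no matter how long it runs in total.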