
Too many fetch failures: Hadoop on cluster (x2)

Tags: hadoop

I have been using Hadoop for the last week or so (trying to get to grips with it), and although I have been able to set up a multinode cluster (2 machines: 1 laptop and a small desktop) and retrieve results, I always seem to encounter "Too many fetch failures" when I run a hadoop job.

An example output (on a trivial wordcount example) is:

hadoop@ap200:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-0.20.203.0.jar wordcount sita sita-output3X
11/05/20 15:02:05 INFO input.FileInputFormat: Total input paths to process : 7
11/05/20 15:02:05 INFO mapred.JobClient: Running job: job_201105201500_0001
11/05/20 15:02:06 INFO mapred.JobClient:  map 0% reduce 0%
11/05/20 15:02:23 INFO mapred.JobClient:  map 28% reduce 0%
11/05/20 15:02:26 INFO mapred.JobClient:  map 42% reduce 0%
11/05/20 15:02:29 INFO mapred.JobClient:  map 57% reduce 0%
11/05/20 15:02:32 INFO mapred.JobClient:  map 100% reduce 0%
11/05/20 15:02:41 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:02:49 INFO mapred.JobClient: Task Id :      attempt_201105201500_0001_m_000003_0, Status : FAILED
Too many fetch-failures
11/05/20 15:02:53 INFO mapred.JobClient:  map 85% reduce 9%
11/05/20 15:02:57 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:03:10 INFO mapred.JobClient: Task Id : attempt_201105201500_0001_m_000002_0, Status : FAILED
Too many fetch-failures
11/05/20 15:03:14 INFO mapred.JobClient:  map 85% reduce 9%
11/05/20 15:03:17 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:03:25 INFO mapred.JobClient: Task Id : attempt_201105201500_0001_m_000006_0, Status : FAILED
Too many fetch-failures
11/05/20 15:03:29 INFO mapred.JobClient:  map 85% reduce 9%
11/05/20 15:03:32 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:03:35 INFO mapred.JobClient:  map 100% reduce 28%
11/05/20 15:03:41 INFO mapred.JobClient:  map 100% reduce 100%
11/05/20 15:03:46 INFO mapred.JobClient: Job complete: job_201105201500_0001
11/05/20 15:03:46 INFO mapred.JobClient: Counters: 25
11/05/20 15:03:46 INFO mapred.JobClient:   Job Counters 
11/05/20 15:03:46 INFO mapred.JobClient:     Launched reduce tasks=1
11/05/20 15:03:46 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=72909
11/05/20 15:03:46 INFO mapred.JobClient:     Total time spent by all reduces waiting  after reserving slots (ms)=0
11/05/20 15:03:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/05/20 15:03:46 INFO mapred.JobClient:     Launched map tasks=10
11/05/20 15:03:46 INFO mapred.JobClient:     Data-local map tasks=10
11/05/20 15:03:46 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=76116
11/05/20 15:03:46 INFO mapred.JobClient:   File Output Format Counters 
11/05/20 15:03:46 INFO mapred.JobClient:     Bytes Written=1412473
11/05/20 15:03:46 INFO mapred.JobClient:   FileSystemCounters
11/05/20 15:03:46 INFO mapred.JobClient:     FILE_BYTES_READ=4462381
11/05/20 15:03:46 INFO mapred.JobClient:     HDFS_BYTES_READ=6950740
11/05/20 15:03:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=7546513
11/05/20 15:03:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1412473
11/05/20 15:03:46 INFO mapred.JobClient:   File Input Format Counters 
11/05/20 15:03:46 INFO mapred.JobClient:     Bytes Read=6949956
11/05/20 15:03:46 INFO mapred.JobClient:   Map-Reduce Framework
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce input groups=128510
11/05/20 15:03:46 INFO mapred.JobClient:     Map output materialized bytes=2914947
11/05/20 15:03:46 INFO mapred.JobClient:     Combine output records=201001
11/05/20 15:03:46 INFO mapred.JobClient:     Map input records=137146
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce shuffle bytes=2914947
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce output records=128510
11/05/20 15:03:46 INFO mapred.JobClient:     Spilled Records=507835
11/05/20 15:03:46 INFO mapred.JobClient:     Map output bytes=11435785
11/05/20 15:03:46 INFO mapred.JobClient:     Combine input records=1174986
11/05/20 15:03:46 INFO mapred.JobClient:     Map output records=1174986
11/05/20 15:03:46 INFO mapred.JobClient:     SPLIT_RAW_BYTES=784
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce input records=201001

I searched for the problem, and the people at Apache seem to suggest it could be anything from a networking problem (or something to do with the /etc/hosts files) to a corrupt disk on the slave nodes.

Just to add: I do see 2 "live nodes" on the namenode admin panel (localhost:50070/dfshealth), and under the Map/Reduce admin I see 2 nodes as well.

Any clues as to how I can avoid these errors? Thanks in advance.

Edit 1:

The tasktracker log is at: http://pastebin.com/XMkNBJTh
The datanode log is at: http://pastebin.com/ttjR7AYZ

Many thanks.

John M asked Dec 01 '25 14:12


2 Answers

Modify the /etc/hosts file on the datanode.

Each line is divided into three parts: the first part is the network IP address, the second part is the host name or domain name, and the third part is the host alias. The detailed steps are as follows:

  1. First check the host name:

    cat /proc/sys/kernel/hostname

    You will see the HOSTNAME attribute. Change the value after it to the IP, confirm, and then exit.

  2. Use the command:

    hostname ***.***.***.***

    Replace the asterisks with the corresponding IP.

  3. Modify the hosts configuration similarly, as follows:

    127.0.0.1 localhost.localdomain localhost
    ::1 localhost6.localdomain6 localhost6
    10.200.187.77 10.200.187.77 hadoop-datanode

If the IP address has been configured and modified successfully but the host name still shows a problem, continue to modify the hosts file.
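As a rough way to check the result of the steps above, the following Python sketch parses hosts-file text and lists any host names that are still bound to a loopback address. The function names `parse_hosts` and `loopback_names` and the sample entries (including the host name `ap200`, borrowed from the prompt in the question) are made up for illustration; nothing here is part of Hadoop.

```python
import ipaddress

def parse_hosts(text):
    """Return a list of (ip, [names]) tuples from hosts-file text."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no name is not a usable entry
        entries.append((fields[0], fields[1:]))
    return entries

def loopback_names(text, ignore=("localhost",)):
    """Names mapped to a loopback address, skipping localhost-style names."""
    found = []
    for ip, names in parse_hosts(text):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # first field is not a plain IP address
        if addr.is_loopback:
            found.extend(n for n in names if not n.startswith(ignore))
    return found

# Hypothetical hosts content, for illustration only.
SAMPLE = """\
127.0.0.1 localhost.localdomain localhost
127.0.1.1 ap200
10.200.187.77 hadoop-datanode
"""

if __name__ == "__main__":
    print(loopback_names(SAMPLE))
```

On a real node the text would come from reading /etc/hosts. An empty result suggests that no cluster host name resolves to loopback on that machine.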

Ricky answered Dec 04 '25 04:12


The following solution will definitely work:

1. Remove or comment out the lines with the IPs 127.0.0.1 and 127.0.1.1.

2. Use the host name, not an alias, to refer to each node in the hosts file and in the masters/slaves files in the Hadoop directory:

  --> in the hosts file: 172.21.3.67 master-ubuntu

  --> in the masters/slaves file: master-ubuntu

3. Check that the namespaceID of the namenode equals the namespaceID of the datanode.
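The namespaceID comparison in step 3 could be sketched in Python roughly as below. This assumes the IDs are read from the `VERSION` files that Hadoop keeps under the `current/` subdirectory of the namenode and datanode storage directories. The helper names and the sample file contents are invented for illustration, and the actual paths depend on your `dfs.name.dir` and `dfs.data.dir` settings.

```python
def read_namespace_id(version_text):
    """Extract the namespaceID value from the text of a VERSION file."""
    for line in version_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comment lines and anything that is not key=value
        key, value = line.split("=", 1)
        if key.strip() == "namespaceID":
            return value.strip()
    return None

def namespace_ids_match(namenode_version, datanode_version):
    """True only if both texts contain a namespaceID and the two are equal."""
    nn = read_namespace_id(namenode_version)
    dn = read_namespace_id(datanode_version)
    return nn is not None and nn == dn

# Hypothetical VERSION contents; the numbers are made up.
NAMENODE_VERSION = "#Fri May 20 15:00:00 2011\nnamespaceID=123456789\nlayoutVersion=-31\n"
DATANODE_VERSION = "#Fri May 20 15:00:05 2011\nnamespaceID=987654321\nlayoutVersion=-31\n"

if __name__ == "__main__":
    print(namespace_ids_match(NAMENODE_VERSION, DATANODE_VERSION))
```

In practice you would pass in the contents of the two real files. A mismatch typically means the namenode was reformatted after the datanode had already been initialised.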

user2200278 answered Dec 04 '25 05:12


