
Getting a checksum mismatch during data transfer between two different versions of Hadoop

Tags:

hadoop

I am new to Hadoop. I am transferring data between Hadoop 0.20 and Hadoop 2.2.0 using the distcp command. During the transfer I am getting the error below:

Check-sum mismatch between hftp://10.0.3.28:50070/hive/warehouse/staging_precall_cdr/operator=idea/PRECALL_CDR_Assam_OCT_JAN.csv and hdfs://10.0.20.118:9000/user/hive/warehouse/PRECALL_CDR_Assam_OCT_JAN.csv

I have also tried -skipcrccheck and -Ddfs.checksum.type=CRC32, but neither resolved it. Any solutions will be appreciated.
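For reference, the copy is being run with a command roughly like the one below (source and destination are taken from the error message above; the exact destination directory may differ, and -update is shown because -skipcrccheck generally requires it):

hadoop distcp -Ddfs.checksum.type=CRC32 -update -skipcrccheck hftp://10.0.3.28:50070/hive/warehouse/staging_precall_cdr/operator=idea/PRECALL_CDR_Assam_OCT_JAN.csv hdfs://10.0.20.118:9000/user/hive/warehouse/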

asked by rahul sorot

1 Answer

This looks like a known issue with copying data between Hadoop 0.20 and 2.2.0, tracked in Jira: https://issues.apache.org/jira/browse/HDFS-3054.

One workaround is to preserve block size and checksum type during the distcp copy using -pbc:

hadoop distcp -pbc <SRC> <DEST>

OR

Skip the CRC check using the -skipcrccheck option (together with -update):

hadoop distcp -skipcrccheck -update <SRC> <DEST>
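Applied to the paths from the question, the first option would look roughly like this (a sketch; adjust the destination directory to your layout):

hadoop distcp -pbc hftp://10.0.3.28:50070/hive/warehouse/staging_precall_cdr/operator=idea/PRECALL_CDR_Assam_OCT_JAN.csv hdfs://10.0.20.118:9000/user/hive/warehouse/

Note that -skipcrccheck simply bypasses the post-copy checksum comparison, so preserving block size and checksum type with -pbc is generally the safer choice when it works.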

answered by SachinJ


