I have a Hadoop 0.20.2 cluster.
I'm thinking of using DistributedCache to distribute the job code to all the nodes. I don't understand the difference between addFileToClassPath() and addArchiveToClassPath(). Logically, it would seem that the former is for single class files and the latter is for jars. But the javadocs themselves contain this example, which passes a jar to addFileToClassPath():
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
This question could be helpful.
As one of the users mentioned in the comments, there is a bug associated with addArchiveToClassPath(). The best way to solve the problem is to upgrade your Hadoop to 1.0.0.
From the apache website:
addArchiveToClassPath: Add an archive path to the current set of classpath entries. It adds the archive to cache as well. Archive files will be unpacked and added to the classpath when being distributed.
addFileToClassPath: Add an file path to the current set of classpath entries It adds the file to cache as well. Files added with this method will not be unpacked while being added to the classpath. To add archives to classpath, use the addArchiveToClassPath(Path) method instead.
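So the real distinction is not "class file vs. jar" but whether the path is unpacked before it goes on the classpath. A jar works with addFileToClassPath() because the JVM's class loader can read a jar file directly as a classpath entry; with addArchiveToClassPath() the archive is unpacked first and, as I understand the javadoc, the unpacked contents are what end up on the classpath. The sketch below uses only the JDK (no Hadoop, no DistributedCache) to mimic the cases with URLClassLoader. All names in it (ClasspathDemo, mylib.jar, greeting.txt) are made up for illustration, and a text resource stands in for a .class file since the class loader looks both up the same way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical demo (no Hadoop involved) of two ways a jar can reach a classpath:
//   "file" style    -> the jar itself is the classpath entry, never unpacked;
//   "archive" style -> the jar is unpacked and the resulting directory is the entry.
final class ClasspathDemo {

    static final String RESOURCE = "greeting.txt"; // stands in for a .class file

    // Writes a tiny jar holding one resource.
    static Path buildJar(Path jar) throws IOException {
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry(RESOURCE));
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return jar;
    }

    // Unpacks a jar into targetDir, roughly what happens to an archive before use.
    static Path unpack(Path jar, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (JarInputStream in = new JarInputStream(Files.newInputStream(jar))) {
            for (JarEntry e; (e = in.getNextJarEntry()) != null; ) {
                Path dest = root.resolve(e.getName()).normalize();
                if (!dest.startsWith(root)) throw new IOException("unsafe entry: " + e.getName());
                if (e.isDirectory()) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    Files.copy(in, dest); // copies only the current entry's bytes
                }
            }
        }
        return root;
    }

    // True if a class loader whose ONLY classpath entry is `entry` can see RESOURCE.
    static boolean visible(Path entry) throws IOException {
        URL url = entry.toUri().toURL(); // existing directories get a trailing '/'
        try (URLClassLoader loader = new URLClassLoader(new URL[] { url }, null)) {
            return loader.getResource(RESOURCE) != null;
        }
    }

    // Returns {jar as entry, unpacked dir as entry, dir that merely contains the jar}.
    static boolean[] run() {
        try {
            Path work = Files.createTempDirectory("dcache-demo");
            Path libDir = Files.createDirectories(work.resolve("lib"));
            Path jar = buildJar(libDir.resolve("mylib.jar"));
            Path unpacked = unpack(jar, work.resolve("unpacked"));
            return new boolean[] { visible(jar), visible(unpacked), visible(libDir) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("jar itself on classpath:         " + r[0]);
        System.out.println("unpacked directory on classpath: " + r[1]);
        System.out.println("directory holding the jar:       " + r[2]);
    }
}
```

Running main should report true for the first two cases and false for the third: a jar is usable as-is (which is why the javadoc example passes a jar to addFileToClassPath()), an unpacked directory is usable too, but a directory that merely contains a jar does not expose the jar's contents.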
I have realized that the Hadoop documentation was written by somebody who does not know English grammar very well. I see why you are frustrated.