Hadoop DistributedCache classpath

Question

I have a Hadoop 0.20.2 cluster.

I'm thinking of using DistributedCache to distribute the job code to all the nodes. I can't understand the difference between addFileToClassPath() and addArchiveToClassPath(). Logically, it would seem that the former is for single class files, and the latter is for jars. But right in the javadocs, they have this example code:

DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);

syrkull · Accepted Answer

This question could be helpful.

As one of the users mentioned in the comment section there, there is a bug associated with addArchiveToClassPath(). The best way to avoid the problem is to upgrade your Hadoop to 1.0.0.

From the Apache website:

addArchiveToClassPath: Add an archive path to the current set of classpath entries. It adds the archive to cache as well. Archive files will be unpacked and added to the classpath when being distributed.

addFileToClassPath: Add an file path to the current set of classpath entries It adds the file to cache as well. Files added with this method will not be unpacked while being added to the classpath. To add archives to classpath, use the addArchiveToClassPath(Path) method instead.

I have realized that the Hadoop documentation was written by somebody who does not know English grammar very well. I see why you are frustrated.
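To make the "unpacked" versus "not unpacked" wording in the javadoc above more concrete, here is a minimal JDK-only sketch. It does not call Hadoop at all; the class name ClasspathEntryDemo, the resource name demo/message.txt, and the helper methods are hypothetical and exist only for illustration. It shows that a JVM class loader can use a jar file directly as a classpath entry (which is presumably why the javadoc example passes a jar to addFileToClassPath()), and that unpacking the same jar and putting the resulting directory on the classpath gives the same lookup result (the behaviour the javadoc describes for addArchiveToClassPath()).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

public class ClasspathEntryDemo {

    // Hypothetical resource name, used only for this demonstration.
    static final String RESOURCE = "demo/message.txt";

    /** Builds a small jar in dir that contains one text resource. */
    static Path buildJar(Path dir, String text) {
        Path jar = dir.resolve("mylib.jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry(RESOURCE));
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return jar;
    }

    /** Unpacks every entry of the jar into targetDir and returns targetDir. */
    static Path unpack(Path jar, Path targetDir) {
        try (JarInputStream in = new JarInputStream(Files.newInputStream(jar))) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                Path dest = targetDir.resolve(entry.getName()).normalize();
                if (!dest.startsWith(targetDir)) {
                    throw new IOException("Entry escapes target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    Files.copy(in, dest); // copies the current entry only
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return targetDir;
    }

    /** Reads RESOURCE through a class loader whose only entry is classpathEntry. */
    static String readViaClasspath(Path classpathEntry) {
        try {
            URL[] urls = { classpathEntry.toUri().toURL() };
            try (URLClassLoader loader = new URLClassLoader(urls, null);
                 InputStream in = loader.getResourceAsStream(RESOURCE)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the resource as seen via the jar itself and via the unpacked directory. */
    static String[] run() {
        try {
            Path work = Files.createTempDirectory("cp-demo");
            Path jar = buildJar(work, "hello from mylib");
            Path unpacked = unpack(jar, Files.createDirectory(work.resolve("unpacked")));
            return new String[] {
                readViaClasspath(jar),      // jar used as-is, not unpacked
                readViaClasspath(unpacked)  // unpacked directory as the entry
            };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println("jar on classpath:       " + result[0]);
        System.out.println("directory on classpath: " + result[1]);
    }
}
```

Both lookups find the same resource, so for an ordinary library jar the unpacking step adds nothing on the classpath side. This is only an analogy for what the two DistributedCache methods are documented to do, not a description of Hadoop's actual implementation.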

Tags: java, hadoop

Mike Baranczak
