Amazon EMR: Passing an XML or properties file to a JAR

I've been running several MapReduce jobs on a Hadoop cluster from a single JAR file. The main class of the JAR accepts an XML file as a command-line parameter. The XML file contains the input and output paths for each job (as name-value property pairs), and I use these to configure each MapReduce job. I'm able to load the paths into the Configuration like so:

    Configuration config = new Configuration(false);
    config.addResource(new FileInputStream(args[0]));
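
For reference, since addResource parses the file as a Hadoop configuration resource, the XML follows the standard Hadoop configuration format. A stripped-down sketch (the property names here are just illustrative):

    <?xml version="1.0"?>
    <configuration>
      <property>
        <name>job1.input.path</name>
        <value>/data/job1/input</value>
      </property>
      <property>
        <name>job1.output.path</name>
        <value>/data/job1/output</value>
      </property>
    </configuration>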

I am now trying to run the JAR using Amazon's Elastic MapReduce (EMR). I tried uploading the XML file to S3, but of course using FileInputStream to load the paths from S3 doesn't work (it throws a FileNotFoundException).

How can I pass the XML file to the JAR when using EMR?

(I looked at bootstrap actions, but as far as I can tell they are for specifying Hadoop-specific configuration.)

Any insight would be appreciated. Thanks.

asked by Girish Rao


1 Answer

If you add a simple bootstrap action that does

    hadoop fs -copyToLocal s3n://bucket/key.xml /target/path/on/local/filesystem.xml

you will then be able to open a FileInputStream on /target/path/on/local/filesystem.xml as you had intended. The bootstrap action is executed on all of the master and slave machines in the cluster, so each node will have its own local copy.
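
In other words, once the bootstrap action has run, you can pass the local path to your JAR as the step argument and your existing loading code works unchanged. Roughly (using the example path from above):

    // The bootstrap action has already copied the XML from S3 to this
    // local path on every node, so FileInputStream now finds it.
    Configuration config = new Configuration(false);
    config.addResource(new FileInputStream("/target/path/on/local/filesystem.xml"));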

To add that bootstrap action, you'll need to create a shell script file containing the above command, upload it to S3, and specify its S3 path as the bootstrap action. Unfortunately, a shell script in S3 is currently the only allowed type of bootstrap action.
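
For example, the script can be as small as this (the bucket, key, and target path are placeholders you'd replace with your own):

    #!/bin/bash
    # Copy the job configuration from S3 to the same local path on every node.
    hadoop fs -copyToLocal s3n://bucket/key.xml /target/path/on/local/filesystem.xml

Upload that script somewhere in S3 (e.g. s3://bucket/copy-config.sh) and point the bootstrap action at it, either in the AWS console when creating the job flow or, if memory serves, via the --bootstrap-action option of the elastic-mapreduce command-line client.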

answered by Judge Mental


