I'd like to track some related applications in YARN. They're submitted via the command line, e.g.
yarn jar hadoop-mapreduce-examples.jar pi 10 100
There's a really easy-to-use Python client for YARN, which returns the following for that application:
finalStatus = SUCCEEDED
id = application_1458083392566_0929
state = FINISHED
name = QuasiMonteCarlo
applicationType = MAPREDUCE
user = awoolford
applicationTags =
[...etc...]
I notice there's an applicationTags property. This would be an ideal way to track groups of related applications. I tried setting it via HADOOP_CLIENT_OPTS, e.g.
HADOOP_CLIENT_OPTS="-DapplicationTags=batch123,chunk62" hadoop jar [...etc...]
... but the tags didn't show up in YARN when I retrieved the application via the Python client.
Q) How can I submit a YARN job and populate the applicationTags property from the command line?
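The property to set is mapreduce.job.tags (see Jira), and it has to reach the job's Hadoop configuration as a -D generic option. HADOOP_CLIENT_OPTS only sets JVM system properties, which the job configuration doesn't pick up. For the Pi-calculation MapReduce example, you'd tag the job like this: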
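yarn jar hadoop-mapreduce-examples.jar pi -Dmapreduce.job.tags=myJobTag 10 100
Multiple tags are comma-separated, e.g. -Dmapreduce.job.tags=batch123,chunk62. The -D option goes after the program name (pi) and before the program's own arguments.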
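To confirm the tags arrived, you can also list applications by tag. The question doesn't name its Python client, so this is a minimal sketch that instead queries the ResourceManager REST API (`/ws/v1/cluster/apps` with its `applicationTags` filter) using only the standard library. It assumes an unsecured (non-Kerberos) ResourceManager reachable over plain HTTP; the address `http://resourcemanager:8088` and the helper names `apps_by_tag_url`, `parse_apps` and `fetch_apps_by_tag` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def apps_by_tag_url(rm_address, tags):
    """Build a ResourceManager REST URL listing apps that carry any of `tags`."""
    # Keep commas literal so the URL reads like the -Dmapreduce.job.tags value.
    query = urllib.parse.urlencode({"applicationTags": ",".join(tags)}, safe=",")
    return f"{rm_address.rstrip('/')}/ws/v1/cluster/apps?{query}"


def parse_apps(body):
    """Return (id, name, finalStatus, tags) tuples from a /ws/v1/cluster/apps response."""
    # The RM returns {"apps": null} when nothing matches, so fall back to {}.
    apps = json.loads(body).get("apps") or {}
    result = []
    for app in apps.get("app", []):
        # applicationTags is one comma-separated string; it may be empty or absent.
        tags = [t for t in app.get("applicationTags", "").split(",") if t]
        result.append((app["id"], app["name"], app["finalStatus"], tags))
    return result


def fetch_apps_by_tag(rm_address, tags, timeout=10):
    """Query the ResourceManager (assumed unsecured HTTP) for apps with any of `tags`."""
    url = apps_by_tag_url(rm_address, tags)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_apps(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical ResourceManager address; 8088 is YARN's default RM web port.
    for row in fetch_apps_by_tag("http://resourcemanager:8088", ["myJobTag"]):
        print(row)
```

Parsing is kept separate from the HTTP call so it can be checked against a saved response without a running cluster. Note that the filter matches applications carrying any of the given tags, not all of them.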
Credit to Neerja Khattar from Cloudera for figuring out how to do this.