I'm following a Pluralsight video tutorial about Amazon EMR. I am stuck and cannot proceed because I am getting this error:
Not a valid JAR: /home/hadoop/contrib/streaming/hadoop-streaming.jar 
Please note that the tutorial is old and uses an older EMR version. I am using the latest version. Is that a problem?
The steps I took after entering the credentials in PuTTY are:
1) Hadoop
2) mkdir streamingCode
3) wget -o ./streamingCode/wordSplitter.py s3://elasticmapreduce/samples/wordcount/wordSplitter.py
4) hadoop jar contrib/streaming/hadoop-streaming.jar -files streamingCode/wordSplitter.py -mapper wordSplitter.py input s3://elasticmapreduce/samples/wordcount/input -output streamingCode/wordCountOut -reducer aggregate
I cannot execute step 4 because I get the error below:
Not a valid JAR: /home/hadoop/contrib/streaming/hadoop-streaming.jar
You can find the streaming jar in /usr/hdp/current/hadoop-mapreduce-client. Make sure the MapReduce, HDFS, and YARN clients are installed on your machine.
For this you need to add a package name to your .java file according to the directory structure, for example home.hduser.dir, and while running the hadoop jar command specify the class name with the package structure, for example home.
Which tool handles streaming data transfer in Hadoop? Apache Flume – Data Transfer In Hadoop.
Let us now see how Hadoop Streaming works. The mapper and the reducer are scripts that read their input line by line from stdin and emit their output to stdout. The utility creates a Map/Reduce job, submits it to an appropriate cluster, and monitors its progress until completion.
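To make the stdin/stdout contract concrete, here is a minimal sketch of a streaming mapper in Python. It is not the actual wordSplitter.py from the sample bucket; the function name map_words and the whitespace-based splitting are assumptions made for illustration. The "LongValueSum:" prefix is what the built-in aggregate reducer uses to sum the values for each key.

```python
#!/usr/bin/env python3
"""Illustrative streaming mapper: reads lines from stdin, writes records to stdout."""
import sys


def map_words(lines):
    """Yield one 'LongValueSum:<word>\\t1' record per whitespace-separated word.

    The 'LongValueSum:' prefix tells the built-in 'aggregate' reducer
    to add up the values emitted for each key.
    """
    for line in lines:
        for word in line.split():
            yield "LongValueSum:" + word.lower() + "\t1"


if __name__ == "__main__":
    # Hadoop Streaming feeds each input split to this script on stdin
    # and collects whatever the script prints on stdout.
    for record in map_words(sys.stdin):
        print(record)
```

You could try it locally, without a cluster, by piping a text file into the script and inspecting the tab-separated records it prints.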
The Hadoop streaming jar is still available in the latest release of EMR. Starting with EMR release 4.0.0, it can be found at /usr/lib/hadoop-mapreduce/hadoop-streaming.jar.
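As a rough sketch of how the command from the question changes with the new jar location, the Python helper below picks the first jar path that exists and assembles the argument list. The helper names (pick_streaming_jar, build_streaming_command) are hypothetical, and the candidate list only contains the two paths mentioned in this thread; it prints the command rather than running it.

```python
import os
import shlex

# Candidate locations taken from this thread; adjust for your cluster.
CANDIDATE_JARS = [
    "/usr/lib/hadoop-mapreduce/hadoop-streaming.jar",       # EMR 4.0.0 and later
    "/home/hadoop/contrib/streaming/hadoop-streaming.jar",  # path used by the old tutorial
]


def pick_streaming_jar(candidates=CANDIDATE_JARS, exists=os.path.exists):
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if exists(path):
            return path
    return None


def build_streaming_command(jar, mapper_path, input_uri, output_dir):
    """Build the 'hadoop jar' argument list for a streaming job.

    The generic option -files comes before the streaming options.
    """
    return [
        "hadoop", "jar", jar,
        "-files", mapper_path,
        "-mapper", os.path.basename(mapper_path),
        "-reducer", "aggregate",
        "-input", input_uri,
        "-output", output_dir,
    ]


if __name__ == "__main__":
    jar = pick_streaming_jar()
    if jar is None:
        print("No streaming jar found in the candidate locations.")
    else:
        cmd = build_streaming_command(
            jar,
            "streamingCode/wordSplitter.py",
            "s3://elasticmapreduce/samples/wordcount/input",
            "streamingCode/wordCountOut",
        )
        # Print the command instead of running it, so it can be reviewed first.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```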
Another good resource for differences between versions can be found at http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-release-differences.html.
For the HADOOP_STREAMING variable, obtaining the path is a bit more complicated and depends on the HDP version you are using.
Search for its location with the command: find / -name 'hadoop-streaming*.jar'
Src: http://thecoatlessprofessor.com/programming/installing-r-studio-server-on-hortonworks-virtual-box-image-and-rmr2-a-k-a-rhadoop-r-package/
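If you would rather do the same search from a script, the find command above can be approximated in Python with os.walk and fnmatch. This is only a sketch; the function name find_streaming_jars is hypothetical, and directories that cannot be read are silently skipped, which is the default behaviour of os.walk.

```python
import fnmatch
import os


def find_streaming_jars(root="/", pattern="hadoop-streaming*.jar"):
    """Return sorted paths under root whose file name matches pattern."""
    matches = []
    # os.walk ignores unreadable directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Calling find_streaming_jars("/usr") limits the search to one subtree, which is usually much faster than walking the whole filesystem from "/".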