I am trying to run a Hadoop Streaming job with a Python mapper:
bin/hadoop jar contrib/streaming/hadoop-0.20.1-streaming.jar \
    -D stream.non.zero.exit.is.failure=true \
    -input /ixml \
    -output /oxml \
    -mapper scripts/mapper.py \
    -file scripts/mapper.py \
    -inputreader "StreamXmlRecordReader,begin=channel,end=/channel" \
    -jobconf mapred.reduce.tasks=0
I made sure mapper.py has all the necessary permissions. The job fails with:
Caused by: java.io.IOException: Cannot run program "mapper.py": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
... 19 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
    at java.lang.ProcessImpl.start(ProcessImpl.java:91)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
I also tried copying mapper.py to HDFS and passing its hdfs://localhost/mapper.py URL instead, but that does not work either. Any thoughts on how to fix this?
Looking at the example on the HadoopStreaming wiki page, it seems that you should change
-mapper scripts/mapper.py 
-file scripts/mapper.py 
to
-mapper mapper.py 
-file scripts/mapper.py 
since "shipped files go to the working directory". You might also need to specify the python interpreter directly:
-mapper "/path/to/python mapper.py"
-file scripts/mapper.py 
Your problem is most likely that the python executable does not exist on the slaves (where the TaskTrackers run). Java gives the same error message whether the script itself or the interpreter named in its shebang line is missing.
Install python everywhere it is used. In your file you can use a shebang, as you probably already do:
#!/usr/bin/python -O
rest
of
the
code
Make sure the path after the shebang matches the location where python is installed on the TaskTrackers.
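To check this on a given node, something like the following could be run there. It is only a rough sketch: the function names shebang_interpreter and interpreter_is_runnable are made up for illustration, and it assumes some Python is available on the node to run the check itself.

```python
import os
import re


def shebang_interpreter(script_path):
    """Return the interpreter path from the script's first line, or None if there is no shebang."""
    with open(script_path, "rb") as f:
        first_line = f.readline()
    if not first_line.startswith(b"#!"):
        return None
    # Strip only the newline and leading blanks; a stray "\r" is deliberately kept
    # so that it shows up in the returned path.
    rest = first_line[2:].rstrip(b"\n").lstrip(b" \t")
    # The interpreter path ends at the first space or tab; anything after it is an argument.
    interpreter = re.split(rb"[ \t]", rest, maxsplit=1)[0]
    return interpreter.decode("utf-8", "replace") or None


def interpreter_is_runnable(script_path):
    """True if the shebang names an existing, executable file on this machine."""
    interpreter = shebang_interpreter(script_path)
    return (interpreter is not None
            and os.path.isfile(interpreter)
            and os.access(interpreter, os.X_OK))
```

Calling interpreter_is_runnable("mapper.py") on each TaskTracker host should return True; if it returns False there, the job will most likely fail with the error above.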
One other sneaky thing can cause this. If your line-endings on the script are DOS-style, then your first line (the "shebang line") may look like this to the naked eye:
#!/usr/bin/python
...my code here...
but its bytes look like this to the kernel when it tries to execute your script:
% od -a myScript.py
0000000   #   !   /   u   s   r   /   b   i   n   /   p   y   t   h   o
0000020   n  cr  nl  cr  nl   .   .   .   m   y  sp   c   o   d   e  sp
0000040   h   e   r   e   .   .   .  cr  nl
It's looking for an executable called "/usr/bin/python\r", which it can't find, so it dies with "No such file or directory".
This bit me today, again, so I had to write it down somewhere on SO.
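If you want to check for this before shipping the script, here is a minimal sketch in Python. The helper names has_crlf_shebang and convert_to_unix_newlines are my own, and it assumes the script is small enough to read into memory and that rewriting it in place is acceptable.

```python
def has_crlf_shebang(script_path):
    """True if the first line is a shebang that ends with a DOS-style CR LF."""
    with open(script_path, "rb") as f:
        first_line = f.readline()
    return first_line.startswith(b"#!") and first_line.endswith(b"\r\n")


def convert_to_unix_newlines(script_path):
    """Rewrite the file with LF line endings; return True if anything changed."""
    with open(script_path, "rb") as f:
        data = f.read()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed == data:
        return False
    # Opening the existing file for writing truncates it but keeps its permissions.
    with open(script_path, "wb") as f:
        f.write(fixed)
    return True
```

Running convert_to_unix_newlines("myScript.py") before passing the script with -file should leave the kernel looking for "/usr/bin/python" rather than "/usr/bin/python\r".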