I'm using Python with Hadoop Streaming. Despite careful unit testing, errors inevitably creep in. When they do, this error message is all that Hadoop gives:
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
...
The message is very unhelpful for debugging.
Is there any way to get informative errors from python scripts in hadoop streaming?
If you have access to the JobTracker web UI for the cluster where the job ran, you can get to the stderr/stdout of your script: find the job, drill into the task list, and open the logs of the task attempts that failed. Anything your script writes to stderr (including Python tracebacks) is captured there.
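To make those logs useful, you can wrap your mapper (or reducer) so that any unhandled exception prints a full traceback to stderr before the process exits non-zero. This is a minimal sketch; the word-count logic inside `mapper` is just a hypothetical placeholder for your own code:

```python
import sys
import traceback

def mapper(stream, out):
    # Hypothetical mapper logic: emit each word with a count of 1.
    for line in stream:
        for word in line.split():
            out.write("%s\t1\n" % (word,))

if __name__ == "__main__":
    try:
        mapper(sys.stdin, sys.stdout)
    except Exception:
        # The traceback goes to stderr, which Hadoop captures in the
        # failed task attempt's stderr log; exiting non-zero makes the
        # task fail so the problem is not silently swallowed.
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)
```

With this wrapper, instead of only the generic "subprocess failed with code 1" message, the failed attempt's stderr log on the JobTracker shows the actual Python exception and the line where it was raised.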