
Is there any way to get informative errors from python scripts in hadoop streaming?

I'm using Python with Hadoop streaming. Despite careful unit testing, errors inevitably creep in, and when they do, this error message is all that Hadoop gives:

java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:576)
...

The message is very unhelpful for debugging.

Is there any way to get informative errors from python scripts in hadoop streaming?

1 Answer

If you have access to the jobtracker for the cluster where you are running, you can get to the stderr/stdout of the script by finding the job and looking at the tasks that failed.
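
To make those task logs useful, it helps if the streaming script reports its own failure. The sketch below is my own illustration rather than part of the original answer: it wraps a placeholder mapper in a try/except and prints the full Python traceback to stderr, so the failed task attempt's stderr log in the jobtracker shows the real error instead of just the exit code 1 seen above.

#!/usr/bin/env python
# Sketch of a streaming mapper that surfaces its own errors.
# The per-record logic is a placeholder; only the error handling matters here.
import sys
import traceback

def mapper(stream):
    for line in stream:
        key, _, value = line.rstrip("\n").partition("\t")
        # ... real per-record logic goes here ...
        print("%s\t%s" % (key, value))

if __name__ == "__main__":
    try:
        mapper(sys.stdin)
    except Exception:
        # The traceback goes to stderr, which is the stream the
        # jobtracker exposes for each failed task attempt.
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)

Exiting non-zero still produces the generic PipeMapRed message on the Hadoop side, but the traceback is now waiting in the failed task's stderr log.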
