Hadoop Streaming Job with no input file

Is it possible to execute a Hadoop Streaming job that has no input file?

In my use case, I'm able to generate all the records the reducer needs from a single mapper and its execution parameters. Currently I'm using a stub input file containing a single line, and I'd like to remove this requirement.

We have two use cases in mind:

  1. I want to distribute the loading of files into HDFS from a network location visible to all nodes. Essentially, the mapper will run `ls` and send its output to a small set of reducers.
  2. We will run fits over several different parameter ranges against several models. The model names don't change and go to the reducer as keys, while the list of tests to run is generated in the mapper.
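The first use case can be sketched as a streaming mapper that ignores its (stub) input and emits one record per file from the shared mount. This is a minimal illustration, not code from the post; the script name, the mount path, and the `filename<TAB>size` record format are all assumptions, and the mapper body is wrapped in a function only so it can be exercised locally.

```shell
#!/bin/sh
# Hypothetical streaming mapper: in a real job this function body would
# be the entire script, invoked once per (stub) input split.
map_ls() {
    # Drain whatever stub input Hadoop feeds us on stdin; we ignore it.
    cat > /dev/null

    # SRC_DIR is the network location visible to all nodes (assumed path).
    SRC_DIR="${SRC_DIR:-/mnt/shared/incoming}"

    # Emit one "filename<TAB>size" record per file for the reducers.
    for f in "$SRC_DIR"/*; do
        [ -f "$f" ] || continue
        printf '%s\t%s\n' "$(basename "$f")" "$(wc -c < "$f")"
    done
}
```

The reducers would then pick up these records and perform the actual `hadoop fs -put` work, spreading the copy load across the cluster.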
asked Jan 25 '26 by Don Albrecht

1 Answer

According to the docs this is not possible. The following are required parameters for execution:

  • input directoryname or filename
  • output directoryname
  • mapper executable or JavaClassName
  • reducer executable or JavaClassName

It looks like providing a dummy input file is the way to go currently.
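A minimal sketch of the dummy-input workaround, supplying all four required parameters (the streaming jar path follows a typical Hadoop 2.x layout but varies by distribution; the HDFS paths and the mapper script name are hypothetical):

```shell
# Create the one-line stub input the streaming job requires,
# and put it in HDFS.
echo "stub" > stub.txt
hadoop fs -put -f stub.txt /user/me/stub.txt

# Submit the job: the mapper ignores its input and generates the real
# records itself ("my_mapper.sh" is a placeholder for your mapper).
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input   /user/me/stub.txt \
    -output  /user/me/job_out \
    -mapper  my_mapper.sh \
    -reducer /bin/cat \
    -file    my_mapper.sh
```

Since the mapper never reads the stub's contents, the single line exists only to satisfy the `-input` requirement and to give Hadoop exactly one split to schedule.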

answered Jan 26 '26 by carpenter
