Hadoop Streaming Job with no input file

Is it possible to execute a Hadoop Streaming job that has no input file?

In my use case, I'm able to generate all the records the reducer needs from a single mapper and its execution parameters. Currently I'm using a stub input file containing a single line, and I'd like to remove this requirement.

We have two use cases in mind:

  1. I want to distribute the loading of files into HDFS from a network location visible to all nodes. Essentially, the mapper will run `ls` and send its output to a small set of reducers.
  2. We will run fits over several different parameter ranges against several models. The model names don't change and go to the reducer as keys, while the list of tests to run is generated in the mapper.
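The first use case can be sketched as a streaming mapper that ignores its (stub) input and emits one record per file from the shared mount. This is a minimal illustration, not code from the post; the script name, the mount path, and the `filename<TAB>size` record format are all assumptions, and the mapper body is wrapped in a function only so it can be exercised locally.

```shell
#!/bin/sh
# Hypothetical streaming mapper: in a real job this function body would
# be the entire script, invoked once per (stub) input split.
map_ls() {
    # Drain whatever stub input Hadoop feeds us on stdin; we ignore it.
    cat > /dev/null

    # SRC_DIR is the network location visible to all nodes (assumed path).
    SRC_DIR="${SRC_DIR:-/mnt/shared/incoming}"

    # Emit one "filename<TAB>size" record per file for the reducers.
    for f in "$SRC_DIR"/*; do
        [ -f "$f" ] || continue
        printf '%s\t%s\n' "$(basename "$f")" "$(wc -c < "$f")"
    done
}
```

The reducers would then pick up these records and perform the actual `hadoop fs -put` work, spreading the copy load across the cluster.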
asked Jan 25 '26 by Don Albrecht

1 Answer

According to the docs this is not possible. The following are required parameters for execution:

  • input directoryname or filename
  • output directoryname
  • mapper executable or JavaClassName
  • reducer executable or JavaClassName

It looks like providing a dummy input file is the way to go currently.
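A minimal sketch of the dummy-input workaround, supplying all four required parameters (the streaming jar path follows a typical Hadoop 2.x layout but varies by distribution; the HDFS paths and the mapper script name are hypothetical):

```shell
# Create the one-line stub input the streaming job requires,
# and put it in HDFS.
echo "stub" > stub.txt
hadoop fs -put -f stub.txt /user/me/stub.txt

# Submit the job: the mapper ignores its input and generates the real
# records itself ("my_mapper.sh" is a placeholder for your mapper).
hadoop jar "$HADOOP_HOME"/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input   /user/me/stub.txt \
    -output  /user/me/job_out \
    -mapper  my_mapper.sh \
    -reducer /bin/cat \
    -file    my_mapper.sh
```

Since the mapper never reads the stub's contents, the single line exists only to satisfy the `-input` requirement and to give Hadoop exactly one split to schedule.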

answered Jan 26 '26 by carpenter
