My custom Hadoop MapReduce InputFormat performs some additional work while splitting the input, and I want to know about that work when the job is finished. Essentially, I need counts of certain operations my InputFormat implementation performed.
What's the best way to pass additional information out of the InputFormat back to the MapReduce job? If the InputFormat were passed a Job instance, I could just update counters; unfortunately, the JobContext that Hadoop passes (I'm using v2.10.x) doesn't provide access to counters.
Should I store the information in the configuration, which I can access via JobContext, and allow the job to access it later? That seems like a bit of a kludge.
I think the right path is to use the YARN Timeline Service; it's designed for storing application-specific data.
```java
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Create and start the Timeline client
TimelineClient client = TimelineClient.createTimelineClient();
client.init(conf);
client.start();

// Compose the entity (entity type, id, and metric names here are up to you)
TimelineEntity entity = new TimelineEntity();
entity.setEntityType("MY_INPUT_FORMAT_METRICS"); // your own entity type
entity.setEntityId(jobId);                       // e.g. the job id
entity.addOtherInfo("operationCount", count);    // the metric you collected

try {
    TimelinePutResponse response = client.putEntities(entity);
} catch (IOException e) {
    // Handle the exception
} catch (YarnException e) {
    // Handle the exception
}

// Stop the Timeline client
client.stop();
```
To pull the information back out, you can use the Timeline Server's REST API.
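As a sketch, the v1 REST API exposes a single entity at `/ws/v1/timeline/{entityType}/{entityId}` on the Timeline Server's web port (8188 by default). The host, entity type, and entity id below are hypothetical placeholders; this snippet only builds the URL you would issue a GET against (with curl or `HttpURLConnection`) to retrieve the entity as JSON:

```java
public class TimelineQueryUrl {

    // Build the v1 Timeline REST URL for one entity.
    // Host, type, and id are caller-supplied; 8188 is the default web port.
    static String buildEntityUrl(String host, String entityType, String entityId) {
        return String.format("http://%s:8188/ws/v1/timeline/%s/%s",
                host, entityType, entityId);
    }

    public static void main(String[] args) {
        // Hypothetical host, entity type, and entity id:
        System.out.println(buildEntityUrl(
                "timeline.example.com", "MY_INPUT_FORMAT_METRICS", "job_1234"));
    }
}
```

You can also list all entities of a type at `/ws/v1/timeline/{entityType}` and filter the result by primary filter via query parameters.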
Timeline Entity

A Timeline Entity contains the meta information of a conceptual entity and its related events. The entity can be an application, an application attempt, a container, or any user-defined object. It contains primary filters, which are used to index the entities in the Timeline Store; accordingly, users/applications should carefully choose the information they want to store as primary filters. The remaining data can be stored as unindexed information. Each entity is uniquely identified by an EntityId and EntityType.
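To make the indexed/unindexed split concrete for your use case, here is a plain-Java sketch (ordinary maps standing in for an entity's primary filters and other info; the key names and values are illustrative, not part of the Hadoop API): the job id goes into a primary filter so you can look the entity up later, while the raw metric counts go into unindexed info.

```java
import java.util.HashMap;
import java.util.Map;

public class MetricsEntitySketch {
    public static void main(String[] args) {
        // Indexed: used to find the entity later (e.g. query by job id).
        Map<String, Object> primaryFilters = new HashMap<>();
        primaryFilters.put("jobId", "job_1234"); // hypothetical job id

        // Unindexed: the metrics the InputFormat actually collected.
        Map<String, Object> otherInfo = new HashMap<>();
        otherInfo.put("splitsCreated", 42);
        otherInfo.put("recordsSkipped", 7);

        System.out.println(primaryFilters.get("jobId"));
        System.out.println(otherInfo.get("splitsCreated"));
    }
}
```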