My custom Hadoop MapReduce InputFormat performs some additional work while splitting the input, and I want to know about that work when the job is finished. Essentially, I need counts of certain operations my InputFormat implementation performed.
What's the best way to pass additional information out of the InputFormat back to the MapReduce job? If the InputFormat were passed a Job instance, I could just update counters; unfortunately, the JobContext that Hadoop passes (I'm using v2.10.x) doesn't provide access to counters.
Should I store the information in the configuration, which I can access via JobContext, and allow the job to access it later? That seems like a bit of a kludge.
I think the right path is to use the YARN Timeline Service; it's designed for storing application-specific data.
```java
import java.io.IOException;

import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;
import org.apache.hadoop.yarn.api.records.timeline.TimelinePutResponse;
import org.apache.hadoop.yarn.client.api.TimelineClient;
import org.apache.hadoop.yarn.exceptions.YarnException;

// Create and start the Timeline client
TimelineClient client = TimelineClient.createTimelineClient();
client.init(conf);
client.start();

// Compose the entity (entity type, id, and metric names here are up to you)
TimelineEntity entity = new TimelineEntity();
entity.setEntityType("MY_INPUT_FORMAT_METRICS"); // your own entity type
entity.setEntityId(jobId);                       // e.g. the job id
entity.addOtherInfo("operationCount", count);    // the metric you collected

try {
    TimelinePutResponse response = client.putEntities(entity);
} catch (IOException e) {
    // Handle the exception
} catch (YarnException e) {
    // Handle the exception
}

// Stop the Timeline client
client.stop();
```
To pull the information back out, you can use the Timeline Server's REST API.
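As a sketch, the v1 REST API exposes a single entity at `/ws/v1/timeline/{entityType}/{entityId}` on the Timeline Server's web port (8188 by default). The host, entity type, and entity id below are hypothetical placeholders; this snippet only builds the URL you would issue a GET against (with curl or `HttpURLConnection`) to retrieve the entity as JSON:

```java
public class TimelineQueryUrl {

    // Build the v1 Timeline REST URL for one entity.
    // Host, type, and id are caller-supplied; 8188 is the default web port.
    static String buildEntityUrl(String host, String entityType, String entityId) {
        return String.format("http://%s:8188/ws/v1/timeline/%s/%s",
                host, entityType, entityId);
    }

    public static void main(String[] args) {
        // Hypothetical host, entity type, and entity id:
        System.out.println(buildEntityUrl(
                "timeline.example.com", "MY_INPUT_FORMAT_METRICS", "job_1234"));
    }
}
```

You can also list all entities of a type at `/ws/v1/timeline/{entityType}` and filter the result by primary filter via query parameters.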
Timeline Entity

A Timeline Entity contains the meta information of a conceptual entity and its related events. The entity can be an application, an application attempt, a container, or any user-defined object. It contains primary filters, which are used to index the entities in the Timeline Store; accordingly, users/applications should carefully choose the information they want to store as primary filters. The remaining data can be stored as unindexed information. Each entity is uniquely identified by an EntityId and EntityType.
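To make the indexed/unindexed split concrete for your use case, here is a plain-Java sketch (ordinary maps standing in for an entity's primary filters and other info; the key names and values are illustrative, not part of the Hadoop API): the job id goes into a primary filter so you can look the entity up later, while the raw metric counts go into unindexed info.

```java
import java.util.HashMap;
import java.util.Map;

public class MetricsEntitySketch {
    public static void main(String[] args) {
        // Indexed: used to find the entity later (e.g. query by job id).
        Map<String, Object> primaryFilters = new HashMap<>();
        primaryFilters.put("jobId", "job_1234"); // hypothetical job id

        // Unindexed: the metrics the InputFormat actually collected.
        Map<String, Object> otherInfo = new HashMap<>();
        otherInfo.put("splitsCreated", 42);
        otherInfo.put("recordsSkipped", 7);

        System.out.println(primaryFilters.get("jobId"));
        System.out.println(otherInfo.get("splitsCreated"));
    }
}
```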