What is the naming convention for YARN containers used by Spark?

When running Spark jobs on YARN (yarn-cluster mode), YARN launches the executors in containers whose IDs look something like this: container_e116_1495951495692_11203_01_000105

What is the naming convention for the containers?

Here is my educated guess:

  • container - Just a constant string, obviously
  • e116 - No idea what this is. Maybe something to do with the YARN version.
  • 1495951495692_11203 - The application-id
  • 01 - An attempt counter?
  • 000105 - Probably just an incrementing integer.

If there is any concrete information about this (or even a reference to the right place in the code), I'd be glad to hear about it.

Following on from that: when running a Spark job on YARN, how can I tell which container belongs to which executor?

summerbulb asked Jan 19 '26 17:01



1 Answer

You can look at the Javadoc for ContainerId: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerId.html

"A string representation of containerId. The format is container_e<epoch>_<clusterTimestamp>_<appId>_<attemptId>_<containerId> when epoch is larger than 0 (e.g. container_e17_1410901177871_0001_01_000005). epoch is increased when RM restarts or fails over. When epoch is 0, epoch is omitted (e.g. container_1410901177871_0001_01_000005)."
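To make the documented format concrete, here is a minimal Python sketch that splits a container ID string into the fields named in the Javadoc quoted above. The names `parse_container_id`, `ParsedContainerId` and `application_id()` are hypothetical helpers for illustration, not part of any Hadoop or Spark API. The `application_id()` method assumes application IDs are rendered as application_<clusterTimestamp>_<appId> with appId zero-padded to at least 4 digits; treat that as an assumption. On the JVM, Hadoop's own `ContainerId.fromString` is the more robust choice, since it follows whatever the installed YARN version actually emits.

```python
import re
from dataclasses import dataclass

# Documented format:
#   container_e<epoch>_<clusterTimestamp>_<appId>_<attemptId>_<containerId>
# The "e<epoch>_" segment is omitted when epoch is 0.
_CONTAINER_RE = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<cluster_ts>\d+)_(?P<app_id>\d+)_(?P<attempt_id>\d+)_(?P<container_id>\d+)$"
)


@dataclass(frozen=True)
class ParsedContainerId:
    epoch: int              # 0 when the "e<epoch>_" segment is absent; bumped on RM restart/failover
    cluster_timestamp: int  # RM start timestamp, shared with the application ID
    app_id: int             # sequence number of the application within that RM
    attempt_id: int         # application attempt number
    container_id: int       # sequence number of the container within the attempt

    def application_id(self) -> str:
        # Assumption: application_<clusterTimestamp>_<appId>, appId padded to >= 4 digits.
        return f"application_{self.cluster_timestamp}_{self.app_id:04d}"


def parse_container_id(s: str) -> ParsedContainerId:
    """Hypothetical helper: split a YARN container ID string into its fields."""
    m = _CONTAINER_RE.match(s)
    if m is None:
        raise ValueError(f"not a YARN container ID: {s!r}")
    epoch = m.group("epoch")
    return ParsedContainerId(
        epoch=int(epoch) if epoch is not None else 0,
        cluster_timestamp=int(m.group("cluster_ts")),
        app_id=int(m.group("app_id")),
        attempt_id=int(m.group("attempt_id")),
        container_id=int(m.group("container_id")),
    )


if __name__ == "__main__":
    p = parse_container_id("container_e116_1495951495692_11203_01_000105")
    print(p.epoch, p.application_id(), p.attempt_id, p.container_id)
    # → 116 application_1495951495692_11203 1 105
```

Applied to the ID from the question, this reads e116 as epoch 116 (so the ResourceManager has restarted or failed over that many times), 1495951495692_11203 as the application, 01 as the application attempt and 000105 as the container's sequence number within that attempt.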

Nadav Gruner answered Jan 21 '26 06:01
