When running Spark jobs on top of YARN (yarn-cluster mode), YARN launches the executors in containers whose names look something like this: container_e116_1495951495692_11203_01_000105
What is the naming convention for the containers?
Here is my educated guess:
If there is any concrete information about this (or even a reference to the right place in the code), I'd be glad to hear about it.
In light of the above, when running a Spark job on YARN, how can I know which containers belong to which executor?
You can look at the ContainerId javadoc: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/yarn/api/records/ContainerId.html
A string representation of containerId. The format is container_eepoch_clusterTimestamp_appId_attemptId_containerId when epoch is larger than 0 (e.g. container_e17_1410901177871_0001_01_000005). epoch is increased when RM restarts or fails over. When epoch is 0, epoch is omitted (e.g. container_1410901177871_0001_01_000005).
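To illustrate the format described in the docs, here is a small sketch that splits a container ID string into its fields. Note that `ContainerIdParser` is a hypothetical helper of my own, not part of the Hadoop API; it simply applies the naming convention quoted above, including the case where the epoch prefix is omitted.

```java
public class ContainerIdParser {

    /**
     * Splits a YARN container ID string into its fields, per the
     * convention container_e<epoch>_<clusterTimestamp>_<appId>_<attemptId>_<containerId>.
     * When the epoch segment (e.g. "e17") is absent, epoch is reported as "0".
     * Returns {epoch, clusterTimestamp, appId, attemptId, containerId}.
     */
    public static String[] parse(String id) {
        String[] parts = id.split("_");
        boolean hasEpoch = parts[1].startsWith("e");
        int off = hasEpoch ? 2 : 1;                       // index of clusterTimestamp
        String epoch = hasEpoch ? parts[1].substring(1) : "0";
        return new String[] {
            epoch, parts[off], parts[off + 1], parts[off + 2], parts[off + 3]
        };
    }

    public static void main(String[] args) {
        // Example from the javadoc quoted above
        for (String field : parse("container_e17_1410901177871_0001_01_000005")) {
            System.out.println(field);
        }
    }
}
```

For the example ID from the docs, this yields epoch 17, cluster timestamp 1410901177871, application ID 0001, attempt ID 01, and container number 000005.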