
Where are the individual Dataproc Spark logs?

Where are the Dataproc Spark job logs located? I know the driver logs appear under the "Logging" section, but what about the executor nodes? Also, where are the detailed steps that Spark executes logged (I know I can see them in the Application Master)? I am trying to debug a script that appears to hang, with Spark seeming to freeze.

Asked by Alex, Nov 16 '25 at 05:11



1 Answer

UPDATE in Q3 2022: This answer is outdated, see Dataproc YARN container logs location for the latest info.

The task logs are stored on each worker node under /tmp.

It is possible to collect them in one place via YARN log aggregation. Set these properties at cluster creation time (via --properties, each with the yarn: prefix):

  • yarn.log-aggregation-enable=true
  • yarn.nodemanager.remote-app-log-dir=gs://${LOG_BUCKET}/logs
  • yarn.log-aggregation.retain-seconds=-1
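As a rough sketch of how the three properties above could be combined into a single --properties value, the Python snippet below builds the comma-separated string with the yarn: prefix and prints a gcloud dataproc clusters create command. The cluster name, region, and bucket name are hypothetical placeholders, and the helper functions are only illustrative, not part of any Google tooling.

```python
import shlex


def build_properties_flag(log_bucket):
    """Return the value for --properties that enables YARN log aggregation.

    log_bucket is a hypothetical Cloud Storage bucket name (without gs://).
    """
    props = {
        "yarn.log-aggregation-enable": "true",
        "yarn.nodemanager.remote-app-log-dir": f"gs://{log_bucket}/logs",
        "yarn.log-aggregation.retain-seconds": "-1",
    }
    # Each property gets the "yarn:" prefix; entries are comma-separated.
    return ",".join(f"yarn:{key}={value}" for key, value in props.items())


def build_create_command(cluster_name, region, log_bucket):
    """Return the cluster creation command as an argument list."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        "--region", region,
        "--properties", build_properties_flag(log_bucket),
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own cluster, region, and bucket.
    cmd = build_create_command("my-cluster", "us-central1", "my-log-bucket")
    print(shlex.join(cmd))
```

The script only prints the command so it can be reviewed before being run in a shell.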

Here's an article that discusses log management:

https://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/

Answered by tix, Nov 17 '25 at 18:11



