Submit Presto job on Dataproc

I am trying to submit a Dataproc job on a cluster running Presto with the PostgreSQL connector.

The cluster is initialized as follows:

gcloud beta dataproc clusters create ${CLUSTER_NAME} \
    --project=${PROJECT} \
    --region=${REGION} \
    --zone=${ZONE} \
    --bucket=${BUCKET_NAME} \
    --num-workers=${WORKERS} \
    --scopes=cloud-platform \
    --initialization-actions=${INIT_ACTION}

${INIT_ACTION} points to a Bash script containing the initialization actions for starting a Presto cluster with PostgreSQL.

I do not use --optional-components=PRESTO because I need --initialization-actions to perform non-default operations, and combining --optional-components with --initialization-actions does not work.

When I try to run a simple job:

gcloud beta dataproc jobs submit presto \
  --cluster ${CLUSTER_NAME} \
  --region ${REGION} \
  -e "SHOW TABLES"

I get the following error:

ERROR: (gcloud.beta.dataproc.jobs.submit.presto) FAILED_PRECONDITION: Cluster '<cluster-name>' requires optional component PRESTO to run PRESTO jobs

Is there some other way to define the optional component on the cluster?
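As far as I know, optional components can only be chosen when a cluster is created, so this error means the running cluster was created without PRESTO. A quick way to confirm which components a cluster has is to read `config.softwareConfig.optionalComponents` from `gcloud dataproc clusters describe`. This is a minimal sketch, assuming an installed and authenticated `gcloud`; `has_optional_component` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# has_optional_component CLUSTER REGION COMPONENT
# Hypothetical helper: returns 0 if the Dataproc cluster was created with
# COMPONENT (for example PRESTO), 1 otherwise.
# Assumes the gcloud CLI is installed and authenticated.
has_optional_component() {
  local cluster="$1" region="$2" component="$3"
  # value() prints the list on one line; split it into one item per line
  # (handling either ';' or ',' as the separator) and match a whole line.
  gcloud dataproc clusters describe "${cluster}" \
      --region="${region}" \
      --format="value(config.softwareConfig.optionalComponents)" \
    | tr ';,' '\n\n' \
    | grep -qx "${component}"
}
```

For example, `has_optional_component "${CLUSTER_NAME}" "${REGION}" PRESTO || echo "cluster lacks PRESTO"` prints the message for a cluster created only with init actions.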

UPDATE:

I then tried using both --optional-components and --initialization-actions:

gcloud beta dataproc clusters create ${CLUSTER_NAME} \
    ...
    --scopes=cloud-platform \
    --optional-components=PRESTO \
    --image-version=1.3 \
    --initialization-actions=${INIT_ACTION} \
    --metadata ...

${INIT_ACTION} is copied from this repo, with a slight modification to the configure_connectors function so that it creates a PostgreSQL connector.

Creating the cluster fails with the following error:

ERROR: (gcloud.beta.dataproc.clusters.create) Operation [projects/...] failed: Initialization action failed. Failed action 'gs://.../presto_config.sh', see output in: gs://.../dataproc-initialization-script-0_output.

The initialization action's output log shows:

+ presto '--execute=select * from system.runtime.nodes;'
Error running command: java.net.ConnectException: Failed to connect to localhost/0:0:0:0:0:0:0:1:8080

This leads me to believe I have to rewrite the initialization script.
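The log shows the Presto CLI trying `localhost:8080`, which is the CLI's default server address. One possible cause is that the copied script's health check targets a port the optional component's Presto is not listening on. A way to check is to read `http-server.http.port` from Presto's `config.properties` and pass that port to the CLI with `--server`. This is a minimal sketch; the default path `/etc/presto/conf/config.properties` is an assumption about where the optional component keeps its configuration, and `presto_http_port` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# presto_http_port [CONFIG_FILE]
# Hypothetical helper: prints the value of http-server.http.port from a
# Presto config.properties file (prints nothing if the key is absent).
# The default path is an assumption about the Dataproc optional component.
presto_http_port() {
  local config="${1:-/etc/presto/conf/config.properties}"
  # Match the key exactly, capture the numeric value, keep the last match.
  sed -n 's/^[[:space:]]*http-server\.http\.port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "${config}" \
    | tail -n 1
}

# Example health check against the configured port instead of the default:
#   presto --server "localhost:$(presto_http_port)" \
#     --execute='select * from system.runtime.nodes;'
```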

It would be nice to know which initialization script is running when I specify --optional-components=PRESTO.

asked Dec 06 '25 20:12 by jean


1 Answer

If all you want to do is set up the optional component to work with a Postgres endpoint, writing an initialization action to do it is pretty easy. You just have to add the catalog file and restart Presto.
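An independent minimal sketch of such an init action follows (not a copy of any published example). It assumes the optional component reads catalogs from `/etc/presto/conf/catalog` and runs as a systemd service named `presto`, and it reads connection settings from cluster metadata under the hypothetical keys `postgres-url`, `postgres-user` and `postgres-password`; all of these are assumptions. The four property keys are the standard Presto PostgreSQL connector settings, and the file is written on every node because Dataproc runs init actions on the master and on each worker:

```shell
#!/usr/bin/env bash
# Sketch of a Dataproc init action adding a PostgreSQL catalog to the
# PRESTO optional component. Assumptions (not confirmed by the answer):
#   - catalogs live in /etc/presto/conf/catalog
#   - Presto is managed by systemd under the service name "presto"
#   - connection settings come from the hypothetical metadata keys
#     postgres-url, postgres-user and postgres-password
set -euo pipefail

readonly CATALOG_DIR="/etc/presto/conf/catalog"
readonly METADATA_TOOL="/usr/share/google/get_metadata_value"

# write_postgresql_catalog DIR URL USER PASSWORD
# Writes DIR/postgresql.properties for the Presto PostgreSQL connector.
write_postgresql_catalog() {
  local dir="$1" url="$2" user="$3" password="$4"
  mkdir -p "${dir}"
  cat > "${dir}/postgresql.properties" <<EOF
connector.name=postgresql
connection-url=${url}
connection-user=${user}
connection-password=${password}
EOF
}

main() {
  local url user password
  url="$("${METADATA_TOOL}" attributes/postgres-url)"
  user="$("${METADATA_TOOL}" attributes/postgres-user)"
  password="$("${METADATA_TOOL}" attributes/postgres-password)"
  # Every Presto node (coordinator and workers) needs the catalog file.
  write_postgresql_catalog "${CATALOG_DIR}" "${url}" "${user}" "${password}"
  # Restart so Presto loads the new catalog.
  systemctl restart presto
}

# Only run on a GCE/Dataproc node, where the metadata helper exists.
if [[ -x "${METADATA_TOOL}" ]]; then
  main
fi
```

Presto names a catalog after its properties file, so tables would then be queried as `postgresql.<schema>.<table>`. The matching cluster flags would be `--optional-components=PRESTO`, `--initialization-actions` pointing at the staged script, and something like `--metadata=postgres-url=jdbc:postgresql://HOST:5432/DB,postgres-user=USER,postgres-password=PASSWORD` (placeholder values).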

This gist is an example init action:

https://gist.github.com/KoopaKing/8e653e0c8d095323904946045c5fa4c2

I have tested it successfully with the Presto optional component, but it is pretty simple. Feel free to fork the example and stage it in your GCS bucket.
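Staging a forked init action and creating the cluster with both the component and the init action might look like the following sketch. `create_presto_cluster` and the `init-actions/` object prefix are hypothetical; the flags mirror the question's original create command plus `--optional-components=PRESTO` and `--image-version=1.3` from the update, and any extra flags (such as `--metadata`) are passed through unchanged:

```shell
#!/usr/bin/env bash
# create_presto_cluster LOCAL_SCRIPT [EXTRA_FLAGS...]
# Hypothetical helper: copies an init action to the staging bucket, then
# creates a cluster with both the PRESTO optional component and that action.
# Expects CLUSTER_NAME, PROJECT, REGION, ZONE, BUCKET_NAME and WORKERS to be
# set, as in the question's create command.
create_presto_cluster() {
  local script="$1"
  shift
  local gcs_path
  gcs_path="gs://${BUCKET_NAME}/init-actions/$(basename "${script}")"

  # Stage the init action in GCS so Dataproc can fetch it on every node.
  gsutil cp "${script}" "${gcs_path}"

  gcloud beta dataproc clusters create "${CLUSTER_NAME}" \
      --project="${PROJECT}" \
      --region="${REGION}" \
      --zone="${ZONE}" \
      --bucket="${BUCKET_NAME}" \
      --num-workers="${WORKERS}" \
      --scopes=cloud-platform \
      --optional-components=PRESTO \
      --image-version=1.3 \
      --initialization-actions="${gcs_path}" \
      "$@"
}
```

For example, `create_presto_cluster ./postgres-catalog.sh --metadata=...` (hypothetical file name) uploads the script and creates the cluster in one step.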

answered Dec 09 '25 20:12 by KoopaKing


