
Programmatically cancelling a pyspark dataproc batch job

Using Go, I have several Dataproc batch jobs running, and I can access them via their UUID after creating a client like this:

batchClient, err := dataproc.NewBatchControllerClient(ctx, options...)

If I wanted to delete a batch job, I could do it with Google Cloud's Go client library like this (the request identifies the batch by its resource name, which includes the batch ID):

err := batchClient.DeleteBatch(ctx, request, options...)

However, there doesn't seem to be any way to programmatically cancel a batch that's already running. If I try to delete a running batch, I rightfully get a FAILED_PRECONDITION error.

Now, I'm aware that Google Cloud's gcloud CLI has a simple way to cancel a batch:

gcloud dataproc batches cancel (BATCH : --region=REGION) [GCLOUD_WIDE_FLAG …]

Unfortunately, this approach is not a good fit for my application.

asked Jan 23 '26 by David Gamboa

1 Answer

Before deleting a batch resource, you need to make sure that it's in a terminal state (SUCCEEDED, FAILED, or CANCELLED).

To achieve this for a running batch, you need to cancel it via its associated long-running operation: https://cloud.google.com/dataproc-serverless/docs/reference/rest/v1/projects.locations.operations/cancel

answered Jan 26 '26 by Igor Dvorzhak


