I am trying to submit a training job to ML Engine using gcloud, but I am running into a service account permissions error that I can't figure out. The model code lives on a Compute Engine instance, from which I run gcloud ml-engine jobs submit as part of a bash script. I have created a service account ([email protected]) for gcloud authentication on the VM instance and a bucket for the job and model data. The service account has been granted the Storage Object Viewer and Storage Object Creator roles on the bucket, and the VM and bucket belong to the same project.
When I try to submit a job per this tutorial, the following are executed:
time_stamp=`date +"%Y%m%d_%H%M"`
job_name='ObjectDetection_'${time_stamp}
gsutil cp object_detection/samples/configs/faster_rcnn_resnet50.config \
gs://[bucket-name]/training_configs/faster-rcnn-resnet50.${job_name}.config
gcloud ml-engine jobs submit training ${job_name} \
--project [project-name] \
--runtime-version 1.12 \
--job-dir=gs://[bucket-name]/jobs/${job_name} \
--packages dist/object_detection-0.1.tar.gz,slim/dist/slim-0.1.tar.gz,/tmp/pycocotools/pycocotools-2.0.tar.gz \
--module-name object_detection.model_main \
--region us-central1 \
--config object_detection/training-config.yml \
-- \
--model_dir=gs://[bucket-name]/output/${job_name} \
--pipeline_config_path=gs://[bucket-name]/training_configs/faster-rcnn-resnet50.${job_name}.config
where [bucket-name] and [project-name] are placeholders for the bucket created above and the project it and the VM are contained in.
The config file is uploaded to the bucket successfully; I can confirm in the cloud console that it exists. However, the job fails to submit with the following error:
ERROR: (gcloud.ml-engine.jobs.submit.training) User [[email protected]] does not have permission to access project [project-name] (or it may not exist): Field: job_dir Error: You don't have the permission to access the provided directory 'gs://[bucket-name]/jobs/ObjectDetection_20190709_2001'
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: You don't have the permission to access the provided directory 'gs://[bucket-name]/jobs/ObjectDetection_20190709_2001'
    field: job_dir
If I look in the cloud console, the files specified by --packages exist in that location, and I've ensured that the service account [email protected] has been given the Storage Object Viewer and Storage Object Creator roles on the bucket, which has bucket-level permissions set. After making sure the service account is activated and set as the default, I can also run
gsutil ls gs://[bucket-name]/jobs/ObjectDetection_20190709_2001
which successfully returns the contents of the folder without a permission error. The project also contains a managed service account, service-[project-number]@cloud-ml.google.com.iam.gserviceaccount.com, and I have granted this account the Storage Object Viewer and Storage Object Creator roles on the bucket as well.
To confirm that this VM is able to submit a job, I switched the gcloud user to my personal account; the script then runs and submits a job without any error. However, since this runs on a shared VM, I would like to rely on service account authorization instead of my own user account.
I had a similar problem with exactly the same error.
I found that the easiest way to troubleshoot those errors is to go to "Logging" and search for "PERMISSION DENIED" text.
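The same search can probably be done from the command line instead of the console. The following is only a sketch under assumptions: the helper names `run` and `search_denied` are made up for illustration, the project ID is a placeholder, and the quoted search text (here `PERMISSION_DENIED`) may need adjusting to whatever string actually appears in your log entries. By default it only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch: search Cloud Logging for permission errors with gcloud.
# DRY_RUN=1 (the default) only prints the command; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# search_denied PROJECT [TEXT] [LIMIT]
# PROJECT is your project ID; TEXT and LIMIT are illustrative defaults.
search_denied() {
  sd_project="$1"
  sd_text="${2:-PERMISSION_DENIED}"
  sd_limit="${3:-20}"
  # A quoted string as the whole filter is assumed to match text in any field.
  run gcloud logging read "\"${sd_text}\"" --project "${sd_project}" --limit "${sd_limit}"
}
```

Calling `search_denied [project-name]` prints the command for review; rerun with `DRY_RUN=0` to query the logs for real. The entry you are looking for should name the missing permission and the account that was denied.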
In my case the service account was missing the "storage.buckets.get" permission. You then need to find a role that has this permission. You can do that from IAM->Roles, where the list of roles can be filtered by permission name. It turned out that only a few roles include the needed permission.
I added the "Storage Legacy Bucket Writer" role to the service account on the bucket and was then able to submit a job.
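For anyone who prefers the command line, here is a rough sketch of the same two steps: checking that a role really contains the permission, and granting the role on the bucket. The function names are hypothetical, the service account email and bucket are placeholders, and the short role name `legacyBucketWriter` is my understanding of how gsutil refers to "Storage Legacy Bucket Writer", so treat that as an assumption and verify it before relying on it.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default) only prints the command; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# role_has_permission PERMISSION
# Reads the YAML printed by `gcloud iam roles describe ROLE` on stdin and
# succeeds only if PERMISSION appears as an exact list item.
role_has_permission() {
  sed 's/^[[:space:]]*-[[:space:]]*//' | grep -qxF -- "$1"
}

# grant_bucket_role SERVICE_ACCOUNT_EMAIL ROLE BUCKET
# Adds a bucket-level IAM binding for the service account.
grant_bucket_role() {
  run gsutil iam ch "serviceAccount:$1:$2" "gs://$3"
}
```

A possible way to use it: pipe `gcloud iam roles describe roles/storage.legacyBucketWriter` into `role_has_permission storage.buckets.get`, and if that succeeds, call `grant_bucket_role` with your service account email, `legacyBucketWriter`, and `[bucket-name]`. With the default `DRY_RUN=1` the grant is only printed, which makes it easy to check the binding before changing anything.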