We are using Apache Beam through Airflow. The default Google Cloud credentials are set with the environment variable GOOGLE_APPLICATION_CREDENTIALS. We don't want to change the environment variable, as that might affect other processes running at the same time. I couldn't find a way to change the Google Cloud Dataflow service account programmatically. We create the pipeline as follows: p = beam.Pipeline(argv=self.conf)
Is there an option, through argv or the pipeline options, where I can specify the location of the GCS credential file? I searched the documentation but didn't find much information.
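You can specify a service account when you launch the job with a basic flag. The spelling below is the Java SDK's; in the Python SDK, which the question uses, the equivalent option is --service_account_email: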
--serviceAccount=my-service-account-name@my-project.iam.gserviceaccount.com
That account needs the Dataflow Worker role, plus whatever other roles the job requires (GCS, BigQuery, etc.). Details here. You don't need the service account's key file to be stored in GCS or locally in order to use it.
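As a minimal sketch of how this could look with the Python SDK from the question: the helper below adds the service account option to an existing argv list before it is handed to beam.Pipeline. The names with_service_account and run_pipeline, and the example email address, are hypothetical and only for illustration. The sketch assumes options are written in the --flag=value form.

```python
from typing import List, Sequence

# Python SDK option that sets the service account the Dataflow workers run as.
SERVICE_ACCOUNT_FLAG = "--service_account_email"


def with_service_account(argv: Sequence[str], email: str) -> List[str]:
    """Return a copy of argv with the worker service account set to `email`.

    Hypothetical helper. Assumes options use the `--flag=value` form.
    """
    # Drop any earlier value so the option is not passed twice.
    kept = [a for a in argv if not a.startswith(SERVICE_ACCOUNT_FLAG + "=")]
    kept.append(f"{SERVICE_ACCOUNT_FLAG}={email}")
    return kept


def run_pipeline(argv: Sequence[str]) -> None:
    """Build and run a trivial pipeline with the given options (illustrative)."""
    # Imported lazily so the helper above can be used without Beam installed.
    import apache_beam as beam

    with beam.Pipeline(argv=list(argv)) as p:
        _ = p | beam.Create(["hello"]) | beam.Map(print)
```

Usage would be along the lines of run_pipeline(with_service_account(self.conf, "my-service-account-name@my-project.iam.gserviceaccount.com")). This changes only the identity the Dataflow workers run as; GOOGLE_APPLICATION_CREDENTIALS is left untouched and is still used to submit the job.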