I'm trying to read a PySpark DataFrame from Google Cloud Storage, but I keep getting an error that the service account has no storage.objects.create permission. The account intentionally does not have WRITER permissions, because the job only reads parquet files:
spark_session.read.parquet(input_path)
18/12/25 13:12:00 INFO com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Repairing batch of 1 missing directories.
18/12/25 13:12:01 ERROR com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Failed to repair some missing directories.
com.google.cloud.hadoop.repackaged.gcs.com.google.api.client.googleapis.json.GoogleJsonResponseException: 403 Forbidden
{
  "code" : 403,
  "errors" : [ {
    "domain" : "global",
    "message" : "***.gserviceaccount.com does not have storage.objects.create access to ***.",
    "reason" : "forbidden"
  } ],
  "message" : "***.gserviceaccount.com does not have storage.objects.create access to ***."
}
We found the issue. It's caused by the implicit directory auto-repair feature in the GCS connector, which tries to create placeholder "directory" objects (a write operation) even though the job only reads data. We disabled this behavior by setting fs.gs.implicit.dir.repair.enable to false, as shown below.
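A minimal sketch of how the setting can be passed, assuming you build the session yourself and that the bucket path shown is a placeholder for your own gs:// URI; Spark forwards keys prefixed with spark.hadoop into the Hadoop configuration used by the GCS connector:

from pyspark.sql import SparkSession

# Disable the GCS connector's implicit directory repair so a read-only
# job never attempts storage.objects.create on the bucket.
spark_session = (
    SparkSession.builder
    .appName("read-only-gcs-job")
    .config("spark.hadoop.fs.gs.implicit.dir.repair.enable", "false")
    .getOrCreate()
)

# "gs://your-bucket/path/to/parquet" is a hypothetical input path.
df = spark_session.read.parquet("gs://your-bucket/path/to/parquet")

If the session is created for you (for example by a managed cluster), the same key can instead be supplied as a cluster- or job-level Spark property rather than in code.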