PySpark not using TemporaryAWSCredentialsProvider

I'm trying to read files from S3 with PySpark using temporary session credentials, but I keep getting this error:

Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: null, AWS Request ID: XXXXXXXX, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: XXXXXXX

I think the issue is that the S3A connection needs to use org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider in order to pull in the session token in addition to the standard access key and secret key. However, even with the fs.s3a.aws.credentials.provider configuration variable set, it still attempts to authenticate with BasicAWSCredentialsProvider. Looking at the logs I see:

DEBUG AWSCredentialsProviderChain:105 - Loading credentials from BasicAWSCredentialsProvider

I've followed the directions here to add the necessary configuration values, but they do not seem to make any difference. Here is the code I'm using to set it up:

import os
import sys
import pyspark
from pyspark.sql import SQLContext
from pyspark.context import SparkContext

# PYSPARK_SUBMIT_ARGS has to be set before the SparkContext is created,
# otherwise the extra packages are not picked up by the JVM
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk-pom:1.11.83,org.apache.hadoop:hadoop-aws:2.7.3 pyspark-shell'

sc = SparkContext()
sc.setLogLevel("DEBUG")
# Tell S3A to use the session-token provider and pass in the temporary credentials
sc._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", os.environ.get("AWS_ACCESS_KEY_ID"))
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", os.environ.get("AWS_SECRET_ACCESS_KEY"))
sc._jsc.hadoopConfiguration().set("fs.s3a.session.token", os.environ.get("AWS_SESSION_TOKEN"))
sql_context = SQLContext(sc)

Why is TemporaryAWSCredentialsProvider not being used?

1 Answer

Which Hadoop version are you using?
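
If you're not sure, you can ask the JVM from the PySpark session itself; a quick check (VersionInfo is part of hadoop-common):

# Print the Hadoop version the Spark JVM is actually running against
print(sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion())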

S3A STS support was added in Hadoop 2.8.0, and this was the exact error message I got on Hadoop 2.7.
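
If upgrading is an option, something like the sketch below should make TemporaryAWSCredentialsProvider (and fs.s3a.session.token) available. The 2.8.5 version here is only an example; it has to match the Hadoop version your Spark distribution was built against, and the rest of your setup can stay as it is:

# Hypothetical example: pull in a Hadoop 2.8+ build of hadoop-aws instead of 2.7.3.
# --packages resolves the matching aws-java-sdk artifacts transitively, so they
# do not need to be pinned by hand.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.apache.hadoop:hadoop-aws:2.8.5 pyspark-shell'
)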
