PySpark not using TemporaryAWSCredentialsProvider

I'm trying to read files from S3 with PySpark using temporary session credentials, but I keep getting this error:

Received error response: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 403, AWS Service: null, AWS Request ID: XXXXXXXX, AWS Error Code: null, AWS Error Message: Forbidden, S3 Extended Request ID: XXXXXXX

I think the issue is that the S3A connection needs to use org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider in order to pull in the session token in addition to the standard access key and secret key. However, even with the fs.s3a.aws.credentials.provider configuration variable set, it still attempts to authenticate with BasicAWSCredentialsProvider. Looking at the logs I see:

DEBUG AWSCredentialsProviderChain:105 - Loading credentials from BasicAWSCredentialsProvider

I've followed the directions here to add the necessary configuration values, but they do not seem to make any difference. Here is the code I'm using to set it up:

import os
import sys
import pyspark
from pyspark.sql import SQLContext
from pyspark.context import SparkContext

# PYSPARK_SUBMIT_ARGS has to be set before the SparkContext is created,
# otherwise the extra packages are not picked up by the JVM
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk-pom:1.11.83,org.apache.hadoop:hadoop-aws:2.7.3 pyspark-shell'

sc = SparkContext()
sc.setLogLevel("DEBUG")
# Tell S3A to use the session-token provider and pass in the temporary credentials
sc._jsc.hadoopConfiguration().set("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", os.environ.get("AWS_ACCESS_KEY_ID"))
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", os.environ.get("AWS_SECRET_ACCESS_KEY"))
sc._jsc.hadoopConfiguration().set("fs.s3a.session.token", os.environ.get("AWS_SESSION_TOKEN"))
sql_context = SQLContext(sc)

Why is TemporaryAWSCredentialsProvider not being used?

1 Answer

Which Hadoop version are you using?
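
If you're not sure, you can ask the JVM from the PySpark session itself; a quick check (VersionInfo is part of hadoop-common):

# Print the Hadoop version the Spark JVM is actually running against
print(sc._jvm.org.apache.hadoop.util.VersionInfo.getVersion())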

S3A STS support was added in Hadoop 2.8.0, and this was the exact error message I got on Hadoop 2.7.
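
If upgrading is an option, something like the sketch below should make TemporaryAWSCredentialsProvider (and fs.s3a.session.token) available. The 2.8.5 version here is only an example; it has to match the Hadoop version your Spark distribution was built against, and the rest of your setup can stay as it is:

# Hypothetical example: pull in a Hadoop 2.8+ build of hadoop-aws instead of 2.7.3.
# --packages resolves the matching aws-java-sdk artifacts transitively, so they
# do not need to be pinned by hand.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages org.apache.hadoop:hadoop-aws:2.8.5 pyspark-shell'
)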
