I'm trying to read a s3 bucket from Spark and up until today Spark always complain that the request return 403
hadoopConf = spark_context._jsc.hadoopConfiguration()
hadoopConf.set("fs.s3a.access.key", "ACCESSKEY")
hadoopConf.set("fs.s3a.secret.key", "SECRETKEY")
hadoopConf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
logs = spark_context.textFile("s3a://mybucket/logs/*)
Spark was saying .... Invalid Access key [ACCESSKEY]
However with the same ACCESSKEY and SECRETKEY this was working with aws-cli
aws s3 ls mybucket/logs/
and in python boto3 this was working
resource = boto3.resource("s3", region_name="us-east-1")
resource.Object("mybucket", "logs/text.py") \
            .put(Body=open("text.py", "rb"),ContentType="text/x-py")
so my credentials ARE invalid and the problem is definitely something with Spark..
Today I decided to turn on the "DEBUG" log for the entire spark and to my suprise... Spark is NOT using the [SECRETKEY] I have provided but instead... add a random one???
17/03/08 10:40:04 DEBUG request: Sending Request: HEAD https://mybucket.s3.amazonaws.com / Headers: (Authorization: AWS ACCESSKEY:[RANDON-SECRET-KEY], User-Agent: aws-sdk-java/1.7.4 Mac_OS_X/10.11.6 Java_HotSpot(TM)_64-Bit_Server_VM/25.65-b01/1.8.0_65, Date: Wed, 08 Mar 2017 10:40:04 GMT, Content-Type: application/x-www-form-urlencoded; charset=utf-8, )
This is why it still return 403! Spark is not using the key I provide with fs.s3a.secret.key but instead invent a random one??
For the record I'm running this locally on my machine (OSX) with this command
spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
Could some one enlighten me on this?
spark. read. text() method is used to read a text file from S3 into DataFrame. like in RDD, we can also use this method to read multiple files at a time, reading patterns matching files and finally reading all files from a directory.
AWS_SESSION_TOKEN - environment variableSpecifies an AWS session token used as part of the credentials to authenticate the user. A session token is required only if you manually specify temporary security credentials.
Sign in to the AWS Management Console and open the IAM console at https://console.aws.amazon.com/iam/ . In the navigation pane, choose Users. Choose the name of the user whose access keys you want to create, and then choose the Security credentials tab. In the Access keys section, choose Create access key.
To get your access key ID and secret access keyOpen the IAM console at https://console.aws.amazon.com/iam/ . On the navigation menu, choose Users. Choose your IAM user name (not the check box). Open the Security credentials tab, and then choose Create access key.
(updated as my original one was downvoted as clearly considered unacceptable)
The AWS auth protocol doesn't send your secret over the wire. It signs the message. That's why what you see isn't what you passed in.
For further information, please reread.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With