I have a Python Lambda function that shrinks images as they are uploaded to S3. When the uploaded filename contains non-ASCII characters (Hebrew in my case), I cannot get the object: the request fails with Forbidden, as if the file doesn't exist.
Here's (some of) my code:
import boto3

s3_client = boto3.client('s3')

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        s3_client.download_file(bucket, key, "/tmp/somefile")
This raises An error occurred (403) when calling the HeadObject operation: Forbidden: ClientError. I also see in the log that the key contains characters like %D7%92.
Following advice I found online (for example http://blog.rackspace.com/the-devnull-s3-bucket-hacking-with-aws-lambda-and-python/), I also tried to unquote the key like so, with no luck:
key = urllib.unquote_plus(record['s3']['object']['key'])
Same error, although this time the log states that I'm trying to retrieve a key with characters like this: פ×קס×.
Note that this script is verified to work on English keys, and the tests were done on keys with no spaces.
# This worked for me (urllib.parse is the Python 3 module)
import urllib.parse
encodedStr = 'My+name+is+Tarak'
urllib.parse.unquote_plus(encodedStr)
# returns 'My name is Tarak'
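Applied to the S3 case, a minimal Python 3 sketch might look like the following. It assumes the event has the same shape as in the question, and that the key arrives URL-encoded, as the `%D7%92` in the log suggests. `object_key_from_record` and the sample record are hypothetical names used only for illustration.

```python
from urllib.parse import unquote_plus


def object_key_from_record(record):
    """Return the decoded object key from one S3 event record."""
    raw_key = record["s3"]["object"]["key"]
    # unquote_plus turns '+' into a space and decodes %XX escapes as UTF-8.
    return unquote_plus(raw_key, encoding="utf-8")


# Hypothetical record: %D7%92 is the UTF-8 encoding of the Hebrew letter gimel.
sample_record = {
    "s3": {
        "bucket": {"name": "example-bucket"},
        "object": {"key": "%D7%92+photo.jpg"},
    }
}

decoded_key = object_key_from_record(sample_record)
```

The decoded key can then be passed to `s3_client.download_file(bucket, decoded_key, "/tmp/somefile")` in place of the raw key.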
I had a similar problem. I solved it by encoding the key to UTF-8 before unquoting it:
key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode("utf8"))
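A possible explanation for why the encode helps, assuming the Python 2 runtime (where `urllib.unquote_plus` lives): given a unicode key, it turns each `%XX` escape into a single character, so the UTF-8 byte pair `%D7%92` becomes two Latin-1 characters instead of one Hebrew letter. The Python 3 sketch below reproduces that effect with the `%D7%92` sequence from the question. The variable names are illustrative.

```python
from urllib.parse import unquote_plus, unquote_to_bytes

encoded_key = "%D7%92"  # sequence seen in the question's log

# Decoding the escaped bytes as UTF-8 gives the intended Hebrew letter.
correct = unquote_plus(encoded_key)

# Decoding the same bytes one at a time as Latin-1 gives mojibake
# similar to what the question's log shows.
mojibake = unquote_to_bytes(encoded_key).decode("latin-1")

# The mojibake still carries the right bytes, so it can be repaired.
repaired = mojibake.encode("latin-1").decode("utf-8")
```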