I've got a list of keys that I'm retrieving from a cache, and I want to download the associated objects (files) from S3 without having to make a request per key.
Assuming I have the following array of keys:
key_array = [
    '20160901_0750_7c05da39_INCIDENT_MANIFEST.json',
    '20161207_230312_ZX1G222ZS3_INCIDENT_MANIFEST.json',
    '20161211_131407_ZX1G222ZS3_INCIDENT_MANIFEST.json',
    '20161211_145342_ZX1G222ZS3_INCIDENT_MANIFEST.json',
    '20161211_170600_FA68T0303607_INCIDENT_MANIFEST.json'
]
I'm trying to do something similar to this answer on another SO question, but modified like so:
import boto3
s3 = boto3.resource('s3')
incidents = s3.Bucket(my_incident_bucket).objects(key_array)
for incident in incidents:
    # Do fun stuff with the incident body
    incident_body = incident['Body'].read().decode('utf-8')
My ultimate goal being that I'd like to avoid hitting the AWS API separately for every key in the list. I'd also like to avoid having to pull the whole bucket down and filtering/iterating the full results.
I think the best you are going to get is n API calls where n is the number of keys in your key_array. The amazon API for s3 doesn't offer much in the way of server-side filtering based on keys, other than prefixes. Here is the code to get it in n API calls:
import boto3
s3 = boto3.client('s3')
for key in key_array:
    incident_body = s3.get_object(Bucket="my_incident_bucket", Key=key)['Body']
    # Do fun stuff with the incident body
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With