
How to get last modified date of latest file from S3 with Boto Python?

This is the structure of my S3 bucket:

Bucket 1
    Company A
       File A-02/01/20
       File A-01/01/20
       File B-02/01/20
       File B-01/01/20

    Company B
       File A-02/01/20
       File A-01/01/20

I am trying to go to Bucket 1, navigate to the Company A folder, find the latest version of File A, and print its modified date. I want to repeat the same steps for File B, and then for File A in the Company B folder. I am new to S3 and Boto3, so I am still learning. This is my code so far:

import boto3
from datetime import datetime, timezone

today = datetime.now(timezone.utc)

s3 = boto3.client('s3', region_name='us-east-1')

objects = s3.list_objects(Bucket='Bucket 1',Prefix = 'Company A'+'/File')

for o in objects["Contents"]:
    if o["LastModified"] != today:
        print(o["Key"] +" "+ str(o["LastModified"]))

This prints out the following:

File A_2019-10-28.csv 2019-11-11 18:31:17+00:00 
File A_2020-01-14.csv 2020-01-14 21:17:46+00:00 
File A_2020-01-28.csv 2020-01-29 19:19:58+00:00

But all I want is to check the latest file, File A_2020-01-28.csv, and print it if its modified date != today, and then do the same with File B.

Ronron asked Nov 16 '25 04:11


2 Answers

Assuming that "File A" will always have a date at the end, you could include the "File A" part in the Prefix search. One thing to keep in mind with S3 is that there is no such thing as folders. They are something you imply by using '/' in the key name; S3 just works on buckets and keys.
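To make the "no folders" point concrete, here is a minimal sketch in plain Python (no AWS calls) of what a Prefix filter amounts to: a string match on the start of each key. The key names and the helper `keys_with_prefix` are made up for illustration.

```python
# Hypothetical keys, shaped like the bucket described in the question.
keys = [
    "Company A/File A_2020-01-14.csv",
    "Company A/File A_2020-01-28.csv",
    "Company A/File B_2020-01-28.csv",
    "Company B/File A_2020-01-28.csv",
]


def keys_with_prefix(all_keys, prefix):
    """Mimic a Prefix filter: keep keys whose name starts with `prefix`."""
    return [k for k in all_keys if k.startswith(prefix)]


# "Company A/" is not a directory, just the leading characters of the key.
company_a_file_a = keys_with_prefix(keys, "Company A/File A")
print(company_a_file_a)
```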

The latest version of that file would be the version that has the newest last_modified value. One approach is to sort the object list (of "A" files) on that attribute:

import boto3
from operator import attrgetter

# Bucket(...).objects belongs to the resource API, so use boto3.resource, not boto3.client
s3 = boto3.resource('s3', region_name='us-east-1')
objs = s3.Bucket('Bucket 1').objects.filter(Prefix='Company A/File A')

# sort the objects based on 'obj.last_modified'
sorted_objs = sorted(objs, key=attrgetter('last_modified'))

# The latest version of the file (the last one in the list)
latest = sorted_objs.pop()
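If only the newest object is needed, a full sort is not required; `max` with a key function finds it in one pass. The sketch below works on dictionaries shaped like the "Contents" entries printed in the question (the client API) rather than on resource objects. The helper name `newest` is hypothetical, and the sample timestamps are copied from the question's output.

```python
from datetime import datetime, timezone


def newest(contents):
    """Return the entry with the most recent LastModified, or None if empty."""
    return max(contents, key=lambda o: o["LastModified"], default=None)


# Sample entries mirroring the output shown in the question.
contents = [
    {"Key": "File A_2019-10-28.csv",
     "LastModified": datetime(2019, 11, 11, 18, 31, 17, tzinfo=timezone.utc)},
    {"Key": "File A_2020-01-14.csv",
     "LastModified": datetime(2020, 1, 14, 21, 17, 46, tzinfo=timezone.utc)},
    {"Key": "File A_2020-01-28.csv",
     "LastModified": datetime(2020, 1, 29, 19, 19, 58, tzinfo=timezone.utc)},
]

latest = newest(contents)
print(latest["Key"], latest["LastModified"])
```

Unlike `pop()` on an empty list, `max(..., default=None)` does not raise when the prefix matches nothing, so the caller can check for `None`.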

As an example: I created foo.txt, foo2.txt and foo3.txt in order, then foo10.txt and foo5.txt, so foo5.txt is my latest "foo" file. In the session below, b is a Bucket resource for a bucket named 'foobar'.

>>> b.upload_file('/var/tmp/foo.txt','foo10.txt')
>>> b.upload_file('/var/tmp/foo.txt','foo5.txt')
>>> [i.key for i in b.objects.all()]  ## no ordering
['foo.txt', 'foo10.txt', 'foo2.txt', 'foo3.txt', 'foo5.txt']
>>> f2 = sorted(b.objects.all(), key=attrgetter('last_modified'))
>>> f2
[s3.ObjectSummary(bucket_name='foobar', key='foo.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo2.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo3.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo10.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo5.txt')]
>>> f2.pop()
s3.ObjectSummary(bucket_name='foobar', key='foo5.txt')

For more details on Python sorting see: https://wiki.python.org/moin/HowTo/Sorting
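To repeat this for File B and for Company B without one request per file, one option is to list everything once and keep the newest object per file name. The sketch below assumes keys look like `Company A/File A_2020-01-28.csv`, that is, a name followed by `_` and a date. `group_name`, `latest_per_group` and `utc` are hypothetical helpers, and the sample data is invented.

```python
from datetime import datetime, timezone


def group_name(key):
    """'Company A/File A_2020-01-28.csv' -> 'Company A/File A' (assumed naming scheme)."""
    return key.rsplit("_", 1)[0]


def latest_per_group(contents):
    """Map each group name to its entry with the newest LastModified."""
    latest = {}
    for obj in contents:
        name = group_name(obj["Key"])
        if name not in latest or obj["LastModified"] > latest[name]["LastModified"]:
            latest[name] = obj
    return latest


def utc(*args):
    """Build a timezone-aware UTC datetime, like the values S3 returns."""
    return datetime(*args, tzinfo=timezone.utc)


# Invented sample data in the shape of "Contents" entries from a list call.
contents = [
    {"Key": "Company A/File A_2020-01-14.csv", "LastModified": utc(2020, 1, 14, 21, 17)},
    {"Key": "Company A/File A_2020-01-28.csv", "LastModified": utc(2020, 1, 29, 19, 19)},
    {"Key": "Company A/File B_2020-01-28.csv", "LastModified": utc(2020, 1, 28, 8, 0)},
    {"Key": "Company B/File A_2020-01-27.csv", "LastModified": utc(2020, 1, 27, 9, 30)},
]

for name, obj in sorted(latest_per_group(contents).items()):
    print(name, "->", obj["Key"], obj["LastModified"])
```

With real data, `contents` would be built from the "Contents" entries of each page yielded by `s3.get_paginator('list_objects_v2').paginate(Bucket='Bucket 1')` on a boto3 client, since a single list call returns at most 1,000 keys.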

Steven Graham answered Nov 17 '25 17:11


Almost there. However, the if statement compares two datetime objects that contain both a date and a time, and the times will practically never match, so the condition is always true. If you only care about the date, change the if to:

    if o["LastModified"].date() != today.date():

Works on Python 3.6.9.
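A small illustration of the difference, using a fixed "now" so the result is reproducible. The `last_modified` value is taken from the question's output; the `today` value is an assumption chosen to fall later on the same UTC day.

```python
from datetime import datetime, timezone

# A LastModified value like the one S3 returned in the question.
last_modified = datetime(2020, 1, 29, 19, 19, 58, tzinfo=timezone.utc)

# Pretend "now" is later on the same UTC day (fixed here for reproducibility).
today = datetime(2020, 1, 29, 23, 5, 0, tzinfo=timezone.utc)

# Full datetimes differ in their time part, so != is True even on the same day.
differs_as_datetime = last_modified != today

# Comparing only the calendar dates gives the intended answer.
differs_as_date = last_modified.date() != today.date()

print(differs_as_datetime, differs_as_date)  # True False
```

S3 returns LastModified as a timezone-aware UTC datetime, so taking "now" with `datetime.now(timezone.utc)`, as the question does, keeps both dates in the same time zone.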

Raf answered Nov 17 '25 17:11


