I'm trying to figure out a way to read images from an S3 bucket. Right now, my setup is to mount the bucket using s3fs and then use a Python script with os.walk to go through each image and do some manipulation on it using numpy.
However, the output of
os.walk("mnt/")
is nothing! The command does not see any files within the mounted drive, although if I manually find the image
plt.imread("mnt/path/to/file")
I receive the image. I am at my wits' end trying to figure this out. Any ideas?
You can do:
import s3fs

s3 = s3fs.S3FileSystem()
for dirpath, dirnames, filenames in s3.walk(<your bucket name>):
    # keep in mind how many directory levels your bucket has
    for filename in filenames:
        # dirpath has no trailing slash, so join with one
        file_path = f'{dirpath}/{filename}'
        with s3.open(file_path, 'rb') as f:
            # do your numpy stuff with the "f" object
            pass
The code above loops through the entire bucket, including every subdirectory. If you only want the files at a particular directory depth, add an if statement inside the outer loop, for example:
if len(dirpath.split('/')) == <depth of the directory with the files>:
Here the depth counts the bucket name itself, so files in the root of the bucket are at depth 1.
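If you want the walk and the depth check in one place, here is a minimal sketch. It assumes fs is an s3fs.S3FileSystem (or anything with a compatible walk method). The helper name iter_files_at_depth, its depth convention (0 means directly under the root you pass in), the extension list, and the bucket name in the usage comment are all made up for illustration.

```python
import posixpath


def iter_files_at_depth(fs, root, depth=None, extensions=(".png", ".jpg", ".jpeg")):
    """Yield full paths of image files found under root.

    fs is assumed to expose walk(path) yielding (dirpath, dirnames, filenames),
    as s3fs.S3FileSystem does. depth=None means any depth; depth=0 means files
    directly under root, depth=1 one directory below it, and so on.
    """
    extensions = tuple(extensions)
    root = root.strip("/")
    for dirpath, dirnames, filenames in fs.walk(root):
        dirpath = dirpath.strip("/")
        # Count the directory levels between root and the current directory.
        relative = dirpath[len(root):].strip("/")
        level = len(relative.split("/")) if relative else 0
        if depth is not None and level != depth:
            continue
        for filename in filenames:
            if filename.lower().endswith(extensions):
                yield posixpath.join(dirpath, filename)


# Usage sketch (needs the s3fs package and AWS credentials;
# "my-bucket" is a placeholder):
#   import s3fs
#   fs = s3fs.S3FileSystem()
#   for path in iter_files_at_depth(fs, "my-bucket", depth=1):
#       with fs.open(path, "rb") as f:
#           ...  # do your numpy stuff with the "f" object
```

Passing the filesystem in as an argument keeps the helper independent of how you authenticate, and it never touches the s3fs mount, so os.walk is not involved at all.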
A mounted bucket from S3 doesn't behave like a normal file/directory in your filesystem, so statements like os.walk won't work as you'd expect. Your best bet is to use a library to search and interface with your S3 bucket from within Python itself.
I recommend looking into boto, which has a bunch of tools for interfacing with AWS. Also check out the AWS SDK for Python (boto3), which is the actively maintained successor to the original boto package.
Boto: https://github.com/boto/boto
AWS SDK for Python: https://aws.amazon.com/sdk-for-python/
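To give an idea of what that looks like with the AWS SDK for Python, here is a rough sketch that lists image keys in a bucket. It assumes client is a boto3 S3 client created with boto3.client("s3"). The helper name iter_object_keys, the suffix list, and the bucket and prefix in the usage comment are placeholders I made up, not anything from the question.

```python
def iter_object_keys(client, bucket, prefix="", suffixes=(".png", ".jpg", ".jpeg")):
    """Yield keys in bucket under prefix whose names end with one of suffixes.

    client is assumed to be a boto3 S3 client, i.e. boto3.client("s3").
    """
    suffixes = tuple(suffixes)
    # list_objects_v2 returns at most 1000 keys per call; the paginator
    # follows the continuation tokens so every page is visited.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when nothing matches.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.lower().endswith(suffixes):
                yield key


# Usage sketch (needs boto3 and AWS credentials; names are placeholders):
#   import boto3
#   client = boto3.client("s3")
#   for key in iter_object_keys(client, "my-bucket", prefix="path/to/"):
#       body = client.get_object(Bucket="my-bucket", Key=key)["Body"].read()
#       # decode the bytes with your image library, then process with numpy
```

S3 has no real directories, only keys that happen to contain slashes, so listing by prefix plays the role that os.walk plays on a local disk.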