Read a zipped CSV from S3 into a Python DataFrame

I have a bucket in S3 with a CSV file in it.
There are no non-ASCII characters in it.
When I try to read it with Python, it fails.
I used: df = self.s3_input_bucket.get_file_contents_from_s3(path)
as I have on many occasions recently in the same script, and got: UnicodeDecodeError: 'utf8' codec can't decode byte 0x84 in position 14: invalid start byte.
To make sure the path is correct, I put another plain-text file in the same folder and was able to read it without a problem.
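A UnicodeDecodeError on a file that should be plain ASCII often means the bytes are not text at all. As a rough diagnostic, one could fetch the first few raw bytes (for example with s3fs, s3_fs.open(path, 'rb').read(4); how you access the object is an assumption here) and check for the zip signature PK\x03\x04. The helper names below are hypothetical. Note that in a zip local file header the CRC-32 field starts at offset 14, so an undecodable byte at position 14 is consistent with a zipped payload.

```python
# A regular zip archive normally starts with this local file header signature.
ZIP_SIGNATURE = b"PK\x03\x04"


def looks_like_zip(data: bytes) -> bool:
    """Cheap check: does the payload start with a zip local file header?"""
    return data.startswith(ZIP_SIGNATURE)


def describe_payload(data: bytes) -> str:
    """Hypothetical helper: classify raw S3 bytes before trying to parse them."""
    if looks_like_zip(data):
        return "zip archive: unzip before parsing"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary or non-UTF-8 data"
    return "UTF-8 text"


print(describe_payload(b"PK\x03\x04\x14\x00"))  # zip archive: unzip before parsing
print(describe_payload(b"id,score\n1,0.5\n"))   # UTF-8 text
```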

I tried many solutions I found on other questions. As just one example, someone suggested trying this:

str = unicode(str, errors='replace')

or

str = unicode(str, errors='ignore')
from this question: UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c
But how can I apply them in this case?
This did not work:

str = unicode(self.s3_input_bucket.get_file_contents_from_s3(path), errors='replace')
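Those snippets are Python 2 (unicode() does not exist in Python 3); the Python 3 equivalent is bytes.decode with an errors argument. The small sketch below uses a made-up byte string containing 0x84, the byte from the error message, to show what the two handlers actually do: they only replace or drop the offending bytes, so on a zip archive the result is still compressed binary data, not CSV text.

```python
# Made-up payload: zip signature followed by the 0x84 byte from the error message
raw = b"PK\x03\x04\x84abc"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0x84 in position 4: invalid start byte

# Python 3 equivalents of unicode(s, errors='replace') / unicode(s, errors='ignore')
replaced = raw.decode("utf-8", errors="replace")  # 0x84 becomes U+FFFD
ignored = raw.decode("utf-8", errors="ignore")    # 0x84 is silently dropped

print(repr(replaced))
print(repr(ignored))
```

Either call "succeeds", which is why the error disappears without the data becoming readable.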

asked Oct 11 '25 17:10 by Zusman



1 Answer

Apparently, I had tried to open a zipped file.
After much research, I was able to read it into a DataFrame using this code:

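import zipfile

import pandas as pd
import s3fs

# s3_additional_kwargs are passed along to the underlying S3 API calls
s3_fs = s3fs.S3FileSystem(s3_additional_kwargs={'ServerSideEncryption': 'AES256'})

def _zipped_csv_from_s3_to_df(self, path, s3_fs):
    # s3fs returns a seekable binary file object, which zipfile.ZipFile requires
    with s3_fs.open(path, 'rb') as zipped_dir:
        with zipfile.ZipFile(zipped_dir, mode='r') as zipped_content:
            for score_file in zipped_content.namelist():
                # Parse the first member as CSV and return it
                with zipped_content.open(score_file) as scores:
                    return pd.read_csv(scores)

# Inside the class; 'my-bucket/path-in-bucket' is a placeholder (S3 keys use '/')
market_score = self._zipped_csv_from_s3_to_df('my-bucket/path-in-bucket', s3_fs)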
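The unzip-and-parse step does not depend on S3, so it can be tried locally. Below, zipped_csv_to_df is a hypothetical standalone variant of _zipped_csv_from_s3_to_df that takes any seekable binary file object (such as what s3_fs.open(path, 'rb') returns), and an in-memory zip stands in for the S3 object. Unlike the loop above, it raises on an empty archive instead of silently returning None.

```python
import io
import zipfile

import pandas as pd


def zipped_csv_to_df(fileobj):
    """Hypothetical variant: parse the first member of a zip archive as CSV."""
    with zipfile.ZipFile(fileobj, mode="r") as zipped_content:
        names = zipped_content.namelist()
        if not names:
            raise ValueError("zip archive is empty")
        with zipped_content.open(names[0]) as member:
            return pd.read_csv(member)


# Build a small zip in memory to stand in for the S3 object
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("scores.csv", "id,score\n1,0.5\n2,0.75\n")
buf.seek(0)  # rewind so ZipFile can read what was just written

df = zipped_csv_to_df(buf)
print(df)
```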

The zip will always contain exactly one CSV file, which is why it is safe to return on the first iteration.
Note, however, that the function iterates over the files in the zip, so with several members it would return only the first one.
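Since the archive always holds a single CSV, pandas' own zip support may make the manual loop unnecessary: read_csv accepts compression='zip' and then expects the archive to contain exactly one file. This sketch assumes pandas 1.2 or later (for storage_options) and s3fs installed for s3:// URLs; read_single_csv_zip is a hypothetical name, the bucket and key are placeholders, and the local demo uses an in-memory archive so it runs without AWS.

```python
import io
import zipfile

import pandas as pd


def read_single_csv_zip(source, storage_options=None):
    """Read a zip archive holding exactly one CSV into a DataFrame.

    source may be a local path, an "s3://bucket/key" URL (needs s3fs),
    or a binary file-like object. storage_options only applies to fsspec
    URLs such as s3://; pandas rejects it for local paths and buffers.
    """
    return pd.read_csv(source, compression="zip", storage_options=storage_options)


# On S3 (placeholder bucket/key; the options are forwarded to s3fs):
# df = read_single_csv_zip(
#     "s3://my-bucket/path-in-bucket",
#     storage_options={"s3_additional_kwargs": {"ServerSideEncryption": "AES256"}},
# )

# Local demo with an in-memory archive standing in for the S3 object
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("scores.csv", "id,score\n1,0.5\n2,0.75\n")
buf.seek(0)

df = read_single_csv_zip(buf)
print(df)
```

The manual ZipFile loop remains useful if the archive might ever hold more than one file, since pandas will refuse such an archive rather than pick one.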

