Please excuse my ignorance / lack of knowledge in this area!
I'm looking to upload a dataframe to S3, but I need to pass 'ACL':'bucket-owner-full-control'.
import pandas as pd
import s3fs
fs = s3fs.S3FileSystem(anon=False, s3_additional_kwargs={'ACL': 'bucket-owner-full-control'})
df = pd.DataFrame()
df['test'] = [1,2,3]
df.head()
df.to_parquet('s3://path/to/file/df.parquet', compression='gzip')
I have managed to get around this by loading the dataframe into a PyArrow table and writing it like this:
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.Table.from_pandas(df)
pq.write_to_dataset(table=table,
                    root_path='s3://path/to/file/',
                    filesystem=fs)  # fs is the s3fs filesystem configured with the ACL above
But this feels hacky, and there must be a way to pass the ACL in the first example.
With Pandas 1.2.0, there is a storage_options argument to to_parquet, as mentioned here.
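As a minimal sketch (assuming Pandas >= 1.2.0 with s3fs installed; the bucket path is just the placeholder from the question):

import pandas as pd

df = pd.DataFrame({'test': [1, 2, 3]})

# storage_options is forwarded to s3fs, so the ACL is applied to the uploaded object
df.to_parquet('s3://path/to/file/df.parquet',
              compression='gzip',
              storage_options={'anon': False,
                               's3_additional_kwargs': {'ACL': 'bucket-owner-full-control'}})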
If you are stuck with Pandas < 1.2.0 (1.1.3 in my case), this trick helped:
import s3fs

# Build the s3fs filesystem by hand and pass it through to the pyarrow engine
storage_options = dict(anon=False, s3_additional_kwargs=dict(ACL="bucket-owner-full-control"))
fs = s3fs.S3FileSystem(**storage_options)
df.to_parquet('s3://foo/bar.parquet', filesystem=fs)
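This appears to work because to_parquet forwards extra keyword arguments to the underlying pyarrow engine, so the write goes through the preconfigured s3fs filesystem and picks up the ACL.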