When I scan Parquet files stored in S3 using Athena, Athena bills me for the amount of data it scans. Because Parquet is a columnar format, queries that touch just a few columns of a wide table end up scanning only a small portion of the Parquet files, which reduces the query cost. If the Parquet files are stored in S3's Standard storage class, there is no additional data retrieval charge for reading the data from S3.
What if the data is stored in S3's infrequent access tier (IA)? If Athena scans small portions of a parquet file stored in S3, do I
Based on the Amazon S3 (Simple Storage Service) pricing, it would seem that Infrequent Access has these relevant charges:
My reading is that the Data Retrieval charge applies to the amount of data 'retrieved' from S3, which for Athena would likely be the bytes returned by ranged GETs. However, I have no specific information confirming that this is how it is charged.
Athena would likely 'jump around' the file a bit due to the columnar storage, which would also cause charges for GET requests.
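To make that reading concrete, here is a rough sketch of the per-query IA charge if retrieval is billed only on the bytes returned by ranged GETs, plus a fee per GET. The function name and both price parameters are hypothetical, the prices are placeholders to be taken from the S3 pricing page for your region, and the billing model itself is the unconfirmed assumption described above.

```python
def ia_query_cost(bytes_read, get_requests,
                  retrieval_price_per_gb, get_price_per_1000):
    """Estimate the S3 IA charge for one query (hypothetical model).

    Assumes retrieval is billed on the bytes actually returned by
    ranged GETs, plus a per-request fee for every GET issued.
    Both prices are caller-supplied placeholders, not real quotes.
    """
    gb_read = bytes_read / 1024**3                      # bytes -> GiB
    retrieval_charge = gb_read * retrieval_price_per_gb
    request_charge = get_requests / 1000 * get_price_per_1000
    return retrieval_charge + request_charge


# Placeholder prices: look up the real ones for your region.
RETRIEVAL_PER_GB = 0.01
GET_PER_1000 = 0.001

file_size = 1024**3  # a 1 GiB Parquet file
whole_file = ia_query_cost(file_size, 1, RETRIEVAL_PER_GB, GET_PER_1000)
few_columns = ia_query_cost(file_size * 0.05, 20, RETRIEVAL_PER_GB, GET_PER_1000)
print(f"whole file: {whole_file:.6f}  few columns: {few_columns:.6f}")
```

Under this model the extra GETs from jumping around the file cost far less than the retrieval they avoid, but that only holds if partial reads really are billed by bytes returned.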
For normal access to Infrequent Access files, IA is cheaper if the object is accessed less than once per month. Reading only part of each Parquet file would probably improve this equation. The only way to be sure would be to set up a test on a bucket and region you don't normally use (or in a different account), run some queries, and then see what charges come through.
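The break-even point can be sketched with a small calculation. This is a simplified model and not an official formula: it compares only the per-GB storage saving against the per-GB retrieval charge, ignores request fees and any minimum-size or minimum-duration rules, and assumes retrieval is billed on the fraction of the object actually read. The function name and all prices are hypothetical placeholders.

```python
def breakeven_reads_per_month(std_storage_price, ia_storage_price,
                              retrieval_price, fraction_read=1.0):
    """Reads per month at which IA and Standard cost the same (per GB stored).

    All prices are per GB and supplied by the caller (placeholders here).
    fraction_read is the share of the object each read retrieves, e.g.
    0.1 if a query touches roughly 10% of a Parquet file. Assumes partial
    reads are billed by bytes returned, which is unconfirmed.
    """
    if not 0 < fraction_read <= 1:
        raise ValueError("fraction_read must be in (0, 1]")
    monthly_saving = std_storage_price - ia_storage_price  # saved by using IA
    cost_per_read = retrieval_price * fraction_read        # paid per access
    return monthly_saving / cost_per_read


# Placeholder prices per GB: replace with the real ones for your region.
full = breakeven_reads_per_month(0.02, 0.01, 0.01)
partial = breakeven_reads_per_month(0.02, 0.01, 0.01, fraction_read=0.1)
print(f"full reads: {full:.1f}/month  10% reads: {partial:.1f}/month")
```

If the assumption holds, reading a tenth of the file per query raises the break-even from about one access per month to about ten, which is why the test on a real bucket is worth running.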