Using Athena to query parquet files in s3 infrequent access: how much does it cost?

When I query Parquet files stored in S3 using Athena, Athena bills me for the amount of data it scans. Because Parquet is a columnar format, queries that touch just a few columns of a wide table scan only a small portion of each Parquet file, which lowers the query cost. If the Parquet files are stored in S3's Standard tier, there is no additional per-GB charge for retrieving the data from S3.

What if the data is stored in S3's Infrequent Access (IA) tier? If Athena scans only a small portion of a Parquet file stored in IA, do I

  1. Pay the Infrequent Access data retrieval fee for only the number of bytes that Athena scans, or
  2. Pay the Infrequent Access data retrieval fee for the size of the entire Parquet file, because I am charged for the whole object if I touch it at all?
conradlee asked Oct 20 '25 15:10


1 Answer

Based on the Amazon S3 (Simple Storage Service) pricing page, it would seem that Infrequent Access has these relevant charges:

  • GET, SELECT, and all other requests (per 1,000 requests): $0.001 (compared to $0.0004 for Standard)
  • Data retrievals (per GB): $0.01 (compared to $0.00 for Standard)

My reading is that the Data Retrieval would be for the amount of data 'retrieved' from S3, which would likely be ranged GETs from Athena. However, I have no specific information that says this is the way it would be charged.

Athena would likely 'jump around' the file a bit due to the columnar storage, which would also cause charges for GET requests.
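To get a feel for how far apart the two interpretations in the question are, here is a rough sketch that applies the two IA prices quoted above. The file size, bytes scanned, and number of ranged GET requests are made-up illustrative values, and the function name `ia_access_cost` is hypothetical. It does not tell you which interpretation AWS actually bills.

```python
GB = 1024 ** 3  # bytes per GB (assumed binary GB for this sketch)


def ia_access_cost(bytes_retrieved, get_requests,
                   retrieval_per_gb=0.01, per_1000_requests=0.001):
    """Estimate the S3 IA access charge for one query, in USD.

    Prices default to the figures quoted in the answer; check the
    current pricing page for your region before relying on them.
    """
    retrieval = bytes_retrieved / GB * retrieval_per_gb
    requests = get_requests / 1000 * per_1000_requests
    return retrieval + requests


# Illustrative assumptions: a 10 GB Parquet file, of which the query
# reads 1 GB of column data using 200 ranged GET requests.
file_size = 10 * GB
scanned = 1 * GB

ranged = ia_access_cost(scanned, get_requests=200)    # interpretation 1
whole = ia_access_cost(file_size, get_requests=200)   # interpretation 2

print(f"billed on bytes read:   ${ranged:.4f}")
print(f"billed on whole object: ${whole:.4f}")
```

With these numbers the request charge is tiny next to the retrieval charge, so the answer to the question mostly depends on which byte count the retrieval fee is applied to.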

For normal access to Infrequent Access objects, IA is cheaper than Standard if each object is read in full less than about once per month. If the retrieval fee applies only to the bytes actually read, Parquet's partial reads would shift that break-even point in favor of IA. The only way to be sure would be to set up a test in a bucket and region you don't normally use (or in a different account), run some queries, and then see what charges come through.
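The break-even reasoning can be sketched as a small calculation. The storage prices below ($0.023 and $0.0125 per GB-month) are assumed figures for illustration and are not from the answer; only the $0.01 per GB retrieval fee is. The sketch also assumes the retrieval fee applies to bytes read, which is exactly the unconfirmed point, and it ignores request charges and IA's minimum object size and minimum storage duration rules.

```python
def breakeven_queries_per_month(fraction_read,
                                standard_storage_per_gb=0.023,
                                ia_storage_per_gb=0.0125,
                                retrieval_per_gb=0.01):
    """Queries per month at which IA and Standard cost the same.

    fraction_read is the share of each file's bytes a query retrieves
    (1.0 means the whole file). All prices are assumptions in USD.
    """
    if not 0 < fraction_read <= 1:
        raise ValueError("fraction_read must be in (0, 1]")
    # Monthly storage saving per GB from choosing IA over Standard.
    saving = standard_storage_per_gb - ia_storage_per_gb
    # Retrieval fee per GB stored, per query.
    cost_per_query = retrieval_per_gb * fraction_read
    return saving / cost_per_query


full_read = breakeven_queries_per_month(1.0)
tenth_read = breakeven_queries_per_month(0.1)

print(f"whole file per query: {full_read:.2f} queries/month")
print(f"10% of file per query: {tenth_read:.2f} queries/month")
```

Under these assumptions, reading the whole file breaks even at roughly one query per month, which matches the rule of thumb above, while a query that reads a tenth of the file can run about ten times as often before IA stops being cheaper.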

John Rotenstein answered Oct 22 '25 06:10