Athena/Presto: Getting maximum partition value, at cheapest scan cost

Question

I want to get the maximum value of a partition column (dt) in my Athena table. Because Athena charges by the volume of data scanned, I am looking for a way to do this with the smallest possible scan.

Admittedly, there is little data in the table now, but it will grow over time once it is in production.

Does anyone know what happens under the hood in these two approaches, how they differ, and which is more efficient?

Thanks

Method (1)

SELECT max(dt) 
FROM mydb.mytable

-- Console output:
-- Time in queue: 0.166 sec
-- Run time: 3.153 sec
-- Data scanned: -

Method (2)

SELECT max(dt) 
FROM mydb."mytable$partitions"

-- Console output:
-- Time in queue: 0.223 sec
-- Run time: 1.347 sec
-- Data scanned: 0.02 KB

Alejandro · Accepted Answer

Very, very, very late answer, but this question helped me a lot, so I looked it up. Maybe it can help others too:

SHOW PARTITIONS lists the partitions recorded in the table's metadata.

If you want the equivalent of SHOW PARTITIONS inside a query (so you can, for example, apply max() to it), use:

SELECT * FROM "table_name$partitions"

The second example you posted is faster because it does not read the data files in the filesystem (S3); it only reads the partition metadata.
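A minimal sketch of the two metadata-only options side by side, assuming (as in the question) a table mydb.mytable with a single partition key dt; substitute your own database, table, and column names. Run each statement as its own query in Athena. Note that the whole name mytable$partitions has to be double-quoted because of the $ character.

```sql
-- Assumed names: mydb.mytable, partitioned by a single key dt.

-- Option A: list every partition registered for the table (metadata only).
SHOW PARTITIONS mydb.mytable;

-- Option B: the same partition metadata exposed as a queryable table,
-- so it can be sorted or aggregated like any other table.
-- Here: the most recent partition value.
SELECT dt
FROM "mydb"."mytable$partitions"
ORDER BY dt DESC
LIMIT 1;
```

If dt is stored as a string, ordering (and max()) is lexicographic, so this only returns the latest date when the values use a sortable format such as YYYY-MM-DD.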

AWS Documentation: https://docs.aws.amazon.com/athena/latest/ug/show-partitions.html

Tags: amazon-athena, presto

Question by SimonB