I want to get the maximum partition value from my Athena table. Since Athena charges by the volume of data scanned, I am looking for a way to do this with minimal scanning.
Admittedly, there is little data in the table now, but it will grow over time once in production.
Does anyone know what happens under the hood for these two approaches, how they differ, and which is more efficient?
Thanks
Method (1)
SELECT max(dt) 
FROM mydb.mytable 
-- Console output: Time in queue: 0.166 sec | Run time: 3.153 sec | Data scanned: -
Method (2)
SELECT max(dt) 
FROM mydb."mytable$partitions" 
-- Console output: Time in queue: 0.223 sec | Run time: 1.347 sec | Data scanned: 0.02 KB
A very late answer, but this question helped me a lot, so I looked it up; maybe it can help others:
SHOW PARTITIONS lists the partitions in metadata.
If you want to get the output of SHOW PARTITIONS as a query result, you use:
SELECT * FROM "table_name$partitions"
The second example you posted is faster because it doesn't read from the filesystem (S3) but only from the metadata.
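Because "mytable$partitions" can be queried like a regular table, the usual clauses such as WHERE, ORDER BY, and LIMIT should work on it as well. Here is a rough sketch reusing the names from the question (mydb, mytable, dt). It assumes dt is a string partition key in a sortable format such as YYYY-MM-DD, and the date literal is only a placeholder:

```sql
-- Latest partition value, read from the partition metadata only.
-- Assumption: dt sorts correctly as a string (e.g. YYYY-MM-DD).
SELECT max(dt) AS latest_dt
FROM mydb."mytable$partitions";

-- The five most recent partitions on or after a given date.
-- '2020-01-01' is a hypothetical cutoff, not a value from the question.
SELECT dt
FROM mydb."mytable$partitions"
WHERE dt >= '2020-01-01'
ORDER BY dt DESC
LIMIT 5;
```

Keep in mind that this only sees partitions registered in the metastore, so a partition that exists in S3 but has not been added to the table will not show up.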
AWS Documentation: https://docs.aws.amazon.com/athena/latest/ug/show-partitions.html