Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Alternative to create more than 100 partitions on Athena CTAS

I'm currently creating some new tables from information stored in Amazon S3. First time using AWS, today I learn that Amazon Athena can't create more than 100 partitions from a CTAS query.

I'm doing the transformations using sql, it works perfectly, but need a way to store more than 100 partitions at once to make the process more reliable.

I'm setting the partition on date, so in 4 months my process is going to fail if I need to recreate the table to load a large amount of data via sql (where I have the transformations).

Any idea of how can I achieve this?

like image 352
Alejandro Avatar asked Sep 14 '25 09:09

Alejandro


1 Answers

The best option would be to write a Glue ETL (spark) job for this task and use spark sql to perform the required transformations. That way you still get to use your existing sql queries.

Then you can write the processed output back to some S3 path. Spark allows you create as many partitions as you want. Also it allows to append the newly processed data to already processed data, there by allowing you to load and transform only the new data.

Once the ETL is done, create an external table pointing to the above used S3 path and required partitions. This will be one time step (creating external table). You will only need to update the partition information in this external table after every glue job.

In summary, you need to do the following :

  • Create a spark script to be executed on Glue ETL which will read daily source data, apply required transformations and write the processed data on S3 in a new partition. This script can be easily tampletized for accepting date as input and will be one time activity.

  • Create an external table pointing to the processed data on S3. This will also be one time activity.

  • Execute MSCK Repair command on above external table after every Glue ETL job to update the new partition.

References :

AWS Glue ETL documentation

AWS Athena - Create external table

AWS Athena - Update partiotion

like image 165
Harsh Bafna Avatar answered Sep 16 '25 00:09

Harsh Bafna