Athena: update only a specific partition (MSCK REPAIR TABLE)

I have an external table whose data is partitioned by date. Every day a new set of files arrives for that day. This is how I run the job in Airflow:

  1. Fetch the file, which lands on S3 under a prefix such as dt=2018-06-20.
  2. Create an external table pointing to the S3 location, partitioned by dt.
  3. Run the MSCK REPAIR TABLE command to update the partitions.
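For reference, steps 2 and 3 correspond roughly to the DDL below. This is only a sketch: the table name daily_events, its columns, the CSV row format and the bucket path are hypothetical placeholders, not taken from the actual job.

```sql
-- Hypothetical table, columns and bucket; adjust to the real schema.
CREATE EXTERNAL TABLE IF NOT EXISTS daily_events (
  id      string,
  payload string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/daily_events/';

-- Scans the table location and registers any Hive-style dt=... prefixes
-- that are not yet in the catalog, not just the newest one.
MSCK REPAIR TABLE daily_events;
```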

Is there a way to make the above command operate only on the file added for the current day? For example, if I get a file for dt=2018-06-21, I would like to update only that partition.

Thanks!

Asked by Yu Ni on Sep 07 '25 16:09


1 Answer

You can add partitions manually. Here is an example from the Athena documentation:

    ALTER TABLE orders ADD
      PARTITION (dt = '2016-05-14', country = 'IN') LOCATION 's3://mystorage/path/to/INDIA_14_May_2016'
      PARTITION (dt = '2016-05-15', country = 'IN') LOCATION 's3://mystorage/path/to/INDIA_15_May_2016';
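Applied to the daily job in the question, the MSCK REPAIR TABLE step could be replaced by a single statement that registers only that day's partition. A minimal sketch, assuming a table named daily_events located at s3://my-bucket/daily_events/ (both hypothetical); IF NOT EXISTS keeps a re-run of the same day from failing if the partition is already registered.

```sql
-- Hypothetical table and bucket names; the dt value is the run date.
-- Registers only this one partition instead of scanning the whole location.
ALTER TABLE daily_events ADD IF NOT EXISTS
  PARTITION (dt = '2018-06-21') LOCATION 's3://my-bucket/daily_events/dt=2018-06-21/';
```

In Airflow, the literal date could be filled in from the {{ ds }} template macro, which renders the run date as YYYY-MM-DD.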
Answered by botchniaque on Sep 11 '25 04:09