
Is it safe to run VACUUM and DELETE against a Delta Table while a Spark Streaming query is ingesting data?

I've got a 24/7 Spark Structured Streaming query (Kafka as a source) that appends data to a Delta Table.

Is it safe to periodically run VACUUM and DELETE against the same Delta Table from a different cluster while the first one is still processing incoming data?

The table is partitioned on date and the DELETE will be done at partition level.

P.S. The infrastructure runs on AWS.

unvadim Avatar asked Oct 21 '25 01:10



1 Answer

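If your streaming job is truly append-only, neither operation should conflict with it:

  • DELETE at the partition level can't conflict with the stream under the WriteSerializable isolation level (the default), because the streaming writes happen without reading the table first (i.e. a blind, append-only workload).
  • VACUUM only removes files that are no longer referenced by the current table version and are older than the retention threshold (7 days by default), so it won't conflict with appends. Keep the retention at or above the default so that files from in-flight writes aren't removed.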
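
For illustration, the maintenance job on the second cluster could look roughly like the sketch below. The table name `events`, the partition column `date`, and the helper names are assumptions made up for this example, not anything from the question. The helpers only build the `DELETE` and `VACUUM ... RETAIN n HOURS` statements and hand them to whatever `spark` session you pass in.

```python
from datetime import date

# Delta's default retention threshold: 7 days expressed in hours.
DEFAULT_RETENTION_HOURS = 168


def partition_delete_sql(table: str, partition_col: str, day: date) -> str:
    """Build a DELETE whose predicate touches only one date partition."""
    return f"DELETE FROM {table} WHERE {partition_col} = '{day.isoformat()}'"


def vacuum_sql(table: str, retain_hours: int = DEFAULT_RETENTION_HOURS) -> str:
    """Build a VACUUM statement, refusing retention below the default."""
    if retain_hours < DEFAULT_RETENTION_HOURS:
        # A shorter retention could remove files that concurrent writers
        # or readers still need, so this sketch simply rejects it.
        raise ValueError("retention below 168 hours is not allowed here")
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


def run_maintenance(spark, table: str, partition_col: str, day: date,
                    retain_hours: int = DEFAULT_RETENTION_HOURS) -> list:
    """Run the partition DELETE, then VACUUM; return the statements issued."""
    statements = [
        partition_delete_sql(table, partition_col, day),
        vacuum_sql(table, retain_hours),
    ]
    for stmt in statements:
        spark.sql(stmt)
    return statements
```

On the maintenance cluster you would call something like `run_maintenance(spark, "events", "date", date(2025, 1, 1))`. Restricting the DELETE predicate to the partition column is what keeps it at the partition level.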
Alex Ott Avatar answered Oct 27 '25 06:10



