
Delete a specific record in a Kafka topic using compaction

Tags:

apache-kafka

I am trying to delete a specific message or record from a Kafka topic. I understand that Kafka was not built to do that. But is it possible to use topic compaction to replace a record with an empty record, using a specific Kafka key? How can this be done?

Thank you

CMPE asked Dec 01 '25 14:12


1 Answer

Yes, you can get rid of a particular message if you have a compacted topic.

In that case your message key becomes the identifier. If you want to delete a particular message, you send a message with the same key and a null value to the topic. This is called a tombstone message. Kafka will keep this tombstone around for a configurable amount of time (so your consumers can deal with the deletion). After this set amount of time, the cleaner thread will remove the tombstone message, and the key will be gone from the partition.
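As an illustration (plain Python, no Kafka client), this is effectively what the cleaner does to a keyed log once the tombstone retention window has passed. The `compact` function and the sample records are hypothetical, just to show the semantics:

```python
def compact(log):
    """Keep only the latest value per key; a key whose latest value is a
    tombstone (None) is dropped entirely, as the cleaner does after
    delete.retention.ms has elapsed."""
    latest = {}
    for key, value in log:      # later records overwrite earlier ones
        latest[key] = value
    return {k: v for k, v in latest.items() if v is not None}

log = [
    ("user-1", "alice"),
    ("user-2", "bob"),
    ("user-1", "alice-updated"),
    ("user-2", None),           # tombstone: delete user-2
]
print(compact(log))             # {'user-1': 'alice-updated'}
```

With a real client, the tombstone is simply a record with the existing key and a null value; for example, with the kafka-python library that would be something like `producer.send('my-topic', key=b'user-2', value=None)` (topic name and key here are placeholders).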

Note that the old (to-be-deleted) message will not disappear immediately. Depending on the configuration, it can take some time before the individual message is actually replaced.

I found this summary of the configurations quite helpful (link to blog):

1) To activate compaction, the topic config cleanup.policy=compact must be set.

2) A consumer sees all tombstones as long as it reaches the head of the log within the topic config delete.retention.ms (the default is 24 hours).

3) The number of cleaner threads is configurable through the log.cleaner.threads config.

4) The cleaner thread chooses the log with the highest dirty ratio: dirty ratio = number of bytes in the head / total number of bytes in the log (head + tail).

5) The topic config min.compaction.lag.ms guarantees a minimum period that must pass before a message can be compacted.

6) To delay the start of compaction after records are written, use the broker config log.cleaner.min.compaction.lag.ms. Records won't be compacted until this period has passed, which gives consumers time to read every record.
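As a sketch, the configs above can be set when the topic is created. The topic name, broker address, and the aggressive timing values below are illustrative placeholders, not recommendations:

```shell
# Create a compacted topic (values chosen small so deletion is quick to observe)
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic my-topic --partitions 1 --replication-factor 1 \
  --config cleanup.policy=compact \
  --config delete.retention.ms=100 \
  --config min.compaction.lag.ms=0
```

A tombstone can then be produced with any client that supports null values; the stock kafka-console-producer historically cannot send a null value, so a tool such as kcat (with its key delimiter and empty-value-as-NULL options) is a common workaround.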

The Kafka documentation introduces log compaction as follows:

Log compaction ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition.

The way the cleaner works is described there as well:

Log compaction is handled by the log cleaner, a pool of background threads that recopy log segment files, removing records whose key appears in the head of the log. Each compactor thread works as follows:

1) It chooses the log that has the highest ratio of log head to log tail

2) It creates a succinct summary of the last offset for each key in the head of the log

3) It recopies the log from beginning to end, removing keys which have a later occurrence in the log. New, clean segments are swapped into the log immediately, so the additional disk space required is just one additional log segment (not a full copy of the log).

4) The summary of the log head is essentially just a space-compact hash table. It uses exactly 24 bytes per entry. As a result, with 8GB of cleaner buffer one cleaner iteration can clean around 366GB of log head (assuming 1k messages).
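The arithmetic in 4) checks out, and the dirty ratio from 1) is a simple quotient. A quick back-of-the-envelope (the 1 KiB message size and the sample head/tail byte counts are assumptions for illustration):

```python
# Cleaner buffer capacity: 24 bytes per key entry in the offset map.
buffer_bytes = 8 * 1024**3        # 8 GB cleaner (dedupe) buffer
entry_bytes = 24                  # bytes per entry in the hash table
message_bytes = 1024              # assumed ~1k message size

entries = buffer_bytes // entry_bytes
cleanable_gb = entries * message_bytes / 1e9
print(f"{cleanable_gb:.1f} GB")   # 366.5 GB of log head per iteration

# Dirty ratio: bytes in the uncompacted head over total log bytes.
head_bytes, tail_bytes = 300, 700
dirty_ratio = head_bytes / (head_bytes + tail_bytes)
print(dirty_ratio)                # 0.3
```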

Michael Heil answered Dec 03 '25 10:12


