Mystery about Kafka's retention period

Tags:

apache-kafka

I have a topic configured on our production cluster with a retention period of 432000000 ms, i.e. 5 days. But it usually holds messages with timestamps from 10 days ago! For example, today, 22nd March, I checked the data in that topic using the console consumer; the first record had a timestamp of 12th March. This data went into the topic at nearly the same time it was generated, so there is no gap between the timestamp in the log and the actual time it was queued. How can Kafka be storing messages well past the configured retention period?

asked Nov 01 '25 by Shades88

1 Answer

The retention settings are lower bound limits.

In your example it means Kafka will not delete any messages that are less than 5 days old.

The logs on disk are split into several segments. Kafka only deletes full segments and never touches the latest (active) segment. So for a segment to be deleted, its last message has to be older than 5 days and it must not be the active segment.

By default, Kafka only rolls a new segment if the current one is older than 7 days (log.roll.hours=168) or has reached its maximum size (log.segment.bytes, 1 GB by default).
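Putting the numbers together explains the 10-day-old data: with the default 7-day roll and a 5-day retention, the oldest message in a low-traffic topic can survive for up to about 12 days. A rough back-of-the-envelope sketch of that worst case:

```python
# Rough model of time-based segment deletion. A message written the
# instant a segment opens stays undeletable while the segment is active
# (up to roll_days), and then the segment's *newest* message must age
# past retention_days before the whole segment can be deleted.
# All values are in days.

def max_message_age(roll_days: float, retention_days: float) -> float:
    """Worst-case age of the oldest surviving message in a segment."""
    return roll_days + retention_days

# Defaults from the answer: log.roll.hours=168 (7 days), 5-day retention.
print(max_message_age(7, 5))  # 12 -- consistent with seeing 10-day-old data
```

This is only a simplified model of the deletion rule, not Kafka's actual log-cleaner code, but it shows why observing 10-day-old messages is within expected behavior.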

So it looks like you haven't produced enough data to roll a new segment by size, so I suggest reducing log.roll.hours to force new segments to be created more frequently.
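At the topic level, the broker-wide log.roll.hours/log.roll.ms settings are overridden by the segment.ms topic config, so you can change this for just the affected topic. A sketch using the kafka-configs.sh tool (the topic name my-topic and the bootstrap server address are placeholders for your cluster):

```shell
# Roll a new segment at least once a day for this topic; segment.ms
# (here 86400000 ms = 1 day) overrides the broker-wide log.roll.* defaults.
# Topic name and bootstrap server below are placeholders.
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config segment.ms=86400000

# Verify the override took effect:
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic --describe
```

Note that smaller segments mean more files and open handles per partition, so avoid setting segment.ms very low on brokers hosting many partitions.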

answered Nov 03 '25 by Mickael Maison

