I have a pipeline like this:
env.addSource(kafkaConsumer, name_source)
.keyBy { value -> value.f0 }
.window(EventTimeSessionWindows.withGap(Time.seconds(2)))
.process(MyProcessor())
.addSink(kafkaProducer)
The keys are guaranteed to be unique in the data that is being currently processed. Thus I would expect the state size to not grow over 2 seconds of data.
However, I notice the state size has been steadily growing over the last day (since the app was deployed).
Is this a bug in flink?
using flink 1.11.2 in aws kinesis data analytics.
Kinesis Data Analytics always uses RocksDB as its state backend. With RocksDB, dead state isn't immediately cleaned up, it's merely marked with a tombstone and is later compacted away. I'm not sure how KDA configures RocksDB compaction, but typically it's done when a level reaches a certain size -- and I suspect your state size is still small enough that compaction hasn't occurred.
With incremental checkpoints (which is what KDA does), checkpointing is done by copying RocksDB's SST files -- which in your case are presumably full of stale data. If you let this run long enough you should eventually see a significant drop in checkpoint size, once compaction has been done.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With