Apache Flink State Store vs Kafka Streams

As far as I know, Kafka Streams keeps its state locally, in memory or on disk, or in a Kafka topic, because all the input data comes from a partition in which all messages are keyed by a defined value. Most of the time the computations can be done without knowing the state of other processors. If so, another Streams instance calculates the result, as in this picture:

[image]

Where exactly does Flink store its state? Can Flink also store state locally, or does it always publish it to all instances (tasks)? Is it possible to configure Flink so that it stores state in a Kafka broker?

asked Oct 27 '25 07:10 by str0yd



1 Answer

Flink also uses local stores (that can be keyed), similar to Kafka Streams. However, it does not write state into Kafka topics.

For fault tolerance, it takes so-called "distributed snapshots" (checkpoints), which are stored via a configurable state backend (e.g., HDFS).
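To make the idea concrete, here is a toy model in plain Python. It is not Flink's API: the `Task` and `Job` classes, the `durable_store` dict (standing in for something like HDFS), and the CRC-based routing are all assumptions made for illustration. It only sketches the two points above: each parallel task keeps the state for its own keys locally, and a snapshot copies that state to durable storage so it can be restored after a failure. Real Flink checkpoints are coordinated across tasks and combined with replaying the input, which this sketch leaves out.

```python
import copy
import zlib


class Task:
    """One parallel task; it owns the state of the keys routed to it."""

    def __init__(self):
        self.counts = {}  # local keyed state: key -> running count

    def process(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


class Job:
    """Hypothetical job with a fixed number of parallel tasks."""

    def __init__(self, parallelism):
        self.tasks = [Task() for _ in range(parallelism)]

    def route(self, key):
        # Deterministic hash, so a given key always lands on the same task.
        return zlib.crc32(key.encode("utf-8")) % len(self.tasks)

    def process(self, key):
        return self.tasks[self.route(key)].process(key)

    def snapshot(self, durable_store, checkpoint_id):
        # Copy every task's local state into the durable store.
        durable_store[checkpoint_id] = [
            copy.deepcopy(task.counts) for task in self.tasks
        ]

    def restore(self, durable_store, checkpoint_id):
        # Reload every task's local state from the chosen snapshot.
        for task, saved in zip(self.tasks, durable_store[checkpoint_id]):
            task.counts = copy.deepcopy(saved)


durable_store = {}  # stands in for durable storage such as HDFS
job = Job(parallelism=2)
for key in ["a", "b", "a"]:
    job.process(key)

job.snapshot(durable_store, checkpoint_id=1)
job.process("a")                             # progress made after the snapshot
job.restore(durable_store, checkpoint_id=1)  # simulate recovery from a failure
result = job.process("a")                    # counting resumes from the snapshot
```

Note that no task ever needs to see another task's state, and nothing is written to a Kafka topic; only the snapshot leaves the task.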

Check out the docs for more details:

  • https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html
  • https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/checkpointing.html
  • https://ci.apache.org/projects/flink/flink-docs-stable/internals/stream_checkpointing.html
  • https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/state_backends.html

answered Oct 29 '25 08:10 by Matthias J. Sax
