
Failed to lock the state directory for task 0_13

I am facing a very weird issue with Kafka Streams: under heavy load, when a rebalance happens, my Kafka Streams application keeps getting stuck, with the following error showing up in the logs repeatedly:

org.apache.kafka.streams.errors.LockException: stream-thread [metricsvc-metric-space-aggregation-9f4389a2-85de-43dc-a45c-3d4cc66150c4-StreamThread-1] task [0_13] Failed to lock the state directory for task 0_13
    at org.apache.kafka.streams.processor.internals.StateManagerUtil.registerStateStores(StateManagerUtil.java:91) ~[kafka-streams-2.8.1.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamTask.initializeIfNeeded(StreamTask.java:216) ~[kafka-streams-2.8.1.jar:?]
    at org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:433) ~[kafka-streams-2.8.1.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase(StreamThread.java:849) ~[kafka-streams-2.8.1.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:731) ~[kafka-streams-2.8.1.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:583) ~[kafka-streams-2.8.1.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:556) ~[kafka-streams-2.8.1.jar:?]

I am debugging some old code written by a developer who is no longer with our company, and this part is running into issues. Unfortunately, the code is not very well documented. In this part of the code he tried to override some of the Kafka Streams WindowedStore and ReadOnlyWindowedStore classes for optimization. I understand it is quite difficult to find the root cause without looking at the complete code, but is there something really obvious that I should be looking at to solve this?

I am currently running 4 Kubernetes pods for this service, and each of them has its own independent state directory.

I expect not to get the error above, and even if it does occur, Kafka Streams should recover from it gracefully, but that doesn't happen in our case.

birinder tiwana asked Oct 31 '25 06:10



1 Answer

Are there multiple StreamThread instances per pod (that is, is num.stream.threads set to more than 1)? If so, you could be affected by https://issues.apache.org/jira/browse/KAFKA-12679
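
One way to test that theory is to temporarily run each pod with a single stream thread and see whether the LockException stops repeating. The snippet below is only a rough sketch of such a configuration, not a confirmed fix: the class and method names are hypothetical, the application id, bootstrap server, and state directory path are placeholders, and only the config keys ("application.id", "bootstrap.servers", "state.dir", "num.stream.threads") are real Kafka Streams settings.

```java
import java.util.Properties;

public class SingleThreadDiagnostic {

    // Hypothetical helper: builds a Kafka Streams configuration that runs
    // exactly one StreamThread per application instance (pod).
    public static Properties singleThreadConfig(String applicationId,
                                                String bootstrapServers,
                                                String stateDir) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Each pod is assumed to have its own state directory, as in the question.
        props.put("state.dir", stateDir);
        // With a single thread per pod, no second thread in the same pod
        // can compete for the task's state directory lock.
        props.put("num.stream.threads", "1");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with the ones your service uses.
        Properties props = singleThreadConfig(
                "my-streams-app", "localhost:9092", "/tmp/kafka-streams");
        System.out.println("num.stream.threads = "
                + props.getProperty("num.stream.threads"));
    }
}
```

You would pass the resulting Properties to the KafkaStreams constructor in place of your current configuration. If the error disappears with one thread per pod, that points towards the issue linked above; you can make up for the lost parallelism by running more pods while you investigate.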

CptHindsight answered Nov 03 '25 00:11



