The AWS Kinesis stream documentation mentions
Typically, when you use the KCL, you should ensure that the number of instances does not exceed the number of shards
What would be the consequence if the number of instances exceeds the number of shards? I plan on running one worker per Web server (separate thread). So I want to know whether it is required to check and compare the number of shards and running workers when a new web server instance is started. Or can one just start another worker without any side effect if the number of workers exceeds the number of shards.
TL; DR: There can only be one Worker per Shard. Any additional Workers will sit idle.
If you have a Kinesis stream with two shards, and you run an app on a single instance that leverages the KCL, the app will run two workers in separate threads-- one Worker per Shard (per thread).
If you run two instances, your app will run a single Worker on each instance in a thread-- two instances, one worker each; one Kinesis stream, two shards.
Each worker takes out a lease against a shard in a stream so no other worker of the same app can read the same shard. The Worker stores the lease information in Dynamo DB so other Workers can read it.
If you were to run 3 instances in this scenario, one of the instances would sit around waiting for a Worker on one of the other instances to lose its lease. Once one of the other Workers loses its lease, the third Worker could pick up the stream and begin processing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With