 

Kafka cluster expansion generic steps

Tags:

apache-kafka

We are planning to expand our cluster from 2 nodes to 8 nodes. The partition reassignment tool has options to move whole topics or individual partitions.

To redistribute the partitions, I am planning to follow the steps below.

Regardless of how many nodes are added, if I list all the topics in topics-to-move.json and all the brokers in the command below, will it distribute the partitions equally among the nodes?

 bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --topics-to-move-json-file topics-to-move.json --broker-list "0,1,2,3,4,5,6,7" --generate
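
For reference, the file passed to `--topics-to-move-json-file` is a small JSON document that lists topic names. A minimal sketch for generating it, assuming hypothetical topic names (`orders`, `payments`, `clicks`) that you would replace with your own list:

```python
import json

# Hypothetical topic names; substitute the topics in your cluster.
topics = ["orders", "payments", "clicks"]

# Layout expected by --topics-to-move-json-file: a list of {"topic": name} entries.
doc = {"topics": [{"topic": name} for name in topics], "version": 1}

with open("topics-to-move.json", "w") as f:
    json.dump(doc, f, indent=2)
```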

After this I am planning to apply the generated JSON:

--execute --reassignment-json-file <generated-json-file>
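
Before executing, it may be worth checking how evenly the proposed plan actually spreads replicas. The sketch below counts replicas and preferred leaders per broker, assuming the proposed assignment printed by `--generate` has been saved as JSON. The function name `replica_counts` and the sample plan are hypothetical and only illustrate the format.

```python
import json
from collections import Counter

def replica_counts(plan):
    """Count replicas and preferred leaders per broker in a reassignment plan."""
    replicas = Counter()
    leaders = Counter()
    for p in plan["partitions"]:
        replicas.update(p["replicas"])
        # The first replica in the list is the preferred leader.
        leaders[p["replicas"][0]] += 1
    return replicas, leaders

# Hypothetical proposed plan in the format printed by --generate.
proposed = json.loads("""
{"version": 1, "partitions": [
  {"topic": "orders", "partition": 0, "replicas": [0, 1]},
  {"topic": "orders", "partition": 1, "replicas": [1, 2]},
  {"topic": "orders", "partition": 2, "replicas": [2, 3]},
  {"topic": "orders", "partition": 3, "replicas": [3, 0]}
]}
""")

replicas, leaders = replica_counts(proposed)
for broker in sorted(replicas):
    print(f"broker {broker}: {replicas[broker]} replicas, "
          f"{leaders[broker]} preferred leaders")
```

A large gap between the highest and lowest counts suggests the plan is not as balanced as expected. Note that equal counts say nothing about how much data each partition holds.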

Will this cause any problems?

This approach seems more general, so why is it not documented this way?

Knight71 asked Oct 27 '25 04:10


2 Answers

There are a few things to be aware of:

  1. Evenly distributing partitions does not necessarily distribute data evenly. Some partitions hold more data than others, so you need to look at how much data is in each partition to plan an even spread of data across the brokers. This is particularly true if you have single-partition topics or unevenly balanced keys.
  2. Be "rack aware". If the 8 brokers are in 3 Amazon availability zones, or on two different power supplies or network switches in your data center, be careful not to place the leader and all its replicas in the same rack ID, or you lose your high availability.
  3. Consider using replication quotas. Moving lots of data between brokers can take network bandwidth away from active producers and consumers. Kafka 0.10+ added separate replication quotas (bandwidth throttling) so that you can limit the bandwidth used during reassignment and avoid hurting your live client traffic. Just don't throttle too low, or your reassignment might never catch up with the new data coming from producers.
  4. You may want to consider a third-party tool to help build a reassignment plan automatically. Yahoo!'s Kafka Manager has a reassignment feature (see https://github.com/yahoo/kafka-manager/blob/master/README.md), and Confluent has a 30-day free trial for their Auto Rebalancer, which allows both expansion and reduction of broker nodes with rack awareness and throttled reassignment (see http://docs.confluent.io/current/kafka/rebalancer/rebalancer.html).
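
As an illustration of the rack-awareness point, the sketch below flags partitions in a reassignment plan whose replicas all sit in a single rack. The broker-to-rack mapping, the sample plan, and the helper name `single_rack_partitions` are assumptions made up for the example. In a real cluster the mapping would come from each broker's `broker.rack` setting.

```python
def single_rack_partitions(plan, broker_rack):
    """Return (topic, partition) pairs whose replicas all sit in one rack."""
    flagged = []
    for p in plan["partitions"]:
        racks = {broker_rack[b] for b in p["replicas"]}
        # Only replicated partitions can be spread across racks.
        if len(p["replicas"]) > 1 and len(racks) == 1:
            flagged.append((p["topic"], p["partition"]))
    return flagged

# Hypothetical mapping of 8 brokers onto 3 availability zones.
broker_rack = {0: "az-a", 1: "az-a", 2: "az-a",
               3: "az-b", 4: "az-b", 5: "az-b",
               6: "az-c", 7: "az-c"}

# Hypothetical plan in the reassignment JSON format.
plan = {"version": 1, "partitions": [
    {"topic": "orders", "partition": 0, "replicas": [0, 3, 6]},  # three racks
    {"topic": "orders", "partition": 1, "replicas": [0, 1]},     # one rack only
]}

print(single_rack_partitions(plan, broker_rack))
```

Any partition reported here would become unavailable if that one rack failed, so its replica list is worth adjusting before the plan is executed.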
Hans Jespersen answered Oct 30 '25 14:10


By passing the full topic list to the tool, all your partitions are likely to be reassigned.

In an already large cluster (thousands of topics), this would cause a lot of unnecessary data copying and leader elections. So typically you would provide only a subset of your topics and specify only the new brokers as destinations, to minimize the work required to complete the reassignment.

If your cluster is small enough and does not hold GBs/TBs of data, passing all topics to the reassignment tool should be fine, and it is probably the easiest and fastest approach.
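
To gauge how much work a plan implies, you can compare the current and proposed assignments that `--generate` prints. This is a rough sketch with hypothetical sample data and a hypothetical helper name, `movement`. It counts partitions whose replica list changes and how many replicas land on a broker that did not host the partition before.

```python
def movement(current, proposed):
    """Return (partitions_changed, replicas_to_copy) between two assignments."""
    cur = {(p["topic"], p["partition"]): p["replicas"]
           for p in current["partitions"]}
    changed = 0
    copies = 0
    for p in proposed["partitions"]:
        old = cur.get((p["topic"], p["partition"]), [])
        new = p["replicas"]
        if old != new:
            changed += 1
        # A broker that did not host the partition before must copy it in full.
        copies += len(set(new) - set(old))
    return changed, copies

# Hypothetical assignments in the format printed by --generate.
current = {"version": 1, "partitions": [
    {"topic": "orders", "partition": 0, "replicas": [0, 1]},
    {"topic": "orders", "partition": 1, "replicas": [1, 0]},
    {"topic": "orders", "partition": 2, "replicas": [0, 1]},
]}
proposed = {"version": 1, "partitions": [
    {"topic": "orders", "partition": 0, "replicas": [0, 1]},  # unchanged
    {"topic": "orders", "partition": 1, "replicas": [2, 3]},  # two full copies
    {"topic": "orders", "partition": 2, "replicas": [1, 0]},  # order change only
]}

changed, copies = movement(current, proposed)
print(f"{changed} partitions change, {copies} replicas must be copied")
```

A partition whose replicas are only reordered moves no data but does get a different preferred leader, which is why the two numbers are reported separately.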

Mickael Maison answered Oct 30 '25 13:10