We are in the process of designing a Kafka Cluster (at least 3 nodes) that will process events from an array of web servers. Since the logs are largely identical, we are planning to create a single Topic only (say - webevents)
We expect a lot of traffic from the servers. Since there is a single topic, there will be a single leader broker. In such a case how will the cluster balance the high traffic? All write requests will always be routed to the leader broker at all times and other nodes might be underutilized.
Does a external hardware balancer help solve this problem? Alternately, can a Kafka configuration help distribute write requests evenly on a 1-topic cluster?
Thanks, Sharod
Short answer: a topic may have multiple partitions and each partition, not topic, has a leader. Leaders are evenly distributed among brokers. So, if you have multiple partitions in your topic you will have multiple leaders and your writes will be evenly distributed among brokers.
You will have a single topic with lot of partitions, you can replicate partitions for high availability/durability of your data.
Each broker will hold an evenly distributed number of partitions and each of these partitions can be either a leader or a replica for a topic. Kafka producers (Kafka clients running in your web servers in your case) write to a single leader, this provides a means of load balancing production so that each write can be serviced by a separate broker and machine.
Producers do the load balancing selecting the target partition for each message. It can be done based on the message key, so all messages with same key go to the same partition, or on a round-robin fashion if you don't set a message key.

Take a look at this nice post. I took the diagram from there.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With