When sending messages to a Kafka topic, I may occasionally get a single message that is much larger than the others,
so I need compression at the level of a single message. According to https://cwiki.apache.org/confluence/display/KAFKA/Compression:
A set of messages can be compressed and represented as one compressed message.
Also, the description of the compression.type property in https://github.com/apache/kafka/blob/0.10.1/clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java says:
Compression is of full batches of data, so the efficacy of batching will also impact the compression ratio (more batching means better compression).
Should I set the batch size to one, or disable batching, to get compression at the level of each message?
Compression is orthogonal to the question of whether you produce in batches or not, though, as stated in the documentation:
more batching means better compression
Compression can be set at the topic level (https://kafka.apache.org/documentation/#topicconfigs) or as part of the producer config (https://kafka.apache.org/documentation/#producerconfigs). Moreover, different messages in the same topic can be compressed with different codecs, because the compression type is stored in the record batch attributes (https://kafka.apache.org/documentation/#recordbatch), and this is seamless to the consumer.
However, if you need to compress messages selectively, it cannot be done with a single producer, because the producer configuration is static. Whatever the motivation for such a choice, you could create two producer instances (one with compression enabled and one without) and decide, based on the message content, which producer to use for each send.
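The two-producer idea could look roughly like the sketch below. It only builds the two configurations and the routing decision, so it runs without a broker. The class name ProducerSelection, the helper names, the 100 KiB cutoff, the localhost:9092 address and the choice of gzip are all assumptions made for illustration; only the config keys (bootstrap.servers, key.serializer, value.serializer, compression.type) come from the Kafka producer configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class ProducerSelection {
    // Hypothetical cutoff: payloads at or above this size use the compressing producer.
    static final int LARGE_MESSAGE_BYTES = 100 * 1024;

    // Builds a producer configuration that differs only in compression.type.
    static Properties producerProps(String bootstrapServers, String compressionType) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("compression.type", compressionType); // e.g. none, gzip, snappy, lz4
        return props;
    }

    // Routing rule (an assumption): decide per message by payload size.
    static boolean shouldCompress(byte[] payload) {
        return payload.length >= LARGE_MESSAGE_BYTES;
    }

    public static void main(String[] args) {
        Properties plain = producerProps("localhost:9092", "none");
        Properties compressed = producerProps("localhost:9092", "gzip");

        // With kafka-clients on the classpath, each config would back one producer:
        //   KafkaProducer<String, byte[]> plainProducer = new KafkaProducer<>(plain);
        //   KafkaProducer<String, byte[]> gzipProducer = new KafkaProducer<>(compressed);
        // and each send would go to whichever producer shouldCompress(...) selects.

        byte[] small = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] large = new byte[200 * 1024];

        System.out.println("small -> " + (shouldCompress(small)
                ? compressed.getProperty("compression.type")
                : plain.getProperty("compression.type")));
        System.out.println("large -> " + (shouldCompress(large)
                ? compressed.getProperty("compression.type")
                : plain.getProperty("compression.type")));
    }
}
```

Both producers can write to the same topic. Batching stays enabled on each, so the compressing producer still compresses whole batches, but those batches contain only the messages routed to it.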