I am trying to understand the flush.size and rotate.interval.ms configuration for the S3 connector in depth. I deployed the S3 connector and ended up with file sizes ranging from 6 KB all the way to 30 MB, so I'm wondering if anyone here can suggest how to get roughly equal file sizes.
Here are my settings: flush.size=200000, rotate.interval.ms=600000 (10 minutes).
We tried rolling our own connector as well, based on the example in https://github.com/canelmas/kafka-connect-field-and-time-partitioner, but we still can't get the files to come out around the same size.
The S3 Sink Connector writes data to a partition path per Kafka partition, with the partition path itself defined by partitioner.class.
Basically, the S3 connector flushes its buffer for a partition path when any of the following conditions is met:

- flush.size records have been buffered.
- In the case of a time-based partitioner, rotate.interval.ms has elapsed between the timestamp of the first record in the open file and the timestamp of the current record (with timestamp.extractor=Record).

Note: this helps clear backlog quickly. Say rotate.interval.ms is 10 minutes and we have data that is 6 hours delayed: each 10-minute span of record timestamps gets flushed within a few seconds. Conversely, if data stops flowing, the connector waits until the next record whose timestamp passes the rotate.interval.ms boundary arrives, because rotation here is driven by record timestamps rather than wall-clock time (rotate.schedule.interval.ms exists for wall-clock-based rotation, at the cost of exactly-once guarantees).
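For reference, here is a minimal sketch of the relevant sink configuration. The topic, bucket, format, and timing values are placeholder assumptions to adapt, not a recommended setup:

```json
{
  "connector.class": "io.confluent.connect.s3.S3SinkConnector",
  "topics": "my-topic",
  "s3.bucket.name": "my-bucket",
  "storage.class": "io.confluent.connect.s3.storage.S3Storage",
  "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
  "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
  "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
  "partition.duration.ms": "3600000",
  "locale": "en-US",
  "timezone": "UTC",
  "timestamp.extractor": "Record",
  "flush.size": "200000",
  "rotate.interval.ms": "600000"
}
```

With timestamp.extractor=Record, both the partition path and the rotation interval are computed from record timestamps, so uneven per-interval record volume directly translates into uneven file sizes: a quiet 10-minute window produces a small file, a busy one hits flush.size and produces a large one.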