 

How to configure Secor (from pinterest) to partition files by hour of day?

I'm looking for insight into how to configure Secor to output larger files that are partitioned by date/time rather than by Kafka offset, something akin to hourly backups of Kafka topic streams. Currently, my common.properties file contains these Secor configs:

secor.generation=1
secor.consumer.threads=7
secor.messages.per.second=10000
secor.offsets.per.partition=10000000
secor.topic_partition.forget.seconds=600
secor.local.log.delete.age.hours=-1
secor.file.reader.writer.factory=com.pinterest.secor.io.impl.SequenceFileReaderWriterFactory
secor.max.message.size.bytes=100000

This file mentions that a partition could describe the date of a message:

LogFilePath.java:

(line 29) Log file path has the following form: prefix/topic/partition1/.../partitionN/generation_kafkaParition_firstMessageOffset

(line 34) "partition1, ..., partitionN is the list of partition names extracted from message content. E.g., the partition may describe the message date such as dt=2014-01-01 [...]"
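To make the quoted layout concrete, here is a minimal Java sketch that assembles a path of the form prefix/topic/partition1/.../partitionN/generation_kafkaPartition_firstMessageOffset. The helper name buildLogFilePath, the bucket prefix, the topic name, and the hr=05 partition are all hypothetical illustrations; this is not Secor's LogFilePath implementation, and real Secor output may format the offset differently.

```java
import java.util.List;
import java.util.StringJoiner;

public class LogFilePathSketch {
    // Hypothetical helper mirroring the layout described in the LogFilePath.java comment:
    // prefix/topic/partition1/.../partitionN/generation_kafkaPartition_firstMessageOffset
    static String buildLogFilePath(String prefix, String topic, List<String> partitions,
                                   int generation, int kafkaPartition, long firstMessageOffset) {
        StringJoiner path = new StringJoiner("/");
        path.add(prefix).add(topic);
        for (String p : partitions) {
            path.add(p); // each partition name becomes one directory level, e.g. "dt=2014-01-01"
        }
        path.add(generation + "_" + kafkaPartition + "_" + firstMessageOffset);
        return path.toString();
    }

    public static void main(String[] args) {
        String path = buildLogFilePath("s3://my-bucket/secor", "events",
                List.of("dt=2014-01-01", "hr=05"), 1, 3, 1234L);
        System.out.println(path);
        // prints s3://my-bucket/secor/events/dt=2014-01-01/hr=05/1_3_1234
    }
}
```

With date-based partition names, all files for one day (or one hour) land under the same directory, which is the "hourly backup" shape the question asks for.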

asked Dec 07 '25 at 03:12 by Mulloy


1 Answer

From Secor's README file:

JSON date parser: parser that extracts timestamps from JSON messages and groups the output based on the date, similar to the Thrift parser above. To use this parser, start Secor with properties file secor.prod.partition.properties and set secor.message.parser.class=com.pinterest.secor.parser.JsonMessageParser. You may override the field used to extract the timestamp by setting the message.timestamp.name property.
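As an illustration of the idea behind a JSON date parser (read a timestamp field, then derive date-based partition names), here is a minimal Java sketch. It is not Secor's JsonMessageParser. Several details are assumptions: the field is named "timestamp" (the README says the field can be changed via message.timestamp.name), the value is epoch milliseconds, partitions are computed in UTC, and an hourly level is named hr=HH. The regex-based field lookup keeps the sketch dependency-free; a real parser would use a JSON library.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HourlyPartitionSketch {
    // Assumption: partitions are computed in UTC from epoch-millisecond timestamps.
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);
    private static final DateTimeFormatter HOUR =
            DateTimeFormatter.ofPattern("HH").withZone(ZoneOffset.UTC);

    // Finds "fieldName": <digits> with a regex; a real parser would use a JSON library.
    static long extractTimestamp(String json, String fieldName) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(fieldName) + "\"\\s*:\\s*(\\d+)");
        Matcher m = p.matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("field not found: " + fieldName);
        }
        return Long.parseLong(m.group(1));
    }

    // Returns hypothetical partition names: a day level (dt=) and an hour level (hr=).
    static List<String> hourlyPartitions(String json, String fieldName) {
        Instant ts = Instant.ofEpochMilli(extractTimestamp(json, fieldName));
        return List.of("dt=" + DATE.format(ts), "hr=" + HOUR.format(ts));
    }

    public static void main(String[] args) {
        String message = "{\"timestamp\": 1388545200000, \"event\": \"click\"}";
        System.out.println(hourlyPartitions(message, "timestamp"));
        // prints [dt=2014-01-01, hr=03]
    }
}
```

Messages whose timestamps fall in the same UTC hour map to the same pair of partition names, so their output ends up grouped together regardless of Kafka offset.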

answered Dec 11 '25 at 19:12 by Mulloy


