Apache Flink with S3 as source and S3 as sink

Is it possible to read events as they land in an S3 source bucket via Apache Flink, process them, and sink them back to some other S3 bucket? Is there a special connector for that, or do I have to use the read/save examples mentioned in the Apache Flink documentation? How does checkpointing happen in such a case: does Flink keep track of what it has read from the S3 source bucket automatically, or does that need custom code? Does Flink also guarantee exactly-once processing when S3 is the source?

asked Oct 26 '25 by sc so


1 Answer

In Flink 1.11 the FileSystem SQL Connector is much improved; that will be an excellent solution for this use case.
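A minimal sketch of that Table API route, assuming Flink 1.11; the bucket paths, table names, and two-column JSON schema are placeholders for illustration, not anything from the question:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class S3ToS3Sql {
    public static void main(String[] args) {
        // Streaming-mode table environment (Flink 1.11, Blink planner).
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        // Source table backed by the filesystem connector, pointing at an S3 prefix.
        tEnv.executeSql(
            "CREATE TABLE events (" +
            "  user_id STRING," +
            "  event_time TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'filesystem'," +
            "  'path' = 's3://my-source-bucket/events/'," +
            "  'format' = 'json'" +
            ")");

        // Sink table pointing at a different S3 prefix.
        tEnv.executeSql(
            "CREATE TABLE processed_events (" +
            "  user_id STRING," +
            "  event_time TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'filesystem'," +
            "  'path' = 's3://my-sink-bucket/processed/'," +
            "  'format' = 'json'" +
            ")");

        // Submitting the INSERT starts the job that writes results back to S3.
        tEnv.executeSql(
            "INSERT INTO processed_events SELECT user_id, event_time FROM events");
    }
}

With checkpointing enabled, the streaming filesystem sink commits its part files as checkpoints complete, so the output bucket only ever exposes finished files.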

With the DataStream API you can use FileProcessingMode.PROCESS_CONTINUOUSLY with readFile to monitor a bucket and ingest new files as they are atomically moved into it. Flink keeps track of the last-modified timestamp of the bucket, and ingests any children modified since that timestamp -- doing so in an exactly-once way (the read offsets into those files are included in checkpoints).
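A rough DataStream sketch of that approach; the bucket paths, the 60-second monitoring interval, the plain-text format, and the trivial map step are all illustrative assumptions:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.source.FileProcessingMode;

public class S3ToS3DataStream {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing is what makes the read offsets recoverable (exactly-once).
        env.enableCheckpointing(60_000);

        String sourcePath = "s3://my-source-bucket/events/";   // assumed bucket name
        String sinkPath = "s3://my-sink-bucket/processed/";    // assumed bucket name

        // Re-scan the source prefix every 60 seconds, ingesting files whose
        // modification time is newer than the last scan.
        DataStream<String> lines = env.readFile(
                new TextInputFormat(new Path(sourcePath)),
                sourcePath,
                FileProcessingMode.PROCESS_CONTINUOUSLY,
                60_000);

        // Stand-in "processing" step; replace with real business logic.
        DataStream<String> processed = lines.map(String::toUpperCase);

        // Row-format streaming sink; pending part files become visible
        // as checkpoints complete.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path(sinkPath), new SimpleStringEncoder<String>("UTF-8"))
                .build();

        processed.addSink(sink);
        env.execute("S3 to S3 with continuous file monitoring");
    }
}

Note that writing to S3 with StreamingFileSink requires the flink-s3-fs-hadoop filesystem plugin, since the sink relies on a recoverable writer that the Presto S3 filesystem does not provide.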

answered Oct 28 '25 by David Anderson


