Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

apache fink 0.10 Filtering duplicates over an infinite stream with time window purge

how can i filter out duplicates over an infinite stream with a time window purge? I dont have infinite space / ram and i know that after say, 2 seconds (on the local clock), any duplicate that can occur will have occured. which means that after 2 seconds i can throw away (purge) the old data.

Filtering duplicates over an infinite stream with time window purge.

I got an excellent answer to how to remove the duplicates here in this question (big thanks to Till): apache flink 0.10 how to get the first occurence of a composite key from an unbounded input dataStream?

but i dont know how to tell flink to throw away the old data after 2 seconds (local time).

how can i do this with flink 0.10 please?

Thanks alot!!!

here is the statement which removes duplicates but doesnt purge:

input.keyBy(0, 1).flatMap(new DuplicateFilter()).print();

if I add .timeWindow(Time.minutes(1), Time.seconds(30)) after keyBy(0, 1) its not compilable.

like image 988
timmy_stapler Avatar asked Dec 22 '25 04:12

timmy_stapler


1 Answers

Thanks to Till - the answer is given in an update to the following link: apache flink 0.10 how to get the first occurence of a composite key from an unbounded input dataStream?

see the update.

like image 93
timmy_stapler Avatar answered Dec 24 '25 09:12

timmy_stapler



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!