How can I filter out duplicates over an infinite stream with a time-window purge? I don't have infinite space/RAM, and I know that after, say, 2 seconds (on the local clock), any duplicate that can occur will have occurred. This means that after 2 seconds I can throw away (purge) the old data.
Filtering duplicates over an infinite stream with time window purge.
I got an excellent answer on how to remove the duplicates in this question (big thanks to Till): apache flink 0.10 how to get the first occurence of a composite key from an unbounded input dataStream?
But I don't know how to tell Flink to throw away the old data after 2 seconds (local time).
How can I do this with Flink 0.10, please?
Thanks a lot!
Here is the statement that removes duplicates but doesn't purge:
input.keyBy(0, 1).flatMap(new DuplicateFilter()).print();
If I add .timeWindow(Time.minutes(1), Time.seconds(30)) after keyBy(0, 1), the statement no longer compiles.
Thanks to Till, the answer is given in an update to the following question: apache flink 0.10 how to get the first occurence of a composite key from an unbounded input dataStream?
See the update there.
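For readers who only want the idea behind the purge, here is a minimal sketch in plain Java. It is not the code from Till's update and it uses no Flink API. The class name TimedDuplicateFilter, its methods, and the 2000 ms TTL are my own assumptions for illustration. It remembers when each key was first seen and drops entries once they are older than the TTL, so memory stays bounded by the number of distinct keys seen in the last 2 seconds. It assumes the timestamps passed in never go backwards.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Flink): lets the first occurrence of a key
// through and forgets each key once it is ttlMillis old.
final class TimedDuplicateFilter<K> {
    private final long ttlMillis;
    // Keys are only inserted when absent, so iteration order is oldest first
    // (assuming non-decreasing timestamps).
    private final Map<K, Long> firstSeen = new LinkedHashMap<>();

    TimedDuplicateFilter(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Returns true if the key has not been seen within the last ttlMillis.
    boolean isFirst(K key, long nowMillis) {
        purge(nowMillis);
        if (firstSeen.containsKey(key)) {
            return false; // duplicate inside the time window
        }
        firstSeen.put(key, nowMillis);
        return true;
    }

    // Removes entries whose age has reached ttlMillis, stopping at the first
    // entry that is still young enough.
    private void purge(long nowMillis) {
        Iterator<Map.Entry<K, Long>> it = firstSeen.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() < ttlMillis) {
                break;
            }
            it.remove();
        }
    }

    int size() {
        return firstSeen.size();
    }

    public static void main(String[] args) {
        TimedDuplicateFilter<List<Object>> filter = new TimedDuplicateFilter<>(2000L);
        // Composite key built from two fields, mirroring keyBy(0, 1).
        List<Object> key = Arrays.asList("a", 1);
        System.out.println(filter.isFirst(key, 0L));    // first occurrence
        System.out.println(filter.isFirst(key, 500L));  // duplicate within 2 s
        System.out.println(filter.isFirst(key, 2500L)); // old entry purged, seen as new
    }
}
```

Inside a function like DuplicateFilter you could call isFirst(key, System.currentTimeMillis()) and only emit the element when it returns true. Note that a map held like this lives in ordinary operator memory; whether and how to put it into Flink's checkpointed state is not covered by this sketch.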