Watermarking enables automatic dropping of old state data in Apache Spark Structured Streaming. In structured-streaming-programming-guide.md, word count example demonstrates how watermarking can easily drop the records or events that arrive late in the system. ( https://github.com/apache/spark/blob/master/docs/structured-streaming-programming-guide.md )
words.withWatermark("timestamp", "10 minutes")
Is there a way to save the records that are dropped or discarded by watermarking on a disk or in a table?
Yes,spark doesn't have the function to trace these records.But flink does it !
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With