Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to save the records that are dropped by watermarking in spark structured streaming

Watermarking enables automatic dropping of old state data in Apache Spark Structured Streaming. In structured-streaming-programming-guide.md, word count example demonstrates how watermarking can easily drop the records or events that arrive late in the system. ( https://github.com/apache/spark/blob/master/docs/structured-streaming-programming-guide.md )

words.withWatermark("timestamp", "10 minutes")

Is there a way to save the records that are dropped or discarded by watermarking on a disk or in a table?

like image 750
roy bapy Avatar asked Jan 25 '26 00:01

roy bapy


1 Answers

Yes,spark doesn't have the function to trace these records.But flink does it !

like image 128
lec_ssmi Avatar answered Jan 29 '26 13:01

lec_ssmi



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!