I'm working on Spark Streaming and trying to monitor and improve the performance for the streaming apps. But I'm confusing to the following questions.
Streaming Code Snapshot:
val transformed = input.flatMap(i => processInput(i))
val aggregated = transformed.reduceByKeyAndWindow(reduce(_, _), Seconds(aggregateWindowSizeInSeconds), Seconds(slidingIntervalInSeconds))
val finalized = aggregated.mapValues(finalize(_))
finalized
(Only the Flatmap stages occurred on the portal.)
Spark Streaming Portal

Thanks,
Tao
Spark takes the individual commands from your source and optimizes then into a plan of tasks to be executed on the cluster. One example of one such optimization is map-fusion: two calls to map come in, one single map task comes out. The stage is a higher-level boundary between groups of tasks, defined such that to cross that boundary you have to perform a shuffle.
So:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With