 

What's the meaning of the "Stages" on Spark UI for Streaming Scenarios

I'm working on Spark Streaming and trying to monitor and improve the performance of my streaming apps, but I'm confused by the following questions.

  1. What does each stage on the Spark portal mean for a "Spark Streaming" app?
  2. Not all of the transformations are mapped to tasks. How can I tell which tasks a given transformation maps to?

Streaming Code Snapshot:

val transformed = input.flatMap(i => processInput(i))
val aggregated = transformed.reduceByKeyAndWindow(
  reduce(_, _),
  Seconds(aggregateWindowSizeInSeconds),
  Seconds(slidingIntervalInSeconds))
val finalized = aggregated.mapValues(finalize(_))
finalized

(Only the flatMap stages appear on the portal.)


Thanks,

Tao

asked Jan 30 '26 by Tao Li


1 Answer

Spark takes the individual operations from your source code and optimizes them into a plan of tasks to be executed on the cluster. One example of such an optimization is map fusion: two calls to map come in, a single map task comes out. A stage is a higher-level boundary between groups of tasks, defined such that crossing it requires a shuffle.
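Map fusion can be seen with a rough analogy in plain Scala (this is not the Spark API, just an illustration): a lazy Iterator runs chained map functions element-by-element in a single pass, rather than materializing an intermediate collection between them.

```scala
// Illustration only: the `log` buffer records the order in which the two
// map functions run, proving both execute in one interleaved traversal.
val log = scala.collection.mutable.ListBuffer[String]()

val out = Iterator(1, 2)
  .map { x => log += s"m1($x)"; x + 1 } // first map
  .map { x => log += s"m2($x)"; x * 2 } // second map, fused into the same pass
  .toList

// out == List(4, 6)
// log == List("m1(1)", "m2(2)", "m1(2)", "m2(3)")
// m1 and m2 alternate per element: one fused pass, no intermediate collection.
```

Spark applies the same idea to narrow transformations: consecutive map/flatMap/filter calls are pipelined into a single task per partition.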

So:

  • Each of the operators you call on your RDD results in transformations and actions.
  • These form a DAG of operators.
  • The DAG is compiled into stages.
  • Each stage is executed as a series of tasks.
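This stage-cutting rule can be sketched with a toy model in plain Scala (names like `Op` and `cutStages` are made up for illustration, not Spark API): narrow transformations (flatMap, mapValues) fuse into the current stage, while a wide transformation (the shuffle side of reduceByKeyAndWindow) starts a new one.

```scala
// Toy model of how Spark cuts a DAG into stages at shuffle boundaries.
case class Op(name: String, wide: Boolean) // wide = requires a shuffle

def cutStages(ops: List[Op]): List[List[String]] =
  ops.foldLeft(List(List.empty[String])) { (stages, op) =>
    if (op.wide) List(op.name) :: stages          // shuffle: open a new stage
    else (op.name :: stages.head) :: stages.tail  // narrow: fuse into current stage
  }.map(_.reverse).reverse

// The pipeline from the question:
val pipeline = List(
  Op("flatMap", wide = false),
  Op("reduceByKeyAndWindow", wide = true),
  Op("mapValues", wide = false)
)

val stages = cutStages(pipeline)
// stages == List(List("flatMap"), List("reduceByKeyAndWindow", "mapValues"))
```

This also suggests why only the flatMap stage shows up separately on the portal: mapValues is a narrow transformation, so it is fused into the stage that begins after the reduceByKeyAndWindow shuffle rather than getting a stage of its own.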
answered Feb 02 '26 by Francois G