I am able to develop a pipeline which reads from Kafka, does some transformations, and writes the output to a Kafka sink as well as a Parquet sink. I would like to add effective logging to log the intermediate results of the transformation, like in a regular streaming application.
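For context, here is a stripped-down sketch of the kind of pipeline I mean (the topic names, paths, and the trivial transformation are just placeholders for my actual logic):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-pipeline").getOrCreate()

// Source: read the raw stream from Kafka
val source = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "input-topic")
  .load()

// Transformation: for illustration, just cast the message payload to a string
val transformed = source.selectExpr("CAST(value AS STRING) AS value")

// Sink 1: write back to Kafka (the Kafka sink expects a "value" column)
val kafkaQuery = transformed.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "output-topic")
  .option("checkpointLocation", "/tmp/checkpoints/kafka")
  .start()

// Sink 2: write the same data out as Parquet files
val parquetQuery = transformed.writeStream
  .format("parquet")
  .option("path", "/tmp/output/parquet")
  .option("checkpointLocation", "/tmp/checkpoints/parquet")
  .start()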
One option I see is to log the query execution plan via
df.queryExecution.analyzed.numberedTreeString
or
logger.info("Query progress"+ query.lastProgress)
logger.info("Query status"+ query.status)
But these don't seem to provide a way to see the business-specific messages the stream is running on.
Is there a way I can add more logging info, such as the actual data being processed?
I found some options to track this. Basically, we can name a streaming query using df.writeStream.format("parquet").queryName("table1").
The query name table1 will then be printed in the Spark Jobs tab against the Completed Jobs list in the Spark UI, from which you can track the status of each streaming query.
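A minimal sketch of what that looks like (the output path and checkpoint location are placeholders, and logger is assumed to be whatever logging framework you already use):

// The queryName shows up in the Spark UI and in the progress objects below
val query = df.writeStream
  .format("parquet")
  .queryName("table1")
  .option("path", "/tmp/output/table1")
  .option("checkpointLocation", "/tmp/checkpoints/table1")
  .start()

// Per-trigger progress and current status for this named query
logger.info("Query progress: " + query.lastProgress)
logger.info("Query status: " + query.status)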
You can also use the ProgressReporter API in Structured Streaming to collect more stats.
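One way to consume those stats from your own code is a StreamingQueryListener registered on the session. A sketch, again assuming logger is your existing logging setup:

import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryStartedEvent, QueryProgressEvent, QueryTerminatedEvent}

// Logs the lifecycle and per-trigger progress of every streaming query,
// including the name set via queryName(...)
class LoggingQueryListener extends StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit =
    logger.info(s"Query started: ${event.name} (id=${event.id})")

  override def onQueryProgress(event: QueryProgressEvent): Unit =
    // event.progress carries rows/sec, batch durations, source offsets, sink info, etc.
    logger.info("Query progress: " + event.progress.json)

  override def onQueryTerminated(event: QueryTerminatedEvent): Unit =
    logger.info(s"Query terminated: id=${event.id}, exception=${event.exception}")
}

// Register the listener before starting the queries
spark.streams.addListener(new LoggingQueryListener())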