I am trying to use sparkmeasure to check the performance of my PySpark code. I am using PyCharm Community Edition on Windows 10, with PySpark properly configured. I ran "pip install sparkmeasure" and sparkmeasure was successfully installed. Now I am trying to run this snippet of code:
from pyspark import SparkConf, SparkContext
from pyspark.sql.session import SparkSession
from sparkmeasure import StageMetrics

sc = SparkContext(master="local", appName="sparkdemo")
spark = SparkSession(sc)
sm = StageMetrics(spark)
and I am getting the following error:
File "C:/Users/nj123/PycharmProjects/pythonProject/sparkdemo.py", line 9, in <module>
    sm = StageMetrics(spark)
File "C:\Users\nj123\PycharmProjects\pythonProject\venv\lib\site-packages\sparkmeasure\stagemetrics.py", line 15, in __init__
    self.stagemetrics = self.sc._jvm.ch.cern.sparkmeasure.StageMetrics(self.sparksession._jsparkSession)
TypeError: 'JavaPackage' object is not callable
How do I resolve this error and configure sparkmeasure to work with PyCharm correctly?
Thanks to @user238607. Here are the steps I performed to resolve this issue:
1. First, download the sparkmeasure jar file from Maven Central.
2. Then move this jar file to the Spark jars folder. My location was C:\Spark\spark-3.0.1-bin-hadoop2.7\jars.
3. Now go back to PyCharm and rerun the same code.
Link to download the jar file.