
NameError: name 'SparkSession' is not defined

I'm new to Cask CDAP and the Hadoop environment.

I'm creating a pipeline and I want to use a PySpark program in it. I have the complete script for the Spark program, and it works when I test it from the command line, but it fails when I copy and paste it into a CDAP pipeline.

It gives me an error in the logs:

NameError: name 'SparkSession' is not defined

My script starts in this way:

from pyspark.sql import *

spark = SparkSession.builder.getOrCreate()
from pyspark.sql.functions import trim, to_date, year, month
sc= SparkContext()

How can I fix it?

Matteo Perico asked Oct 31 '25 08:10


1 Answer

You forgot to import SparkSession explicitly. Add this at the top of your script:

import pyspark
from pyspark.sql import SparkSession
# ---Your code----
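
Why the explicit import can matter: a wildcard import only binds the names a package chooses to export (its `__all__` list, if it defines one), so a missing name shows up later as a NameError at the point of use. An explicit import either binds the name or fails right away on the import line. The sketch below illustrates this with a hypothetical stand-in module called `fake_sql`; it is not pyspark, and whether `pyspark.sql` exports `SparkSession` in your CDAP environment depends on the Spark version installed there, which is an assumption you would need to check.

```python
import sys
import types

# Hypothetical stand-in package for illustration only; this is NOT pyspark.
fake = types.ModuleType("fake_sql")
fake.DataFrame = type("DataFrame", (), {})
fake.SparkSession = type("SparkSession", (), {})
fake.__all__ = ["DataFrame"]  # SparkSession is deliberately not exported
sys.modules["fake_sql"] = fake


def star_import_names():
    """Return the public names that `from fake_sql import *` actually binds."""
    namespace = {}
    exec("from fake_sql import *", namespace)
    return sorted(name for name in namespace if not name.startswith("__"))


def explicit_import_works():
    """An explicit import binds the name even though the wildcard skipped it."""
    namespace = {}
    exec("from fake_sql import SparkSession", namespace)
    return "SparkSession" in namespace


def missing_name_fails_at_import():
    """A name the module does not have fails on the import line itself."""
    try:
        exec("from fake_sql import HiveContext", {})
    except ImportError:
        return True
    return False


print(star_import_names())
print(explicit_import_works())
print(missing_name_fails_at_import())
```

If the explicit `from pyspark.sql import SparkSession` itself raises an ImportError inside the pipeline, the Spark version used there probably does not provide SparkSession at all. Also note that once a session exists, its context is available as `spark.sparkContext`, so a separate `SparkContext()` call should not be needed.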
dol answered Nov 02 '25 13:11