
Case sensitive column drop operation for pyspark dataframe?

From some brief testing, it appears that the column drop function for pyspark dataframes is not case sensitive, e.g.:

from pyspark.sql import SparkSession

sparkSession = SparkSession.builder.appName("my-session").getOrCreate()

dff = sparkSession.createDataFrame([(10, 123), (14, 456), (16, 678)], ["age", "AGE"])

>>> dff.show()
+---+---+
|age|AGE|
+---+---+
| 10|123|
| 14|456|
| 16|678|
+---+---+

>>> dff.drop("AGE")
DataFrame[]

>>> dff_dropped = dff.drop("AGE")
>>> dff_dropped.show()
++
||
++
||
||
||
++

"""
What I'd like to see here is:
+---+
|age|
+---+
| 10|
| 14|
| 16|
+---+
"""

Is there a way to drop dataframe columns in a case sensitive way? (I have seen some comments related to something like this in Spark JIRA discussions, but was looking for something that applies only to the drop() operation in an ad hoc way, not a global / persistent setting.)

asked Oct 27 '25 by lampShadesDrifter
1 Answer

# Add this before using drop
sparkSession.sql("set spark.sql.caseSensitive=true")

You need to set spark.sql.caseSensitive to true if you have two columns whose names differ only in case.

answered Oct 29 '25 by Prathik Kini