PySpark tutorials and guides

Specify options for the JVM launched by PySpark

Mar 17, 2023

apache-spark jvm-arguments pyspark

Spark error "It appears that you are attempting to reference SparkContext from a broadcast"

Mar 17, 2023

broadcast pyspark

How to use PySpark MLlib RegressionMetrics with real predictions

Mar 16, 2023

apache-spark pyspark apache-spark-mllib

Unable to merge Spark DataFrame columns with df.withColumn()

Mar 17, 2023

python apache-spark apache-spark-sql pyspark

PySpark textFile JSON with indentation

Mar 16, 2023

python json apache-spark pyspark

How to find the intersection of two RDDs by keys in PySpark?

Mar 16, 2023

python apache-spark pyspark

Does Spark's distinct() function shuffle only the distinct tuples from each partition?

Mar 16, 2023

python apache-spark pyspark

PySpark: custom function in aggregation on grouped data

Mar 15, 2023

python sql dataframe pyspark

Spark read.json throwing java.io.IOException: Too many bytes before newline

Mar 15, 2023

json apache-spark pyspark apache-spark-sql bigdata

PySpark Row objects: accessing row elements by variable names

Mar 14, 2023

python apache-spark pyspark

Deep copy a filtered PySpark DataFrame from a Hive query

Mar 14, 2023

python apache-spark pyspark

Integrating scikit-learn with PySpark

Mar 14, 2023

apache-spark scikit-learn pyspark

How do I read a text file & apply a schema with PySpark?

Sep 02, 2025

python apache-spark pyspark

Spark.read() multiple paths at once instead of one-by-one in a for loop

Sep 02, 2025

python apache-spark pyspark databricks azure-data-lake

PySpark: create a new column based on another column with multiple conditions using a list or set

Sep 03, 2025

python apache-spark pyspark apache-spark-sql

Convert array to struct in PySpark

Aug 31, 2025

python apache-spark pyspark struct apache-spark-sql

Working with a JDBC jar in PySpark

Sep 02, 2025

postgresql jdbc apache-spark pyspark apache-spark-sql

User does not have privileges for ALTERTABLE_ADDCOLS while using spark.sql to read the data

Sep 02, 2025

apache-spark pyspark apache-spark-sql

Where to modify spark-defaults.conf if I installed PySpark via pip install pyspark?

Sep 02, 2025

apache-spark pyspark

PySpark RDD: expand a row to multiple rows

Sep 02, 2025

python apache-spark pyspark rdd
