apache-spark tutorials and guides

Delta lake and ADLS Gen2 transactions

Jun 24, 2026

Adding new column using other existing columns Spark/Scala

Jun 24, 2026

scala dataframe apache-spark apache-spark-sql

More efficient way to get monthly counts in Python/PySpark

Jun 23, 2026

python sql apache-spark pyspark apache-spark-sql

Dataframe API vs Spark.sql [duplicate]

Jun 24, 2026

dataframe apache-spark catalyst-optimizer

Spark and Scala: Apply a function to each element of an RDD

Jun 24, 2026

scala apache-spark

Spark File Logger in Yarn Mode

Jun 24, 2026

apache-spark log4j hadoop-yarn

How do I print the contents of an Apache Spark RDD in my terminal?

Jun 24, 2026

scala matrix apache-spark

Glue - An error occurred while calling getDynamicFrame

Jun 23, 2026

amazon-web-services apache-spark pyspark apache-spark-sql aws-glue

How to ensure that loading of Spark DataFrame from Parquet is distributed and parallelized?

Jun 24, 2026

apache-spark apache-spark-sql parquet

(Spark skewed join) How to join two large Spark RDDs with highly duplicated keys without memory issues?

Jun 23, 2026

java apache-spark join rdd scalability

org.apache.hive.com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 21

Jun 24, 2026

hadoop apache-spark hive hadoop-yarn

PySpark Structured Streaming: trigger once not working with Kafka

Jun 24, 2026

apache-spark pyspark apache-kafka spark-streaming

Apache Spark 2.0 (PySpark) - DataFrame Error Multiple sources found for csv

Jun 24, 2026

apache-spark pyspark apache-spark-sql

How to select a column in a dataframe by its number instead of its name

Jun 24, 2026

scala dataframe apache-spark

Do we need to checkpoint both readStream and writeStream of Kafka in Spark Structured Streaming?

Jun 22, 2026

apache-spark spark-streaming

collect sparkr into dataframe

Jun 23, 2026

r apache-spark sparkr

Spark: Is a col of a datetime on a weekday or weekend?

Jun 23, 2026

python apache-spark pyspark

pyspark get element from array Column of struct based on condition

Jun 23, 2026

python dataframe apache-spark pyspark apache-spark-sql

Data preprocessing with apache spark and scala

Jun 23, 2026

scala apache-spark rdd
