apache-spark tutorials and guides

unbound method createDataFrame()

May 05, 2026

apache-spark pyspark

Spark: rename multiple columns with alias

May 06, 2026

scala apache-spark apache-spark-sql

Subset one array column with another (boolean) array column

May 06, 2026

apache-spark pyspark apache-spark-sql

Is Spark persist() (then action) really persisting?

May 06, 2026

scala apache-spark pyspark apache-spark-sql persistence

Is "getNumPartitions" an expensive operation?

May 05, 2026

python python-2.7 apache-spark pyspark apache-spark-sql

Serialization issues in Spark Streaming

May 05, 2026

apache-spark apache-spark-sql spark-streaming apache-spark-ml

Output Spark application id in the logs with Log4j

May 04, 2026

json scala apache-spark log4j

Spark Worker asking for absurd amounts of virtual memory

May 05, 2026

memory memory-management apache-spark pyspark virtual-memory

Parallelism in Cassandra read using Scala

May 06, 2026

scala apache-spark concurrency cassandra

Using Spark ML Pipelines just for Transformations

May 05, 2026

apache-spark apache-spark-mllib apache-spark-ml

How to use foreachPartition in Spark 2.2 to avoid Task Serialization error

May 06, 2026

scala apache-spark apache-kafka apache-spark-sql spark-streaming

Jobs are not shown on Spark WebUI

May 04, 2026

apache-spark pyspark webui

Scala module 2.12.3 requires Jackson Databind version >= 2.12.0 and < 2.13.0 but I have databind 2.12.3

May 05, 2026

java apache-spark data-binding version

Is it possible to read ORC file to Spark Data Frame in sparklyr?

May 05, 2026

r apache-spark sparkr sparklyr orc

Spark window function without orderBy

May 05, 2026

apache-spark apache-spark-sql

Spark convert array of structs to Vector for Euclidean distance

May 05, 2026

apache-spark apache-spark-sql user-defined-functions apache-spark-mllib

Spark structured streaming maxOffsetsPerTrigger does not seem to work

May 03, 2026

apache-spark spark-structured-streaming

How to print/log outputs within foreachBatch function?

May 05, 2026

apache-spark databricks spark-structured-streaming

New posts in apache-spark