apache-spark tutorials and guides

Guava version while using spark-shell

Feb 12, 2019

apache-spark spark-cassandra-connector google-cloud-dataproc

Spark Shell - __spark_libs__.zip does not exist

Nov 11, 2022

hadoop apache-spark hadoop-yarn

Integrate key-value database with Spark

Feb 20, 2022

hadoop apache-spark rocksdb

What are spark.local.ip, spark.driver.host, spark.driver.bindAddress and spark.driver.hostname?

Apr 11, 2022

apache-spark

What does df.repartition with no column arguments partition on?

Dec 11, 2021

python apache-spark pyspark pyspark-sql

Reading HDF5 files [closed]

Apr 07, 2022

scala apache-spark hdf5

foldLeft or foldRight equivalent in Spark?

Aug 22, 2022

scala apache-spark spark-streaming fold rdd

How to match Dataframe column names to Scala case class attributes?

Mar 13, 2022

scala apache-spark apache-spark-sql parquet

What does stage mean in the spark logs?

Mar 05, 2022

mapreduce apache-spark apache-spark-sql pyspark

Spark job running on YARN cluster: java.io.FileNotFoundException: File does not exist, even though the file exists on the master node

Aug 11, 2019

hadoop apache-spark hadoop-yarn spark-streaming

PySpark: do Python processes on an executor node share broadcast variables in RAM?

Oct 02, 2022

python apache-spark pyspark shared-memory

cannot resolve xyz given input columns error when creating Spark dataset

Jul 29, 2017

apache-spark

Creating indices for each group in Spark dataframe

Mar 27, 2022

apache-spark apache-spark-sql

java.lang.NoClassDefFoundError: Could not initialize class when launching spark job via spark-submit in scala code

Dec 26, 2021

java scala apache-spark apache-spark-sql spark-dataframe

Multiprocessing with Spark (PySpark) [duplicate]

Aug 27, 2019

python apache-spark pyspark spark-dataframe python-multiprocessing

How to manually set group.id and commit kafka offsets in spark structured streaming?

Nov 07, 2022

apache-spark apache-kafka spark-structured-streaming spark-kafka-integration

Use of lit() in expr()

Nov 19, 2022

scala apache-spark apache-spark-sql databricks

How to set group.id for consumer group in kafka data source in Structured Streaming?

Nov 14, 2022

apache-spark apache-kafka spark-structured-streaming spark-kafka-integration

Can Spark use multiple cores properly?

Jan 05, 2020

multithreading apache-spark multicore

Pass an array as a UDF parameter in Spark SQL

Oct 31, 2017

scala apache-spark dataframe apache-spark-sql user-defined-functions

New posts in apache-spark