apache-spark tutorials and guides

error: not found: type SparkConf

Aug 19, 2020

scala apache-spark

How to submit a Spark job on a remote master node in YARN client mode?

Mar 16, 2021

hadoop apache-spark cluster-computing hadoop-yarn

How to read Avro file in PySpark

Sep 18, 2022

python apache-spark avro pyspark

Spark: coalesce is very slow even though the output data is very small

Sep 18, 2022

scala apache-spark coalesce

Convert Dataframe to a Map(Key-Value) in Spark

Mar 04, 2019

scala dictionary apache-spark

Why does df.limit keep changing in PySpark?

Oct 06, 2022

apache-spark pyspark spark-dataframe

argmax in Spark DataFrames: how to retrieve the row with the maximum value

Aug 22, 2022

apache-spark apache-spark-sql

How can I save an RDD into HDFS and later read it back?

Mar 15, 2022

scala apache-spark hdfs rdd bigdata

How to get all columns after groupBy on Dataset<Row> in Spark SQL 2.1.0

Sep 18, 2022

apache-spark apache-spark-sql

How to create a copy of a DataFrame in PySpark?

Mar 20, 2022

python apache-spark pyspark apache-spark-sql

Encountering "WARN ProcfsMetricsGetter: Exception when trying to compute pagesize" error when running Spark

Feb 02, 2022

python apache-spark pyspark

Is there an "Explain RDD" in Spark?

May 11, 2018

apache-spark rdd

How to extract application ID from the PySpark context

Oct 19, 2022

apache-spark hadoop-yarn pyspark

Case class equality in Apache Spark

Apr 08, 2022

scala apache-spark pattern-matching rdd case-class

How to connect HBase and Spark using Python?

Oct 16, 2022

python apache-spark hbase pyspark apache-spark-sql

Writing files to local system with Spark in Cluster mode

Oct 02, 2022

scala hadoop apache-spark

How to filter one Spark DataFrame against another DataFrame

Sep 18, 2022

scala apache-spark apache-spark-sql spark-dataframe

How do I collect a single column in Spark?

Oct 17, 2019

apache-spark dataframe pyspark apache-spark-sql

How to set the number of partitions/nodes when importing data into Spark

Aug 23, 2022

sql apache-spark database-partitioning pyspark-sql

Spark Error: Not enough space to cache partition rdd_8_2 in memory! Free memory is 58905314 bytes

Jul 31, 2021

scala out-of-memory apache-spark rdd
