apache-spark tutorials and guides

What happens when an executor is lost?

Oct 23, 2022

apache-spark

Parquet vs Cassandra using Spark and DataFrames

Oct 19, 2018

apache-spark cassandra spark-dataframe parquet

Boosting spark.yarn.executor.memoryOverhead

Jun 26, 2022

amazon-web-services apache-spark pyspark emr amazon-emr

How to filter rows for a specific aggregate with spark sql?

Nov 02, 2022

sql apache-spark aggregate apache-spark-sql spark-dataframe

How to aggregate over rolling time window with groups in Spark

Mar 19, 2019

sql apache-spark pyspark apache-spark-sql window-functions

spark sbt error: value toDF is not a member of Seq[DataRow]

Jan 03, 2021

apache-spark apache-spark-sql

What is Lineage In Spark?

Feb 20, 2022

apache-spark hadoop data-lineage

How to refresh a table and do it concurrently?

Sep 13, 2022

apache-spark apache-spark-sql spark-streaming

How to get the output from console streaming sink in Zeppelin?

Aug 29, 2022

apache-spark pyspark apache-zeppelin spark-structured-streaming

py4j.protocol.Py4JJavaError occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe

Jan 30, 2022

python-3.x apache-spark pyspark pycharm py4j

How to drop a column from a Databricks Delta table?

Sep 15, 2022

sql apache-spark apache-spark-sql databricks delta-lake

Spark: optimise writing a DataFrame to SQL Server

Nov 06, 2022

sql sql-server database scala apache-spark

What is Memory Reserved on YARN?

Aug 29, 2022

hadoop apache-spark hadoop-yarn hadoop2

Pyspark py4j PickleException: "expected zero arguments for construction of ClassDict"

Jan 08, 2020

python apache-spark pyspark py4j

How to sort by value efficiently in PySpark?

Oct 25, 2022

python sorting lambda apache-spark

Create pyspark kernel for Jupyter

Oct 08, 2021

apache-spark ipython pyspark jupyter

Do you benefit from the Kryo serializer when you use Pyspark?

Jul 26, 2018

apache-spark pyspark kryo

Spark Dataframe change column value

Mar 03, 2018

scala apache-spark dataframe

How to read a gz-compressed file with PySpark

Mar 03, 2022

python apache-spark pyspark

How to create a custom streaming data source?

Sep 17, 2022

apache-spark spark-structured-streaming
