apache-spark tutorials and guides

How to get the TypeTag for a class in Java

Apr 16, 2022

java scala apache-spark

Databricks Exception: Total size of serialized results is bigger than spark.driver.maxResultSize

Nov 09, 2022

python azure apache-spark databricks

Spark - Non-time-based windows are not supported on streaming DataFrames/Datasets;

Sep 14, 2022

java apache-spark apache-spark-sql spark-streaming

Spark Kryo register for array class

May 31, 2021

java apache-spark kryo

How does Round Robin partitioning in Spark work?

Oct 24, 2022

scala apache-spark partitioning

Why does Spark groupBy.agg(min/max) of BigDecimal always return 0?

Nov 11, 2022

apache-spark apache-spark-sql bigdecimal

Submitting a PySpark script to a remote Spark server?

Oct 16, 2022

apache-spark pyspark amazon-emr

What's the purpose of OutputMode in flatMapGroupsWithState? How/where is it used?

Nov 06, 2022

apache-spark spark-structured-streaming

List all additional jars loaded in PySpark

Apr 21, 2022

apache-spark pyspark

pyspark 'DataFrame' object has no attribute '_get_object_id'

Nov 20, 2022

python dataframe apache-spark pyspark

Using partitions (with partitionBy) when writing to Delta Lake has no effect

Apr 26, 2022

apache-spark apache-spark-sql partitioning mapr delta-lake

Why does joining structurally identical DataFrames give different results?

Sep 30, 2022

apache-spark join pyspark apache-spark-sql

Spark processing columns in parallel

Dec 02, 2018

scala apache-spark rdd

How to run a script in PySpark and drop into an IPython shell when done?

Oct 18, 2022

python ipython apache-spark

How to run a Python script in a Spark job?

Aug 30, 2022

python apache-spark

Spark scalability: what am I doing wrong?

Oct 29, 2022

apache-spark bigdata pyspark scalability distributed-computing

How to collect Spark SQL output to a file?

Sep 12, 2022

scala apache-spark apache-spark-sql

How to save/export a Spark MLlib model to PMML?

Oct 17, 2022

hadoop deployment machine-learning apache-spark modeling

Concurrent job execution in Spark

Dec 08, 2018

java multithreading apache-spark hadoop-yarn

Equivalent of Distributed Cache in Spark? [duplicate]

Oct 16, 2022

java scala hadoop apache-spark
