apache-spark tutorials and guides

How to get a Spark job's metrics?

Aug 08, 2019

Is this a bug in Spark Streaming or a memory leak?

Nov 01, 2022

memory apache-spark memory-leaks apache-spark-sql

PySpark S3 Access with Multiple AWS Credential Profiles?

Feb 01, 2022

amazon-web-services amazon-s3 apache-spark pyspark

What to use to have graphical view of Spark's memory usage (with YARN)?

Nov 06, 2022

memory memory-management apache-spark monitoring

Apache Spark sort partition by user ID and write each partition to CSV

May 22, 2018

python sorting apache-spark pyspark

Why does sbt assembly fail with "Not a valid command: assembly"?

Jun 01, 2019

scala apache-spark sbt sbt-assembly

Lost executor Spark

Feb 23, 2022

apache-spark

PySpark: Numpy memory not being released in executor map-partition function (memory leak)

Oct 11, 2021

python numpy apache-spark memory-leaks pyspark

Joining Spark DataFrames on a nearest key condition

Nov 10, 2022

python performance dataframe apache-spark join

I cannot use --package option on bitnami/spark docker container

Aug 31, 2022

docker apache-spark elasticsearch

Spark MLlib - Collaborative Filtering Implicit Feed

Nov 16, 2022

apache-spark recommendation-engine

Spark: What is the time complexity of the connected components algorithm used in GraphX?

Apr 12, 2022

algorithm apache-spark time-complexity spark-graphx connected-components

How to repartition evenly in Spark?

Sep 07, 2022

apache-spark pyspark

Out of memory error when writing out Spark DataFrames to Parquet format

Aug 22, 2022

java scala apache-spark parquet

Difference between a map and udf

Mar 30, 2019

scala apache-spark udf

Cassandra Error message: Not marking nodes down due to local pause. Why?

Nov 03, 2021

apache-spark amazon-ec2 cassandra datastax datastax-startup

Spark on localhost

Nov 07, 2022

apache-spark pyspark

Spark RDD: map vs mapPartitions

Nov 06, 2022

java scala apache-spark garbage-collection

Sending Spark Streaming metrics to OpenTSDB

Nov 04, 2022

apache-spark spark-streaming opentsdb

When are Spark RDD blocks created and destroyed/removed?

Nov 11, 2022

apache-spark spark-streaming rdd
