apache-spark tutorials and guides

Spark LDA consumes too much memory

Apr 02, 2021

apache-spark apache-spark-mllib lda

Apache Spark "Py4JError: Answer from Java side is empty"

Nov 15, 2021

apache-spark

SparkUI for pyspark - corresponding line of code for each stage?

Sep 19, 2022

apache-spark pyspark emr

How to read/write protocol buffer messages with Apache Spark?

Sep 07, 2022

apache-spark hdfs protocol-buffers sequencefile

In Apache Spark, how to convert a slow RDD/dataset into a stream?

Sep 19, 2022

scala apache-spark apache-spark-sql spark-streaming

What is happening when Spark is calling ShuffleBlockFetcherIterator?

Sep 14, 2022

apache-spark apache-spark-sql

Spark Parquet write gets slow as partitions grow

Sep 14, 2022

apache-spark partitioning parquet

Unable to understand error "SparkListenerBus has already stopped! Dropping event ..."

May 26, 2021

apache-spark

How are the number of iterations and the number of partitions related in Apache Spark Word2Vec?

Aug 19, 2021

apache-spark apache-spark-mllib word2vec

Spark: Difference between collect(), take() and show() outputs after conversion toDF

Sep 19, 2022

scala apache-spark dataframe collect take

Spark: Most efficient way to sort and partition data to be written as parquet

Nov 17, 2022

apache-spark pyspark apache-spark-sql pyspark-sql

Why increase spark.yarn.executor.memoryOverhead?

Aug 17, 2022

apache-spark hadoop-yarn

Read an unsupported mix of union types from an Avro file in Apache Spark

Apr 01, 2019

scala apache-spark apache-spark-sql spark-avro

Exception with Table identified via AWS Glue Crawler and stored in Data Catalog

Sep 19, 2022

amazon-web-services apache-spark amazon-s3 amazon-emr aws-glue

Can't start Apache Spark on Windows using Cygwin

Jan 11, 2020

apache-spark

Spark - Container is running beyond physical memory limits

Sep 19, 2022

hadoop apache-spark spark-graphx

How to balance my data across the partitions?

Sep 23, 2022

python hadoop apache-spark distributed-computing bigdata

How to update Spark MatrixFactorizationModel for ALS

Sep 19, 2022

apache-spark machine-learning apache-spark-mllib collaborative-filtering

From DataFrame to RDD[LabeledPoint]

Aug 21, 2022

scala apache-spark apache-spark-mllib

Running PySpark on an IDE like Spyder?

Sep 19, 2022

python-2.7 apache-spark

New posts in apache-spark