apache-spark tutorials and guides

How to get the progress bar (with stages and tasks) with yarn-cluster master?

Aug 11, 2020

Spark DAG differs with 'withColumn' vs 'select'

Feb 05, 2022

python dataframe apache-spark pyspark directed-acyclic-graphs

How to decide on the number of partitions required for input data size and cluster resources?

Feb 09, 2019

hadoop apache-spark

Spark Streaming textFileStream not supporting wildcards

Sep 15, 2018

apache-spark hdfs spark-streaming

When to prefer Hadoop MapReduce over Spark?

Jan 31, 2020

java apache-spark hadoop mapreduce

How to join big dataframes in Spark SQL? (best practices, stability, performance)

Nov 13, 2022

performance join apache-spark apache-spark-sql spark-dataframe

How to fetch offset id while consuming Kafka from Spark, save it in Cassandra and use it to restart Kafka?

Oct 20, 2022

java apache-spark cassandra apache-kafka

How to run Spark Scala code on Amazon EMR

Aug 05, 2021

scala amazon-web-services apache-spark emr amazon-emr

Apache Spark Structured Streaming vs Apache Flink: what is the difference?

Nov 02, 2022

apache-spark apache-flink spark-structured-streaming

Spark UI History server on Kubernetes?

Aug 26, 2022

apache-spark kubernetes

Spark structured streaming app reading from multiple Kafka topics

Apr 10, 2022

apache-spark apache-kafka spark-structured-streaming

"TypeError: an integer is required (got type bytes)" when importing pyspark on Python 3.8 [duplicate]

Dec 29, 2021

apache-spark pyspark python-3.8

Spark Clusters: worker info doesn't show on web UI

Apr 06, 2022

apache-spark

Apache Spark: How to create a matrix from a DataFrame?

Oct 22, 2017

python matrix apache-spark pyspark apache-spark-mllib

How to connect Zeppelin to Spark 1.5 built from the sources?

Oct 19, 2022

apache-spark apache-zeppelin apache-spark-1.5

Merging multiple rows in a spark dataframe into a single row

Jul 27, 2018

apache-spark dataframe apache-spark-sql rdd

Spark: difference of semantics between reduce and reduceByKey

Nov 08, 2022

scala apache-spark rdd reduce

Is Spark's KMeans unable to handle big data?

Oct 23, 2022

python apache-spark k-means apache-spark-mllib bigdata

Spark dataframe to arrow

Nov 01, 2022

scala apache-spark dataframe apache-arrow

Is there a difference between OUTER & FULL_OUTER in Spark SQL?

Apr 12, 2021

apache-spark apache-spark-sql spark-dataframe

New posts in apache-spark