apache-spark tutorials and guides

Efficient text preprocessing using PySpark (clean, tokenize, stopwords, stemming, filter)

Apr 18, 2020

Election of new ZooKeeper leader shuts down the Spark Master

Oct 28, 2022

apache-spark apache-zookeeper

NullPointerException thrown where it can't be thrown

Aug 31, 2022

java lambda nullpointerexception apache-spark

Is Spark SQL UDAF (user defined aggregate function) available in the Python API?

Feb 13, 2017

apache-spark apache-spark-sql spark-dataframe

Why does PySpark fail with random "Socket is closed" error?

May 13, 2019

apache-spark pyspark

Caching ordered Spark DataFrame creates unwanted job

Nov 17, 2022

python apache-spark pyspark apache-spark-sql pyspark-sql

Spark Streaming + Kafka vs just Kafka

Feb 06, 2019

apache-spark apache-kafka spark-streaming spark-streaming-kafka

Spark for Kubernetes - Azure Blob Storage credentials issue

Mar 21, 2022

azure apache-spark kubernetes azure-blob-storage

WebSphere MQ as a data source for Apache Spark Streaming

Oct 20, 2022

apache-spark ibm-mq spark-streaming

How to integrate Apache Spark with Spring MVC web application for interactive user sessions

Jul 29, 2018

java spring-mvc apache-spark machine-learning apache-spark-mllib

ClassNotFoundException: org.apache.spark.SparkConf with Spark on Hive

Apr 03, 2022

hadoop apache-spark hive

pyLDAvis visualization of PySpark-generated LDA model

Oct 14, 2022

python apache-spark pyspark lda

Apache Spark: User Memory vs Spark Memory

Oct 23, 2022

caching apache-spark memory memory-management rdd

KryoException: Buffer overflow with very small input

May 31, 2021

apache-spark

Submitting jobs to Spark EC2 cluster remotely

Nov 17, 2022

amazon-ec2 apache-spark

Do Parquet Metadata Files Need to be Rolled-back?

Oct 26, 2022

apache-spark spark-streaming parquet

Spark EC2 SSH connection error SSH return code 255

Oct 24, 2022

ssh amazon-ec2 apache-spark

Spark program gives odd results when run on a standalone cluster

Oct 23, 2022

python apache-spark pyspark bigdata

How many partitions does Spark create when a file is loaded from S3 bucket?

Oct 01, 2022

apache-spark hadoop amazon-s3 rdd

Structured streaming won't write DF to file sink citing /_spark_metadata/9.compact doesn't exist

Sep 27, 2022

apache-spark amazon-s3 amazon-emr spark-structured-streaming
