apache-spark tutorials and guides

Exception with Table identified via AWS Glue Crawler and stored in Data Catalog

Sep 19, 2022

Can't start Apache Spark on Windows using Cygwin

Jan 11, 2020

apache-spark

Spark - Container is running beyond physical memory limits

Sep 19, 2022

hadoop apache-spark spark-graphx

How to balance my data across the partitions?

Sep 23, 2022

python hadoop apache-spark distributed-computing bigdata

How to update Spark MatrixFactorizationModel for ALS

Sep 19, 2022

apache-spark machine-learning apache-spark-mllib collaborative-filtering

From DataFrame to RDD[LabeledPoint]

Aug 21, 2022

scala apache-spark apache-spark-mllib

Running PySpark on an IDE like Spyder?

Sep 19, 2022

python-2.7 apache-spark

Apache Spark YARN mode startup takes too long (10+ secs)

Jun 11, 2022

hadoop apache-spark hadoop-yarn

PySpark: StructField(..., ..., False) always returns `nullable=true` instead of `nullable=false`

Jul 29, 2021

python apache-spark pyspark apache-spark-sql

Spark Streaming: foreachRDD update my mongo RDD

Dec 08, 2019

mongodb apache-spark spark-streaming

Spark Streaming, RabbitMQ and MQTT in Python using pika

Apr 04, 2022

python apache-spark rabbitmq mqtt pika

Spark structured streaming - join static dataset with streaming dataset

Sep 19, 2022

scala apache-spark apache-spark-sql apache-spark-dataset spark-structured-streaming

How to find which Java/Scala thread has locked a file?

Sep 19, 2022

java scala apache-spark hive

How to load streaming data from Amazon SQS?

Oct 28, 2022

apache-spark amazon-sqs pyspark-sql spark-structured-streaming

Does Spark maintain parquet partitioning on read?

Sep 19, 2022

scala apache-spark partitioning parquet

Spark Streaming mapWithState seems to rebuild complete state periodically

Sep 19, 2022

scala apache-spark spark-streaming

Spark SQL: Why two jobs for one query?

Jul 06, 2017

apache-spark apache-spark-sql unsafe parquet

Spark Scala Split dataframe into equal number of rows

Oct 22, 2022

scala apache-spark dataframe

TypeError: Column is not iterable - How to iterate over ArrayType()?

Feb 21, 2022

apache-spark pyspark spark-dataframe pyspark-sql

Can't get a SparkContext in new AWS EMR Cluster

Sep 19, 2022

amazon-web-services apache-spark pyspark amazon-emr
