apache-spark tutorials and guides

PCA output in Spark doesn't match scikit-learn

Aug 24, 2019

Using Spark Structured Streaming to Read Data From Kafka, an Over-time Issue Always Occurs

Apr 19, 2021

apache-spark apache-kafka spark-structured-streaming

Caching dataframes while keeping partitions

Nov 08, 2022

apache-spark

Can't pickle _thread.lock objects: PySpark sending requests to Elasticsearch

Jun 28, 2022

python apache-spark elasticsearch pyspark

AnalysisException: Queries with streaming sources must be executed with writeStream.start()

Jan 19, 2020

apache-spark spark-structured-streaming

Watermarking for Spark structured streaming with three way joins

May 30, 2022

scala apache-spark spark-structured-streaming

Connecting MySQL with PySpark

Apr 21, 2022

python mysql apache-spark pyspark

Spark Dataset when to use Except vs Left Anti Join

Nov 09, 2022

apache-spark apache-spark-sql anti-join

Reading a custom PySpark transformer

Aug 31, 2022

apache-spark pyspark pipeline apache-spark-ml

Strange behavior when using toDF() function to transform RDD to DataFrame in PySpark

Aug 17, 2022

python apache-spark pyspark apache-spark-sql rdd

How to use the new Hadoop Parquet magic committer with a custom S3 server in Spark

Sep 05, 2022

apache-spark hadoop amazon-s3

GraphX: Is it possible to execute a program on each vertex without receiving a message?

Mar 18, 2022

scala apache-spark graph-theory spark-graphx spark-shell

Spark Structured Streaming exception: Append output mode not supported without watermark

Aug 23, 2022

apache-spark spark-structured-streaming

PySpark timeout trying to repartition/write to parquet (Futures timed out after [300 seconds])?

Oct 29, 2022

apache-spark pyspark apache-spark-sql aws-glue

PySpark: getting the latest partition from a Hive partitioned column

Sep 24, 2022

apache-spark hive pyspark hive-partitions

Get name / alias of column in PySpark

May 22, 2022

apache-spark pyspark alias columnname

IllegalStateException: _spark_metadata/0 doesn't exist while compacting batch 9

May 13, 2022

scala apache-spark apache-kafka spark-structured-streaming

Apache Spark 2.2: broadcast join not working when you already cache the dataframe which you want to broadcast

Aug 26, 2022

apache-spark apache-spark-sql apache-spark-dataset apache-spark-2.0

Does flatMap give better performance than filter + map?

Sep 24, 2022

scala apache-spark

How to execute Spark code locally with databricks-connect?

Oct 29, 2022

azure apache-spark databricks azure-databricks
