apache-spark tutorials and guides

Merge rows in a spark scala Dataframe

Oct 19, 2022

scala apache-spark dataframe

Possible to filter Spark dataframe by ISNUMERIC function?

Oct 19, 2022

scala apache-spark apache-spark-sql

How to keep partition columns when reading in ORC files in Spark

Oct 19, 2022

apache-spark apache-spark-sql orc

How to update a Static Dataframe with Streaming Dataframe in Spark structured streaming

Oct 19, 2022

apache-spark apache-spark-sql spark-structured-streaming

java.lang.UnsupportedOperationException: Error in spark when writing

Oct 19, 2022

apache-spark apache-spark-dataset

How does Spark handle failure scenarios involving JDBC data source?

Oct 18, 2022

scala apache-spark jdbc apache-spark-sql

Spark using recursive case class

Oct 18, 2022

scala apache-spark apache-spark-sql apache-spark-dataset

How to integrate HIVE access into PySpark derived from pip and conda (not from a Spark distribution or package)

Oct 19, 2022

python apache-spark hive pyspark hive-metastore

How to understand the queueStream API in apache spark?

Aug 21, 2022

apache-spark

How can PySpark be called in debug mode?

Sep 11, 2022

python python-2.7 hadoop intellij-idea apache-spark

Spark Streaming checkpoint recovery is very slow

Sep 13, 2022

apache-spark amazon-s3 spark-streaming amazon-kinesis checkpointing

How to change case of whole column to lowercase?

Oct 03, 2022

java apache-spark apache-spark-sql apache-spark-dataset

Spark Standalone Mode: How to compress spark output written to HDFS

Feb 23, 2022

scala compression hdfs apache-spark

Error starting pre-built spark-master when slf4j is not installed

Oct 30, 2022

apache-spark

pyspark addPyFile to add zip of .py files, but module still not found

May 12, 2022

apache-spark pyspark

Spark Structured Streaming automatically converts timestamp to local time

Nov 18, 2022

java scala apache-spark apache-spark-sql spark-structured-streaming

Why does the repartition() method increase file size on disk?

Sep 22, 2022

apache-spark

Spark: Read file only if the path exists

Apr 30, 2022

scala apache-spark parquet

Spark and non-serializable DateTimeFormatter

Nov 04, 2022

java scala serialization apache-spark

Removing duplicate columns after a DF join in Spark

Oct 15, 2022

python pyspark apache-spark apache-spark-sql