apache-spark tutorials and guides

Passing multiple system properties to google dataproc cluster job

Aug 22, 2022

What is the difference between a "stateful" and "stateless" system?

Oct 15, 2022

apache-spark streaming spark-streaming state apache-flink

Spark Structured Streaming app has no jobs and no stages

Oct 30, 2022

apache-spark apache-kafka spark-structured-streaming

Spark Structured Streaming Blue/Green Deployments

Nov 13, 2022

apache-spark hadoop deployment spark-structured-streaming blue-green-deployment

Error handling with Try match inside an udf - and log row where it failed

Nov 06, 2022

scala apache-spark dataframe error-handling user-defined-functions

Spark pivot groupby performance very slow

Dec 10, 2021

apache-spark dataframe group-by pivot

Recommended way to access HBase using Scala

Oct 17, 2022

scala apache-spark hbase apache-flink scalding

Pyspark sql: Create a new column based on whether a value exists in a different DataFrame's column

Sep 05, 2022

python apache-spark pyspark pyspark-sql

How can I train a random forest with a sparse matrix in Spark?

Jun 06, 2022

r apache-spark apache-spark-mllib apache-spark-ml sparklyr

Issue upon Spark upgrade: key not found: _PYSPARK_DRIVER_CONN_INFO_PATH

Sep 17, 2022

apache-spark pyspark

Issue while parsing mongo collection which has few schemas in spark

Sep 20, 2022

mongodb apache-spark apache-spark-sql

Spark Java - Collect multiple columns into array column

Aug 27, 2022

java apache-spark apache-spark-dataset

Difference between extending App and an object containing a main method in Scala

Aug 21, 2022

scala apache-spark

Named accumulator in pyspark

Dec 26, 2021

python apache-spark pyspark

spark.sql vs SqlContext

Sep 05, 2022

apache-spark pyspark apache-spark-sql pyspark-sql

log from spark udf to driver

Sep 13, 2022

scala apache-spark databricks

Apache Spark UI displays incorrect input size of file being ingested

Feb 12, 2022

apache-spark apache-spark-sql

Apache Spark 2.3.1 with Hive metastore 3.1.0

Apr 02, 2022

apache-spark hive apache-spark-sql hive-metastore hdp

How to pass variables in spark SQL, using python?

Aug 23, 2022

python apache-spark pyspark apache-spark-sql

Difference when serializing a lazy val with or without @transient

Sep 06, 2022

scala serialization apache-spark lazy-initialization transient
