apache-spark tutorials and guides

Using Word2VecModel.transform() does not work in a map function

Jun 27, 2018

In Apache Spark, what is the difference between using mapPartitions and the combined use of a broadcast variable and map?

Nov 10, 2022

java scala apache-spark

Broadcast not happening while joining dataframes in Spark 1.6

Oct 20, 2022

scala apache-spark join apache-spark-sql query-optimization

How to drop rows with too many NULL values?

Mar 04, 2022

scala apache-spark dataframe apache-spark-sql

In Spark SQL, how do you register and use a generic UDF?

Aug 26, 2022

scala apache-spark udf

Spark RDD sort by two values

Mar 10, 2022

scala sorting apache-spark rdd

Using Spark DataFrame to load data from HDFS

Aug 24, 2022

apache-spark spark-dataframe

How to view the logs of a Spark job after it has completed and the context is closed?

Oct 04, 2018

apache-spark ssh pyspark tunneling apache-spark-1.3

Reading a JSON file using Apache Spark

Oct 17, 2022

java json hadoop apache-spark apache-spark-2.0

PySpark: Custom window function

Jun 23, 2022

apache-spark pyspark apache-spark-sql window-functions

Spark: How RDD.map/mapToPair work with Java

May 07, 2022

java apache-spark tuples rdd keyvaluepair

Spark on YARN runs twice on error [duplicate]

Jul 17, 2022

apache-spark hadoop-yarn

Spark Dataset equivalent for Scala's "collect" taking a partial function

Feb 22, 2022

scala apache-spark apache-spark-dataset

How to add new columns to DataFrame given their names when they are missing?

Jan 02, 2022

scala apache-spark dataframe apache-spark-sql

How to convert Dataset into JavaPairRDD?

Jun 13, 2022

java apache-spark apache-spark-dataset java-pair-rdd

Why would Spark executors be removed (with "ExecutorAllocationManager: Request to remove executorIds" in the logs)?

May 01, 2019

apache-spark pyspark hadoop-yarn

How to change column metadata in PySpark?

Aug 27, 2022

dataframe apache-spark pyspark metadata apache-spark-ml

How to write rows asynchronously in Spark Streaming application to speed up batch execution?

Jun 27, 2022

performance apache-spark apache-spark-sql spark-streaming

spark-sql Table or view not found error

Jun 12, 2018

apache-spark apache-spark-sql spark-dataframe

How to join/merge a list of dataframes with common keys in PySpark?

Sep 28, 2022

python apache-spark pyspark apache-spark-sql
