apache-spark tutorials and guides

org.apache.spark.SparkException: Task not serializable - JavaSparkContext

Oct 09, 2021

java serialization apache-spark

Spark DataFrame created from JavaRDD<Row> copies all columns data into first column

Sep 13, 2022

apache-spark apache-spark-sql

"unbound method textFile() must be called with SparkContext instance as first argument (got str instance instead)"

Apr 03, 2022

python apache-spark pyspark

How to use Spark Naive Bayes classifier for text classification with IDF?

Nov 14, 2022

python apache-spark tf-idf text-classification apache-spark-mllib

How to avoid "Not a file" exceptions when reading from HDFS with Spark

Apr 29, 2022

apache-spark hdfs emr s3distcp

Understanding closures and parallelism in Spark

Apr 12, 2022

scala hadoop apache-spark

ClassNotFoundException thrown launching Spark Shell

Mar 22, 2022

apache-spark pyspark

Spark accumulator is confusing me

Aug 24, 2022

scala apache-spark

Spark: group concat equivalent in Scala RDD

Sep 17, 2022

scala apache-spark group-concat rdd spark-dataframe

When are files "splittable"?

Jun 13, 2022

hadoop apache-spark hive hdfs file-format

Using Word2VecModel.transform() does not work in map function

Jun 27, 2018

python apache-spark pyspark apache-spark-mllib word2vec

In Apache Spark, what is the difference between using mapPartitions and the combined use of a broadcast variable and map?

Nov 10, 2022

java scala apache-spark

Broadcast not happening while joining dataframes in Spark 1.6

Oct 20, 2022

scala apache-spark join apache-spark-sql query-optimization

How to drop rows with too many NULL values?

Mar 04, 2022

scala apache-spark dataframe apache-spark-sql

In Spark SQL, how do you register and use a generic UDF?

Aug 26, 2022

scala apache-spark udf

Spark RDD sort by two values

Mar 10, 2022

scala sorting apache-spark rdd

Using Spark DataFrame to load data from HDFS

Aug 24, 2022

apache-spark spark-dataframe

How to view the logs of a Spark job after it has completed and the context is closed?

Oct 04, 2018

apache-spark ssh pyspark tunneling apache-spark-1.3

Reading a JSON file using Apache Spark

Oct 17, 2022

java json hadoop apache-spark apache-spark-2.0

PySpark: Custom window function

Jun 23, 2022

apache-spark pyspark apache-spark-sql window-functions
