apache-spark tutorials and guides

Convert UTC timestamp to local time based on time zone in PySpark

Oct 25, 2022

apache-spark pyspark apache-spark-sql

Delta Lake without Databricks Runtime

Sep 18, 2022

apache-spark hdfs databricks delta-lake

Stream-Static Join: How to refresh (unpersist/persist) static Dataframe periodically

Sep 25, 2021

scala apache-spark apache-spark-sql spark-streaming spark-structured-streaming

API compatibility between Scala and Python?

Jul 17, 2022

apache-spark pyspark

Spark fails when running pi.py example in yarn-client mode

May 23, 2022

apache-spark

Spark-csv data source: infer data types

Oct 25, 2022

apache-spark dataframe

Aggregation with Group By date in Spark SQL

Oct 30, 2022

sql group-by apache-spark aggregation

Convert Matrix to RowMatrix in Apache Spark using Scala

May 14, 2017

scala matrix apache-spark distributed

How to load data from saved file with Spark

Apr 06, 2022

apache-spark rdd

org.apache.spark.SparkException: Task not serializable - JavaSparkContext

Oct 09, 2021

java serialization apache-spark

Spark DataFrame created from JavaRDD<Row> copies all columns data into first column

Sep 13, 2022

apache-spark apache-spark-sql

"unbound method textFile() must be called with SparkContext instance as first argument (got str instance instead)"

Apr 03, 2022

python apache-spark pyspark

How to use spark Naive Bayes classifier for text classification with IDF?

Nov 14, 2022

python apache-spark tf-idf text-classification apache-spark-mllib

How to avoid "Not a file" exceptions when reading from HDFS with spark

Apr 29, 2022

apache-spark hdfs emr s3distcp

Understanding closures and parallelism in Spark

Apr 12, 2022

scala hadoop apache-spark

ClassNotFoundException thrown launching Spark Shell

Mar 22, 2022

apache-spark pyspark

Spark accumulators are confusing me

Aug 24, 2022

scala apache-spark

Spark: group concat equivalent in scala rdd

Sep 17, 2022

scala apache-spark group-concat rdd spark-dataframe

When are files "splittable"?

Jun 13, 2022

hadoop apache-spark hive hdfs file-format

Using Word2VecModel.transform() does not work in a map function

Jun 27, 2018

python apache-spark pyspark apache-spark-mllib word2vec

New posts in apache-spark