apache-spark tutorials and guides

Why does Spark report spark.SparkException: File ./someJar.jar exists and does not match contents of

Apr 12, 2022

apache-spark

How to perform initialization in Spark?

Nov 15, 2022

scala apache-spark

Apache Spark Streaming - Kafka - reading older messages

Sep 30, 2022

apache-spark apache-zookeeper apache-kafka spark-streaming

Can't run Spark 1.2 in standalone mode on Mac

Oct 05, 2022

apache-spark

Saving a DataFrame to a JSON file on a local drive in PySpark

Aug 30, 2022

python json apache-spark pyspark

Is there a way to change the replication factor of RDDs in Spark?

Jun 30, 2022

java scala hadoop apache-spark hadoop-yarn

How to compare multiple rows?

Nov 06, 2019

scala apache-spark spark-streaming apache-spark-sql

Sending a large CSV to Kafka using Python and Spark

Mar 19, 2022

python apache-spark apache-kafka pyspark kafka-python

Using groupBy in Spark and getting back to a DataFrame

Nov 02, 2022

scala apache-spark apache-spark-sql

Add Yarn cluster configuration to Spark application

Jun 09, 2019

scala hadoop apache-spark hadoop-yarn

How to pass additional parameters to user-defined methods for the filter method in PySpark?

Dec 31, 2021

python apache-spark pyspark

How to read parquet files using `ssc.fileStream()`? What are the types passed to `ssc.fileStream()`?

May 18, 2021

scala hadoop apache-spark spark-streaming hadoop2

Replace newline (\n) characters in a CSV file - Spark Scala

Oct 21, 2022

scala replace apache-spark character newline

Why are "sc.addFile" and "spark-submit --files" not distributing a local file to all workers?

Aug 21, 2021

file apache-spark cluster-computing distribute

How can I read in a binary file from hdfs into a Spark dataframe?

Sep 07, 2022

python hadoop numpy apache-spark spark-dataframe

How to get date and time from string?

Dec 06, 2018

scala date apache-spark apache-spark-sql

Conflict between httpclient version and Apache Spark

Jan 05, 2021

java apache-spark amazon-ec2 apache-httpclient-4.x

pyspark expected zero arguments for construction of ClassDict (for pyspark.mllib.linalg.DenseVector)

Dec 09, 2021

apache-spark pyspark apache-spark-sql user-defined-functions apache-spark-mllib

Install Spark on an existing Hadoop cluster

Sep 27, 2022

linux hadoop apache-spark

How to register S3 Parquet files in a Hive Metastore using Spark on EMR

Nov 15, 2022

apache-spark hive elastic-map-reduce apache-spark-1.6
