apache-spark tutorials and guides

Spark - Adding JDBC Driver JAR to Google Dataproc

Nov 17, 2022

Do parquet files preserve the row order of Spark DataFrames?

Nov 01, 2022

apache-spark apache-spark-sql parquet

Not enough space to cache rdd in memory warning

Oct 07, 2019

amazon-web-services amazon-s3 apache-spark rdd

How does the number of partitions affect `wholeTextFiles` and `textFile`?

Jan 09, 2020

python apache-spark pyspark

Regrouping / Concatenating DataFrame rows in Spark

Nov 18, 2022

scala apache-spark dataframe apache-spark-sql apache-spark-ml

A quick guide to a Salt-based install of a Spark cluster

Feb 08, 2022

apache-spark hdfs salt-stack

What are the pros and cons of using broadcast variables in a singleton?

Nov 02, 2022

java apache-spark broadcast

Spark: why are tasks assigned to only one worker?

Jul 22, 2022

apache-spark

Spark-HBASE Error java.lang.IllegalStateException: unread block data

Dec 21, 2021

apache-spark hbase apache-spark-sql

How to add a typesafe config file which is located on HDFS to spark-submit (cluster-mode)?

Jul 06, 2021

hadoop apache-spark hdfs typesafe

Is it possible to run a Spark YARN cluster from code?

Feb 21, 2019

java apache-spark hadoop-yarn

Persisting data to DynamoDB using Apache Spark

Nov 12, 2022

apache-spark amazon-dynamodb apache-spark-sql amazon-emr spark-dataframe

Merge multiple RDDs generated in a loop

Sep 08, 2022

scala apache-spark rdd

Spark not leveraging hdfs partitioning with parquet

Aug 28, 2022

hadoop apache-spark hdfs parquet bigdata

Efficiency of flatMap vs map followed by reduce in Spark

Oct 15, 2022

scala apache-spark mapreduce rdd flatmap

How to access an individual element of a tuple in an RDD in PySpark?

Apr 05, 2022

python apache-spark pyspark rdd

Can a model be created in Spark batch and then used in Spark Streaming?

Nov 12, 2022

apache-spark machine-learning spark-streaming

How to save RandomForestClassifier Spark model in scala?

Jun 24, 2019

scala apache-spark apache-spark-mllib

How can I declare a Column as a categorical feature in a DataFrame for use in ml?

Dec 05, 2021

python apache-spark pyspark apache-spark-ml

Passing Python functions as objects to Spark

Mar 08, 2019

python apache-spark pyspark

New posts in apache-spark