apache-spark tutorials and guides

PySpark - Cumulative sum with reset condition

Jun 24, 2022

How to find the max value of multiple columns?

Nov 07, 2022

scala apache-spark apache-spark-sql

How to set up Zeppelin to work with remote EMR Yarn cluster

Aug 29, 2022

apache-spark hadoop-yarn emr apache-zeppelin

Spark Convert Data Frame Column to dense Vector for StandardScaler() "Column must be of type org.apache.spark.ml.linalg.VectorUDT"

Mar 09, 2022

python apache-spark pyspark apache-spark-sql apache-spark-ml

Java Apache Spark: Long transformation chains result in quadratic time

May 15, 2019

java apache-spark

PySpark DataFrame Join using UDF

Feb 07, 2022

python apache-spark pyspark apache-spark-sql user-defined-functions

Set spark.streaming.kafka.maxRatePerPartition for createDirectStream

Sep 16, 2022

apache-spark spark-streaming

PySpark 1.6.0 write to Parquet gives "path exists" error

Oct 15, 2021

apache-spark pyspark

How to run a Scala program in terminal?

May 23, 2022

scala shell apache-spark terminal

Spark SQL count(*) query store result

Nov 14, 2022

sql apache-spark apache-spark-sql

Spark Parquet Loader: Reduce number of jobs involved in listing a dataframe's files

Oct 15, 2022

apache-spark pyspark

Spark 2.3.0 Read Text File With Header Option Not Working

Nov 08, 2022

python-2.7 apache-spark header spark-dataframe text-files

Substring multiple characters from the last index of a PySpark string column using negative indexing

Sep 16, 2022

python apache-spark pyspark

weekofyear() returning seemingly incorrect results for January 1

Jul 17, 2021

apache-spark pyspark pyspark-sql week-number

Kafka - Could not find a 'KafkaClient' entry in the JAAS configuration java

Mar 02, 2022

apache-spark apache-kafka kerberos

PySpark - to_date format from column

Mar 03, 2019

apache-spark pyspark apache-spark-sql

PySpark 2.4.0, read Avro from Kafka with read stream - Python

Aug 25, 2022

python apache-spark pyspark apache-kafka avro

PySpark: How to Append Dataframes in For Loop

Nov 21, 2022

apache-spark pyspark time-series user-defined-functions

How to count the trailing zeroes in an array column in a PySpark dataframe without a UDF

May 17, 2022

python apache-spark pyspark apache-spark-sql

How to make Spark session read all the files recursively?

Sep 21, 2022

regex scala apache-spark recursion
