apache-spark-sql tutorials

Passing nullable columns as parameter to Spark SQL UDF

Feb 17, 2022

apache-spark apache-spark-sql

How to hint for sort merge join or shuffled hash join (and skip broadcast hash join)?

Jan 30, 2022

scala apache-spark apache-spark-sql

Understanding Spark Structured Streaming Parallelism

Aug 15, 2022

apache-spark apache-spark-sql spark-structured-streaming

PySpark: how are DataFrame describe() and summary() implemented?

Jan 29, 2021

python oop dataframe pyspark apache-spark-sql

How to write null value from Spark sql expression of DataFrame to a database table? (IllegalArgumentException: Can't get JDBC type for null)

Dec 01, 2021

apache-spark apache-spark-sql

AWS connection timeout when running Spark job on EMR

Oct 31, 2022

hadoop apache-spark amazon-s3 apache-spark-sql emr

PySpark: spit out single file when writing instead of multiple part files

Sep 08, 2022

python amazon-s3 apache-spark pyspark apache-spark-sql

How to create a z-score in Spark SQL for each group

Aug 29, 2022

python apache-spark pyspark apache-spark-sql

convert dataframe to libsvm format

Sep 27, 2022

apache-spark pyspark apache-spark-sql spark-dataframe apache-spark-mllib

Why is dataset.count() faster than rdd.count()?

Apr 14, 2022

scala performance apache-spark apache-spark-sql apache-spark-dataset

Merge multiple small files into a few larger files in Spark

Aug 29, 2022

scala hadoop apache-spark hive apache-spark-sql

Elegant JSON flatten in Spark [duplicate]

Jul 01, 2020

json scala apache-spark apache-spark-sql

Custom aggregation on PySpark dataframes [duplicate]

Jun 03, 2020

apache-spark pyspark apache-spark-sql aggregate-functions user-defined-functions

org.apache.spark.sql.AnalysisException: cannot resolve given input columns

Jan 03, 2022

scala apache-spark dataframe apache-spark-sql spark-jobserver

Spark MongoDB Connector Scala - Missing database name

Aug 09, 2019

mongodb scala apache-spark apache-spark-sql

Check if table exists in Hive metastore using PySpark

Nov 18, 2022

python-3.x apache-spark hive pyspark apache-spark-sql

Select array element from Spark Dataframes split method in same call?

Feb 03, 2022

python apache-spark pyspark apache-spark-sql

Convert List into dataframe spark scala

Nov 16, 2022

scala apache-spark apache-spark-sql spark-dataframe

How to read simple text file from Google Cloud Storage using Spark-Scala local Program

Oct 23, 2022

scala google-app-engine apache-spark-sql google-cloud-storage google-cloud-dataproc

Get IDs for duplicate rows (considering all other columns) in Apache Spark

Nov 06, 2022

apache-spark pyspark apache-spark-sql pyspark-sql
