apache-spark-sql tutorials

Not able to connect to Postgres using JDBC in the PySpark shell

Oct 17, 2022

SparkSQL, Thrift Server and Tableau

Dec 31, 2019

apache-spark hive apache-spark-sql

Saving/Exporting the results of a Spark SQL Zeppelin query

Nov 09, 2022

apache-spark-sql apache-zeppelin

How to add empty map type column to DataFrame?

Oct 29, 2022

scala apache-spark apache-spark-sql

Spark SQL from_json documentation

Sep 12, 2022

apache-spark-sql

How to execute a Column expression in Spark without a DataFrame

Apr 19, 2022

apache-spark apache-spark-sql

Difference between df.saveAsTable and spark.sql("CREATE TABLE ...")

Aug 29, 2022

scala apache-spark hive pyspark apache-spark-sql

Spark - Reading JSON from Partitioned Folders using Firehose

Nov 07, 2022

apache-spark apache-spark-sql databricks spark-structured-streaming

PySpark: do I need to re-cache a DataFrame?

Jun 22, 2019

apache-spark pyspark apache-spark-sql spark-dataframe

Passing nullable columns as parameters to a Spark SQL UDF

Feb 17, 2022

apache-spark apache-spark-sql

How to hint for sort merge join or shuffled hash join (and skip broadcast hash join)?

Jan 30, 2022

scala apache-spark apache-spark-sql

Understanding Spark Structured Streaming Parallelism

Aug 15, 2022

apache-spark apache-spark-sql spark-structured-streaming

PySpark: how are DataFrame describe() and summary() implemented?

Jan 29, 2021

python oop dataframe pyspark apache-spark-sql

How to write a null value from a Spark SQL expression of a DataFrame to a database table? (IllegalArgumentException: Can't get JDBC type for null)

Dec 01, 2021

apache-spark apache-spark-sql

AWS connection timeout when running a Spark job on EMR

Oct 31, 2022

hadoop apache-spark amazon-s3 apache-spark-sql emr

PySpark: output a single file when writing instead of multiple part files

Sep 08, 2022

python amazon-s3 apache-spark pyspark apache-spark-sql

How to compute a z-score for each group in Spark SQL

Aug 29, 2022

python apache-spark pyspark apache-spark-sql

Convert a DataFrame to LIBSVM format

Sep 27, 2022

apache-spark pyspark apache-spark-sql spark-dataframe apache-spark-mllib

Why is dataset.count() faster than rdd.count()?

Apr 14, 2022

scala performance apache-spark apache-spark-sql apache-spark-dataset

Merge multiple small files into a few larger files in Spark

Aug 29, 2022

scala hadoop apache-spark hive apache-spark-sql

New posts in apache-spark-sql