apache-spark-sql tutorials

Spark query running very slow

Feb 12, 2022

apache-spark apache-spark-sql pyspark

How to get the progress bar (with stages and tasks) with yarn-cluster master?

Aug 11, 2020

apache-spark jar progress-bar apache-spark-sql hadoop-yarn

How to join big dataframes in Spark SQL? (best practices, stability, performance)

Nov 13, 2022

performance join apache-spark apache-spark-sql spark-dataframe

Merging multiple rows in a spark dataframe into a single row

Jul 27, 2018

apache-spark dataframe apache-spark-sql rdd

Is there a difference between OUTER & FULL_OUTER in Spark SQL?

Apr 12, 2021

apache-spark apache-spark-sql spark-dataframe

Calculate Cosine Similarity Spark Dataframe

Nov 20, 2022

scala apache-spark apache-spark-sql apache-spark-mllib

How to implement a Spark SQL pagination query?

Nov 05, 2022

apache-spark apache-spark-sql

Hive UDF for selecting all except some columns

Sep 07, 2022

apache-spark hive hiveql apache-spark-sql udf

pyspark: TypeError: IntegerType can not accept object in type <type 'unicode'>

May 13, 2021

python apache-spark apache-spark-sql pyspark

How does Spark parallelize the processing of a 1TB file?

Nov 18, 2022

apache-spark dataframe parallel-processing apache-spark-sql

How to retrieve Metrics like Output Size and Records Written from Spark UI?

Oct 16, 2022

apache-spark apache-spark-sql spark-dataframe spark-cassandra-connector codahale-metrics

How does computing table stats in hive or impala speed up queries in Spark SQL?

Nov 19, 2022

apache-spark hive apache-spark-sql impala

Spark: Order of column arguments in repartition vs partitionBy

Jun 05, 2022

apache-spark dataframe apache-spark-sql partitioning

Saving to parquet subpartition

Feb 23, 2022

apache-spark apache-spark-sql

Iterating over PySpark GroupedData

Aug 25, 2022

python pyspark apache-spark-sql

Retain keys with null values while writing JSON in spark

Oct 15, 2022

java json apache-spark apache-spark-sql

Append a new column to an existing parquet file

Oct 17, 2022

apache-spark apache-spark-sql parquet

Why do columns change to nullable in Apache Spark SQL?

Oct 22, 2022

apache-spark apache-spark-sql apache-spark-dataset

Extract words from a string column in spark dataframe

Feb 21, 2022

regex scala apache-spark apache-spark-sql

spark.ml StringIndexer throws 'Unseen label' on fit()

Oct 21, 2022

apache-spark dataframe pyspark apache-spark-sql apache-spark-ml

New posts in apache-spark-sql