apache-spark-sql tutorials

Why does Spark groupBy.agg(min/max) of BigDecimal always return 0?

Nov 11, 2022

apache-spark apache-spark-sql bigdecimal

How do explicit table partitions in Databricks affect write performance?

Jun 26, 2022

amazon-s3 hive apache-spark-sql databricks delta-lake

Using partitions (with partitionBy) when writing a delta lake has no effect

Apr 26, 2022

apache-spark apache-spark-sql partitioning mapr delta-lake

Why does joining structurally identical DataFrames give different results?

Sep 30, 2022

apache-spark join pyspark apache-spark-sql

How to collect Spark SQL output to a file?

Sep 12, 2022

scala apache-spark apache-spark-sql

Ever increasing physical memory for a Spark application in YARN

Mar 12, 2022

java hadoop memory apache-spark apache-spark-sql

How to persist sorted parquet tables for future sort merge joins?

Mar 30, 2022

apache-spark apache-spark-sql parquet

Error creating transactional connection factory during running Spark on Hive project in IDEA

Jul 26, 2021

apache-spark hive apache-spark-sql metastore

SPARK DataFrame: Remove MAX value in a group

Mar 12, 2022

apache-spark dataframe apache-spark-sql

Spark Dataset when to use Except vs Left Anti Join

Nov 09, 2022

apache-spark apache-spark-sql anti-join

Strange behavior when using toDF() function to transform RDD to DataFrame in PySpark

Aug 17, 2022

python apache-spark pyspark apache-spark-sql rdd

PySpark timeout trying to repartition/write to parquet (Futures timed out after [300 seconds])?

Oct 29, 2022

apache-spark pyspark apache-spark-sql aws-glue

Apache Spark 2.2: broadcast join not working when you already cache the dataframe which you want to broadcast

Aug 26, 2022

apache-spark apache-spark-sql apache-spark-dataset apache-spark-2.0

Joining two DataFrames from the same source

Nov 19, 2021

python apache-spark apache-spark-sql pyspark

How do you add a numpy.array as a new column to a pyspark.SQL DataFrame?

May 13, 2022

python apache-spark apache-spark-sql pyspark pyspark-sql

Spark job restarted after showing all jobs completed and then fails (TimeoutException: Futures timed out after [300 seconds])

Jan 01, 2018

scala apache-spark apache-spark-sql spark-dataframe

How to select a subset of fields from an array column in Spark?

Oct 18, 2022

scala apache-spark dataframe apache-spark-sql

Spark UDAF: java.lang.InternalError: Malformed class name

Jun 13, 2022

apache-spark apache-spark-sql spark-dataframe

Need a TRUE and FALSE column in Spark-SQL

Feb 28, 2019

apache-spark-sql

How to map rows to protobuf-generated class?

Jun 12, 2022

apache-spark apache-spark-sql protocol-buffers apache-spark-encoders
