apache-spark-sql tutorials

Spark case class - decimal type encoder error "Cannot up cast from decimal"

Jan 09, 2019

scala apache-spark apache-spark-sql

Read all Parquet files saved in a folder via Spark

Oct 03, 2022

scala apache-spark apache-spark-sql

Add one more StructField to schema

Dec 29, 2019

python apache-spark pyspark apache-spark-sql

Get first N elements from DataFrame ArrayType column in PySpark

Oct 29, 2022

apache-spark pyspark apache-spark-sql

Spark: save DataFrame partitioned by "virtual" column

Nov 20, 2022

apache-spark dataframe pyspark apache-spark-sql partitioning

How do I filter rows based on whether a column value is in a Set of Strings in a Spark DataFrame

Nov 02, 2022

scala apache-spark apache-spark-sql

How do I convert an RDD with a SparseVector Column to a DataFrame with a column as Vector

Oct 16, 2022

apache-spark pyspark apache-spark-sql apache-spark-mllib apache-spark-ml

PySpark: how to resample frequencies

Nov 11, 2022

apache-spark pyspark apache-spark-sql time-series

PySpark 1.5: How to truncate timestamp to nearest minute from seconds

Sep 03, 2022

python datetime apache-spark apache-spark-sql pyspark

EntityTooLarge error when uploading a 5 GB file to Amazon S3

Sep 03, 2022

amazon-s3 apache-spark jets3t parquet apache-spark-sql

Converting a Spark Dataframe to a Scala Map collection

Sep 13, 2022

apache-spark dataframe apache-spark-sql

How to change the column type from String to Date in DataFrames?

Sep 22, 2022

scala apache-spark apache-spark-sql

PySpark computing correlation

Aug 25, 2022

python apache-spark pyspark apache-spark-sql apache-spark-mllib

How to update column based on a condition (a value in a group)?

Sep 22, 2022

scala apache-spark apache-spark-sql

AuthorizationException: User not allowed to impersonate User

Mar 20, 2022

apache-spark hive apache-spark-sql beeline

How to CROSS JOIN 2 DataFrames?

Sep 01, 2022

apache-spark apache-spark-sql spark-dataframe

Partition data for efficient joining for Spark dataframe/dataset

Nov 04, 2022

apache-spark apache-spark-sql spark-dataframe partitioning apache-spark-dataset

Spark Option: inferSchema vs header = true

Aug 30, 2022

csv apache-spark header apache-spark-sql schema

Spark: Merge 2 dataframes by adding row index/number on both dataframes

Sep 22, 2022

apache-spark pyspark apache-spark-sql

How to get max value and keep all columns (for max records per group)?

Nov 02, 2022

apache-spark apache-spark-sql