apache-spark-sql tutorials

Why is a join in Spark in local mode so slow?

Oct 31, 2022

Aggregate sparse vector in PySpark

Oct 31, 2022

apache-spark pyspark apache-spark-sql apache-spark-ml

JSON Struct to Map[String,String] using sqlContext

Oct 31, 2022

apache-spark apache-spark-sql

pyspark corr for each group in DF (more than 5K columns)

Oct 31, 2022

python-3.x apache-spark dataframe pyspark apache-spark-sql

Is there a data architecture for efficient joins in Spark (a la RedShift)?

Oct 31, 2022

apache-spark apache-spark-sql spark-dataframe amazon-redshift

How to use correlation in Spark with Dataframes?

Oct 31, 2022

python apache-spark pyspark apache-spark-sql correlation

How to fix 'DataFrame' object has no attribute 'coalesce'?

Oct 31, 2022

python apache-spark dataframe pyspark apache-spark-sql

Spark Streaming Exception: java.util.NoSuchElementException: None.get

Oct 31, 2022

apache-spark hadoop apache-kafka apache-spark-sql spark-streaming

SparkSQL - accessing nested structures Row( field1, field2=Row(..))

Oct 22, 2022

nested apache-spark-sql

Spark-submit Sql Context Create Statement does not work

Oct 21, 2022

scala apache-spark spark-streaming apache-spark-sql

pyspark: "too many values" error after repartitioning

Oct 21, 2022

python apache-spark apache-spark-sql pyspark rdd

Defining DateType conversion for DataFrame schema in Spark

Oct 20, 2022

scala apache-spark-sql

Why would one use DataFrame.select over DataFrame.rdd.map (or vice versa)?

Oct 20, 2022

performance apache-spark dataframe apache-spark-sql rdd

FIRST() or LAST() Aggregate Function in HIVE

Oct 20, 2022

mysql apache-spark hive apache-spark-sql spark-dataframe

Spark-SQL Joining two dataframes/datasets with the same column name

Oct 19, 2022

java apache-spark apache-spark-sql apache-spark-dataset

How to compose column name using another column's value for withColumn in Scala Spark

Sep 22, 2022

scala apache-spark apache-spark-sql

Adding a column of rowsums across a list of columns in Spark Dataframe

Oct 03, 2022

scala apache-spark dataframe apache-spark-sql

PySpark: Take average of a column after using filter function

Sep 16, 2022

python apache-spark pyspark apache-spark-sql

Can we load Parquet file into Hive directly?

Jun 26, 2022

hadoop hive apache-spark-sql hiveql parquet

How to avoid shuffles while joining DataFrames on unique keys?

Oct 15, 2022

apache-spark apache-spark-sql
