New posts in apache-spark-sql

Why join in spark in local mode is so slow?

Aggregate sparse vector in PySpark

JSON Struct to Map[String,String] using sqlContext

pyspark corr for each group in DF (more than 5K columns)

Is there a data architecture for efficient joins in Spark (a la RedShift)?

How to use correlation in Spark with Dataframes?

How to fix 'DataFrame' object has no attribute 'coalesce'?

Spark Streaming Exception: java.util.NoSuchElementException: None.get

SparkSQL - accessing nested structures Row( field1, field2=Row(..))

Spark-submit Sql Context Create Statement does not work

pyspark: "too many values" error after repartitioning

Defining DateType conversion for DataFrame schema in Spark

Why would one use DataFrame.select over DataFrame.rdd.map (or vice versa)?

FIRST() or LAST() Aggregate Function in HIVE

Spark-SQL Joining two dataframes/ datasets with same column name

How to compose column name using another column's value for withColumn in Scala Spark

Adding a column of rowsums across a list of columns in Spark Dataframe

PySpark: Take average of a column after using filter function

Can we load Parquet file into Hive directly?

How to avoid shuffles while joining DataFrames on unique keys?