New posts in apache-spark-sql

Spark DataFrame ORC Hive table reading issue

Is there Spark equivalent for Pandas MultiIndex operation like set_index() or unstack()?

How to read a csv into pyspark without a java heap memory error

How to get the COUNT of emails for each id in Scala

How to merge two columns with a condition in PySpark?

Why does Zeppelin fail with "mismatched input ';' expecting <EOF>" in %spark.sql paragraph?

org.apache.spark.sql.AnalysisException: cannot resolve given input column

How to append collection as new column to DataFrame with many columns?

Missing data when ordering Pyspark Window

How to implement Slowly Changing Dimensions (SCD2) Type 2 in Spark using SQL Join

How to flatten long dataset to wide format (pivot) with no join?

Efficiently calculate top-k elements in spark

How To Apply Multiple Conditions on Case-Otherwise Statement Using Spark Dataframe API

How to change a column type in an array struct with PySpark

How to use columns to create queries (e.g. WHERE clause)?

Convert Rows or Columns to a DataFrame

How to run VACUUM and OPTIMIZE SQL statements in Amazon Athena for Apache Iceberg v2 table

Creating a new scala class that relies on GraphFrames without serialization issues