I am really new to PySpark. All I want to do is find the maximum of the "date" column and add a new column to the dataframe that holds this maximum date for all rows (repeated), like this:
A   B                                C
a   timestamp1                       timestamp3
b   timestamp2   ------------------> timestamp3
c   timestamp3                       timestamp3
I use the following line of code:
df.withColumn('dummy_column',f.lit((f.max('date'))).cast('timestamp')).show(9)
But I get the error:
> AnalysisException: grouping expressions sequence is empty, and
> '`part`' is not an aggregate function. Wrap '(CAST(max(`date`) AS
> TIMESTAMP) AS `new_column`)' in windowing function(s) or wrap '`part`'
> in first() (or first_value) if you don't care which value you get.;;
Can anyone help me understand why I get this error and how I can work around it?
You're probably looking for:
import pyspark.sql.functions as f
from pyspark.sql.window import Window
w = Window.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
df.withColumn('dummy_column',f.max('date').over(w).cast('timestamp')).show(9)
Aggregate functions like max work with a window or a grouping operation. They cannot work on their own because you did not specify the range of rows over which the aggregate function operates.
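If you prefer a grouping-style aggregate instead of a window, a minimal sketch (assuming the same df with a 'date' column as in the question) is to compute the max in a one-row aggregate and cross-join it back onto every row:
import pyspark.sql.functions as f
# one-row dataframe holding the overall maximum of 'date'
max_df = df.agg(f.max('date').alias('dummy_column'))
# attach that single value to every row via a cross join
df.crossJoin(max_df).show(9)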
There is one more alternative to get the max timestamp: aggregate the existing dataframe with max() to get the maximum timestamp, collect that into a variable, and use it as per your requirement:
from pyspark.sql import functions as F

df = spark.createDataFrame([(1,"2020-10-13"),(2,"2020-10-14"),(3,"2020-10-15")],[ "id","ts"])
df.show()

# Single-row dataframe with the overall max of "ts"
df_max = df.agg(F.max("ts").alias("max_ts"))
df_max_var = df_max.collect()[0]['max_ts']
# Taking into a variable for future use
df = df.withColumn("dummy_col", F.lit(df_max_var))
df.show()
+---+----------+
| id| ts|
+---+----------+
| 1|2020-10-13|
| 2|2020-10-14|
| 3|2020-10-15|
+---+----------+
+---+----------+----------+
| id| ts| dummy_col|
+---+----------+----------+
| 1|2020-10-13|2020-10-15|
| 2|2020-10-14|2020-10-15|
| 3|2020-10-15|2020-10-15|
+---+----------+----------+
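Note that collect() brings the aggregated value back to the driver and runs a separate job, whereas the window approach in the first answer keeps everything inside a single query plan; since only one scalar is collected here, either approach works fine.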