From the pyspark docs, I can do:
gdf = df.groupBy(df.name)
sorted(gdf.agg({"*": "first"}).collect())
In my actual use case I have many variables, so I like that I can simply create a dictionary. That's why @lemon's suggestion of calling first per column,

gdf = df.groupBy(df.name)
sorted(gdf.agg(F.first(col, ignorenulls=True)).collect())

won't work for me: it handles one column at a time, so I'd have to spell out every variable.
How can I pass a parameter to first (i.e. ignorenulls=True)? See here.
You can use a list comprehension to build one aggregate expression per column:
gdf.agg(*[F.first(x, ignorenulls=True).alias(x) for x in df.columns]).collect()
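In case it helps, here's a minimal self-contained sketch of the same idea; the SparkSession setup, the toy data, and the column names x and y are made up for illustration. It also skips the grouping column in the comprehension so the result doesn't end up with a duplicate "name" column:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Toy data: "name" is the grouping key; x and y contain nulls.
df = spark.createDataFrame(
    [("a", None, 4), ("a", 1, None), ("b", 3, 6)],
    "name string, x int, y int",
)

gdf = df.groupBy(df.name)

# One first(..., ignorenulls=True) per column, keeping the original
# column names via alias; the grouping column is excluded.
exprs = [F.first(c, ignorenulls=True).alias(c) for c in df.columns if c != "name"]

sorted(gdf.agg(*exprs).collect())
# [Row(name='a', x=1, y=4), Row(name='b', x=3, y=6)]

One caveat: first is order-sensitive in general (after a shuffle, which row is "first" in a group is not guaranteed), so on real data the picked value can vary unless you sort beforehand. In this toy example each group has only one non-null value per column, so the output is deterministic.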