import polars as pl
import pandas as pd
A = ['a','a','a','a','a','a','a','b','b','b','b','b','b','b']
B = [1,2,3,4,5,6,7,8,9,10,11,12,13,14]
df = pl.DataFrame({'cola': A, 'colb': B})
df_pd = df.to_pandas()
index = df_pd.groupby('cola')['colb'].idxmax()
df_pd.loc[index,'top'] = 1
In pandas I can build the 'top' column using idxmax().
In polars, however, I tried arg_max():
index = df[pl.col('colb').arg_max().over('cola').flatten()]
but this does not give me what I want.
Is there any way to generate a 'top' column in polars?
Thanks a lot!
Use max and over:
df.with_columns(top=(pl.col('colb') == pl.col('colb').max()).over('cola'))
This gives you
shape: (14, 3)
┌──────┬──────┬───────┐
│ cola ┆ colb ┆ top │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ bool │
╞══════╪══════╪═══════╡
│ a ┆ 1 ┆ false │
│ a ┆ 2 ┆ false │
│ a ┆ 3 ┆ false │
│ a ┆ 4 ┆ false │
│ a ┆ 5 ┆ false │
│ a ┆ 6 ┆ false │
│ a ┆ 7 ┆ true │
│ b ┆ 8 ┆ false │
│ b ┆ 9 ┆ false │
│ b ┆ 10 ┆ false │
│ b ┆ 11 ┆ false │
│ b ┆ 12 ┆ false │
│ b ┆ 13 ┆ false │
│ b ┆ 14 ┆ true │
└──────┴──────┴───────┘
You can then cast to pl.Int64 if you want 1s and 0s.
The following solution is identical to pandas in the sense that even for repeated maximum values, only a single row per group is highlighted.
For this, we compare the index of a row within the group defined by cola (temporarily created using pl.int_range) to the index returned by pl.Expr.arg_max.
(
    df
    .with_columns(
        top=(pl.int_range(pl.len()) == pl.col("colb").arg_max()).over("cola")
    )
)