
Best way to get percentage counts in Polars

I frequently need to calculate the percentage counts of a variable. For example, given the dataframe below:

import polars as pl

df = pl.DataFrame({"person": ["a", "a", "b"],
                   "value": [1, 2, 3]})

I want to return a dataframe like this:

shape: (2, 2)
┌────────┬──────────┐
│ person ┆ percent  │
│ ---    ┆ ---      │
│ str    ┆ f64      │
╞════════╪══════════╡
│ a      ┆ 0.666667 │
│ b      ┆ 0.333333 │
└────────┴──────────┘

What I have been doing is the following, but I can't help thinking there must be a more efficient, more idiomatic Polars way to do this:

n_rows = len(df)

(
    df
    .with_columns(pl.lit(1).alias('percent'))
    .group_by('person')
    .agg(pl.sum('percent') / n_rows)
)
mark0512 asked Aug 30 '25 16:08


1 Answer

GroupBy.len() will help here; it is shorthand for .agg(pl.len()). Dividing each group's count by the sum of all counts then gives the proportion:

(
    df
    .group_by("person")
    .len()
    .with_columns((pl.col("len") / pl.sum("len")).alias("percent"))
)

shape: (2, 3)
┌────────┬─────┬──────────┐
│ person ┆ len ┆ percent  │
│ ---    ┆ --- ┆ ---      │
│ str    ┆ u32 ┆ f64      │
╞════════╪═════╪══════════╡
│ a      ┆ 2   ┆ 0.666667 │
│ b      ┆ 1   ┆ 0.333333 │
└────────┴─────┴──────────┘
