Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to count unique state combinations per ID in a Polars DataFrame

I have a Polars DataFrame where each id can appear multiple times with different state values (either 1 or 2). I want to count how many unique ids have only state 1, only state 2, or both states 1 and 2.

import polars as pl

df = pl.DataFrame({
    "id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 20, 20, 20],
    "state": [1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2]
})

I want to count how many unique ids fall into each category:

• Only state 1 (e.g., IDs that only have 1)

• Only state 2 (e.g., IDs that only have 2)

• Both states 1 and 2 (e.g., IDs that have both 1 and 2)

Expected Result (Example):

State combination [1]  -> 20 IDs  
State combination [2]  -> 15 IDs  
State combination [1, 2]  -> 30 IDs  
like image 806
Simon Avatar asked Dec 03 '25 22:12

Simon


2 Answers

You could group by the id and use .all() and .any() to check the states.

(df.group_by("id")
   .agg(
       one  = (pl.col.state == 1).all(),
       two  = (pl.col.state == 2).all(),
       both = (pl.col.state == 1).any() & (pl.col.state == 2).any()
       # both = pl.lit(1).is_in("state") & pl.lit(2).is_in("state")
   )
   # .select(pl.exclude("id").sum())
)
shape: (20, 4)
┌─────┬───────┬───────┬───────┐
│ id  ┆ one   ┆ two   ┆ both  │
│ --- ┆ ---   ┆ ---   ┆ ---   │
│ i64 ┆ bool  ┆ bool  ┆ bool  │
╞═════╪═══════╪═══════╪═══════╡
│ 6   ┆ false ┆ true  ┆ false │
│ 3   ┆ false ┆ true  ┆ false │
│ 2   ┆ true  ┆ false ┆ false │
│ 12  ┆ true  ┆ false ┆ false │
│ 16  ┆ false ┆ false ┆ true  │
│ …   ┆ …     ┆ …     ┆ …     │
│ 9   ┆ false ┆ false ┆ true  │
│ 13  ┆ false ┆ true  ┆ false │
│ 8   ┆ false ┆ true  ┆ false │
│ 15  ┆ false ┆ false ┆ true  │
│ 10  ┆ false ┆ false ┆ true  │
└─────┴───────┴───────┴───────┘

The .sum() of the bool columns are the counts.

shape: (1, 3)
┌─────┬─────┬──────┐
│ one ┆ two ┆ both │
│ --- ┆ --- ┆ ---  │
│ u32 ┆ u32 ┆ u32  │
╞═════╪═════╪══════╡
│ 6   ┆ 7   ┆ 7    │
└─────┴─────┴──────┘
like image 113
jqurious Avatar answered Dec 05 '25 10:12

jqurious


You need two group_bys, the first to combine the ids into the states and then another group_by for the states to count up the number of ids

(
    df
    .group_by("id")
    .agg(pl.col("state").unique().sort())
    .group_by("state")
    .len()
)
like image 44
Dean MacGregor Avatar answered Dec 05 '25 10:12

Dean MacGregor



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!