I have a Polars DataFrame where each id can appear multiple times with different state values (either 1 or 2). I want to count how many unique ids have only state 1, only state 2, or both states 1 and 2.
import polars as pl
df = pl.DataFrame({
"id": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 9, 9, 10, 10, 10, 11, 11, 12, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 20, 20, 20],
"state": [1, 2, 1, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2, 1, 1, 2, 2, 1, 1, 2, 2]
})
I want to count how many unique ids fall into each category:
• Only state 1 (e.g., IDs that only have 1)
• Only state 2 (e.g., IDs that only have 2)
• Both states 1 and 2 (e.g., IDs that have both 1 and 2)
Expected Result (Example):
State combination [1] -> 20 IDs
State combination [2] -> 15 IDs
State combination [1, 2] -> 30 IDs
You could group by the id and use .all() and .any() to check the states.
(df.group_by("id")
.agg(
one = (pl.col.state == 1).all(),
two = (pl.col.state == 2).all(),
both = (pl.col.state == 1).any() & (pl.col.state == 2).any()
# both = pl.lit(1).is_in("state") & pl.lit(2).is_in("state")
)
# .select(pl.exclude("id").sum())
)
shape: (20, 4)
┌─────┬───────┬───────┬───────┐
│ id ┆ one ┆ two ┆ both │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ bool ┆ bool ┆ bool │
╞═════╪═══════╪═══════╪═══════╡
│ 6 ┆ false ┆ true ┆ false │
│ 3 ┆ false ┆ true ┆ false │
│ 2 ┆ true ┆ false ┆ false │
│ 12 ┆ true ┆ false ┆ false │
│ 16 ┆ false ┆ false ┆ true │
│ … ┆ … ┆ … ┆ … │
│ 9 ┆ false ┆ false ┆ true │
│ 13 ┆ false ┆ true ┆ false │
│ 8 ┆ false ┆ true ┆ false │
│ 15 ┆ false ┆ false ┆ true │
│ 10 ┆ false ┆ false ┆ true │
└─────┴───────┴───────┴───────┘
The .sum() of the bool columns are the counts.
shape: (1, 3)
┌─────┬─────┬──────┐
│ one ┆ two ┆ both │
│ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 │
╞═════╪═════╪══════╡
│ 6 ┆ 7 ┆ 7 │
└─────┴─────┴──────┘
You need two group_bys, the first to combine the ids into the states and then another group_by for the states to count up the number of ids
(
df
.group_by("id")
.agg(pl.col("state").unique().sort())
.group_by("state")
.len()
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With