
How can I custom sort a list column using another list column?

I have the following DataFrame

import polars as pl

data = {
    "user": ["xxx", "yyy", "zzz"],
    "list_1": [[1, 3, 5], [4, 9, 5], [4, 6, 1]],
    "list_2": [[8, 3, 5], [3, 4, 5], [9, 3, 6]],
    "list_3": [[6, 7, 8], [7, 8, 3], [4, 3, 2]],
    "rank": [[1, 2, 3], [1, 3, 2], [2, 3, 1]]
}

df = pl.DataFrame(data)
shape: (3, 5)
┌──────┬───────────┬───────────┬───────────┬───────────┐
│ user ┆ list_1    ┆ list_2    ┆ list_3    ┆ rank      │
│ ---  ┆ ---       ┆ ---       ┆ ---       ┆ ---       │
│ str  ┆ list[i64] ┆ list[i64] ┆ list[i64] ┆ list[i64] │
╞══════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ xxx  ┆ [1, 3, 5] ┆ [8, 3, 5] ┆ [6, 7, 8] ┆ [1, 2, 3] │
│ yyy  ┆ [4, 9, 5] ┆ [3, 4, 5] ┆ [7, 8, 3] ┆ [1, 3, 2] │
│ zzz  ┆ [4, 6, 1] ┆ [9, 3, 6] ┆ [4, 3, 2] ┆ [2, 3, 1] │
└──────┴───────────┴───────────┴───────────┴───────────┘

The rank column holds the rank of each element of list_1 within its row, and I would like to reorder list_1, list_2 and list_3 according to the rank column.

The desired output should be

shape: (3, 5)
┌──────┬───────────┬───────────┬───────────┬───────────┐
│ user ┆ list_1    ┆ list_2    ┆ list_3    ┆ rank      │
│ ---  ┆ ---       ┆ ---       ┆ ---       ┆ ---       │
│ str  ┆ list[i64] ┆ list[i64] ┆ list[i64] ┆ list[i64] │
╞══════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ xxx  ┆ [1, 3, 5] ┆ [8, 3, 5] ┆ [6, 7, 8] ┆ [1, 2, 3] │
│ yyy  ┆ [4, 5, 9] ┆ [3, 5, 4] ┆ [7, 3, 8] ┆ [1, 3, 2] │
│ zzz  ┆ [1, 4, 6] ┆ [6, 9, 3] ┆ [2, 4, 3] ┆ [2, 3, 1] │
└──────┴───────────┴───────────┴───────────┴───────────┘

I have tried various .list functions, but I don't think they can take a different argument for each row.

Is there a function in polars that can do this? Any help is much appreciated.

benedictine_cumbersome asked Oct 30 '25 09:10


1 Answer

As an alternative to an explode(), group_by(), agg() pattern, you can use pl.Expr.list.gather to select the elements of each list in the desired order.

To do this, apply pl.Expr.arg_sort to each rank list via pl.Expr.list.eval. For every row, this yields the list of indices that picks the elements in the order given by rank.

cols = ["list_1", "list_2", "list_3"]

df.with_columns(
    # Reorder each list by the per-row indices that sort "rank".
    pl.col(cols).list.gather(
        pl.col("rank").list.eval(pl.element().arg_sort())
    )
)
shape: (3, 5)
┌──────┬───────────┬───────────┬───────────┬───────────┐
│ user ┆ list_1    ┆ list_2    ┆ list_3    ┆ rank      │
│ ---  ┆ ---       ┆ ---       ┆ ---       ┆ ---       │
│ str  ┆ list[i64] ┆ list[i64] ┆ list[i64] ┆ list[i64] │
╞══════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ xxx  ┆ [1, 3, 5] ┆ [8, 3, 5] ┆ [6, 7, 8] ┆ [1, 2, 3] │
│ yyy  ┆ [4, 5, 9] ┆ [3, 5, 4] ┆ [7, 3, 8] ┆ [1, 3, 2] │
│ zzz  ┆ [1, 4, 6] ┆ [6, 9, 3] ┆ [2, 4, 3] ┆ [2, 3, 1] │
└──────┴───────────┴───────────┴───────────┴───────────┘
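
To see why arg_sort on rank produces the right indices, here is a minimal plain-Python sketch of the per-row logic, using the zzz row from the question. The helpers arg_sort and gather are hypothetical stand-ins written for illustration only; they are not polars functions, and the sketch says nothing about how polars implements these operations internally.

```python
def arg_sort(values):
    # Hypothetical helper: indices that would sort `values` ascending.
    return sorted(range(len(values)), key=lambda i: values[i])


def gather(values, indices):
    # Hypothetical helper: pick elements of `values` at the given positions.
    return [values[i] for i in indices]


# The "zzz" row from the question.
rank = [2, 3, 1]
row = {
    "list_1": [4, 6, 1],
    "list_2": [9, 3, 6],
    "list_3": [4, 3, 2],
}

# The element with rank 1 sits at position 2, rank 2 at position 0, and so on.
indices = arg_sort(rank)

# Apply the same index list to every column of the row.
reordered = {name: gather(values, indices) for name, values in row.items()}

print(indices)
print(reordered)
```

Because the same index list is applied to every column, list_2 and list_3 follow the ordering of list_1 rather than being sorted on their own values.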
Hericks answered Oct 31 '25 23:10