
Split a string column into many columns by delimiter in Polars

In pandas, the following code splits the string in col1 into multiple columns. Is there a way to do this in Polars?

import polars as pl

data = {"col1": ["a/b/c/d", "a/b/c/d"]}
df = pl.DataFrame(data)

df_pd = df.to_pandas()
df_pd[["a", "b", "c", "d"]] = df_pd["col1"].str.split("/", expand=True)

pl.from_pandas(df_pd)
shape: (2, 5)
┌─────────┬─────┬─────┬─────┬─────┐
│ col1    ┆ a   ┆ b   ┆ c   ┆ d   │
│ ---     ┆ --- ┆ --- ┆ --- ┆ --- │
│ str     ┆ str ┆ str ┆ str ┆ str │
╞═════════╪═════╪═════╪═════╪═════╡
│ a/b/c/d ┆ a   ┆ b   ┆ c   ┆ d   │
│ a/b/c/d ┆ a   ┆ b   ┆ c   ┆ d   │
└─────────┴─────┴─────┴─────┴─────┘

1 Answer

You can split the string into a list, convert the list to a struct datatype with .list.to_struct(), and then unnest the struct into columns:
import polars as pl

df = pl.DataFrame({
    "my_str": ["cat", "cat/dog", None, "", "cat/dog/aardvark/mouse/frog"],
})

df.select(
    pl.col("my_str")
    .str.split("/")
    .list.to_struct(n_field_strategy="max_width")
).unnest("my_str")

Note that you must pass n_field_strategy="max_width". Otherwise the field count is taken from the first non-null list (here ["cat"]), and unnest() creates only one column.

Update: in Polars >= v1.33, n_field_strategy is deprecated; you must instead either pass fields as a sequence of field names or set upper_bound.

Answered by waltersantosf on Dec 23 '25 at 12:12
