How to get value_counts for every word in a string column?

Question

I have a string column and I want to count how often each word occurs across all of its rows.

DataFrame example:

import polars as pl

df = pl.DataFrame({
    "Description": [
        "Would never order again.",
        "I'm not sure it gives me any type of glow and",
        "Goes on smoothly a bit sticky and color is glow",
        "Preferisco altri prodotti della stessa marca.",
        "The moisturizing advertised is non-existent."
    ]
})

With pandas, I would use .str.split, stack and value_counts:

pl.from_pandas(
   df.to_pandas().Description.str.split(expand=True)
     .stack()
     .value_counts()
     .reset_index()
)

shape: (33, 2)
┌───────────────┬───────┐
│ index         ┆ count │
│ ---           ┆ ---   │
│ str           ┆ i64   │
╞═══════════════╪═══════╡
│ and           ┆ 2     │
│ glow          ┆ 2     │
│ is            ┆ 2     │
│ Would         ┆ 1     │
│ altri         ┆ 1     │
│ …             ┆ …     │
│ not           ┆ 1     │
│ I'm           ┆ 1     │
│ again.        ┆ 1     │
│ order         ┆ 1     │
│ non-existent. ┆ 1     │
└───────────────┴───────┘

How would I do this using just Polars?

ritchie46 · Accepted Answer

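You can split each string on spaces, flatten the resulting lists into a single column of words, then group by word and count. The final filter drops any empty strings left over from consecutive spaces: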

(df.select(pl.col("Description").str.split(" ").flatten().alias("words"))
    .group_by("words")
    .len()
    .sort("len", descending=True)
    .filter(pl.col("words").str.len_chars() > 0)
)

shape: (33, 2)
┌───────────────┬─────┐
│ words         ┆ len │
│ ---           ┆ --- │
│ str           ┆ u32 │
╞═══════════════╪═════╡
│ is            ┆ 2   │
│ and           ┆ 2   │
│ glow          ┆ 2   │
│ me            ┆ 1   │
│ of            ┆ 1   │
│ …             ┆ …   │
│ it            ┆ 1   │
│ The           ┆ 1   │
│ Would         ┆ 1   │
│ non-existent. ┆ 1   │
│ type          ┆ 1   │
└───────────────┴─────┘

Tags: python, python-polars

MPA
