Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Counting items in pandas column of dictionaries

I have a dataframe with a column which has dictionaries in it. I want to count the number of occurrences of a dictionary key in the whole column.

One way of doing it is below:

import pandas as pd
from collections import Counter

df = pd.DataFrame({"data": [{"weight": 3, "color": "blue"},
{"size": 5, "weight": 2},{"size": 3, "color": "red"}]})

c = Counter()

for index, row in df.iterrows():
  for item in list(row["data"].keys()):
    c[item] += 1

print(c)

Which gives

Counter({'weight': 2, 'color': 2, 'size': 2})

Are there faster ways of doing it?

like image 677
wwl Avatar asked Nov 29 '25 16:11

wwl


1 Answers

A much faster approach would be to flatten the column with itertools.chain and build a Counter from the result (which will only contain the dictionary keys):

from itertools import chain

Counter(chain.from_iterable(df.data.values.tolist()))
# Counter({'weight': 2, 'color': 2, 'size': 2})

Timings:

def OP(df):
    c = Counter()
    for index, row in df.iterrows():
        for item in list(row["data"].keys()):
            c[item] += 1

%timeit OP(df)
# 570 µs ± 49.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit Counter(chain.from_iterable(df.data.values.tolist()))
# 14.2 µs ± 902 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
like image 121
yatu Avatar answered Dec 02 '25 04:12

yatu