How to aggregate column of vectors after groupby?

Question

I have a pandas DataFrame that has users with features (calculated from TensorFlow word embeddings). I want to be able to group by user and calculate either a mean or median of the vectorized features:

embeddings

user    features
bob     [-0.030460168, -0.0014596573, 0.0997446, -0.18...
bob     [-0.03197706, 0.015620711, 0.05890667, -0.0402...
bob     [-0.060918115, 0.07939958, 0.0333591, 0.035655...
mary    [-0.012854534, 0.07733478, 0.12939823, 0.00992...
mary    [-0.04184026, 0.03382166, 0.1427004, -0.204424...

I tried something like this:

df.groupby('user').agg(count=('user', lambda x: len(x)),
                       mean=('features', lambda x: np.mean(x)))

But it raises the following error:

Exception: Must produce aggregated value

Dani Mesejo · Accepted Answer

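The problem is that x is a pd.Series of numpy arrays, so np.mean(x) returns an array rather than a single value, and pandas rejects that here as a non-aggregated result. Assuming you want the centroid of each user's vectors, you could stack them with np.vstack, take the mean across the first axis, and convert the result to a list so that each group yields one object: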

Setup

import numpy as np
import pandas as pd

arrays = [np.array([-0.030460168, -0.0014596573, 0.0997446, -0.18]),
          np.array([-0.03197706, 0.015620711, 0.05890667, -0.0402]),
          np.array([-0.060918115, 0.07939958, 0.0333591, 0.035655]),
          np.array([-0.012854534, 0.07733478, 0.12939823, 0.00992]),
          np.array([-0.04184026, 0.03382166, 0.1427004, -0.204424])]

users = ['bob', 'bob', 'bob', 'mary', 'mary']

df = pd.DataFrame(data={'user': users, 'features': arrays})

Code

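result = df.groupby('user').agg(
    # number of rows per user
    count=('user', lambda x: len(x)),
    # stack the per-row vectors into a 2-D array, then average down the rows
    mean=('features', lambda x: np.vstack(x).mean(axis=0).tolist()),
)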

print(result)

Output

      count                                               mean
user                                                          
bob       3  [-0.04111844766666667, 0.031186877899999996, 0...
mary      2  [-0.027347397, 0.055578220000000005, 0.1360493...
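
Since the question also asks about the median, the same stacking idea can be sketched with a small helper that takes the reducer as a parameter. This is only an illustration: the helper name aggregate_vectors and the two-element toy vectors are assumptions made up for the example, not part of the original answer, and it assumes every vector in a group has the same length.

```python
import numpy as np
import pandas as pd


def aggregate_vectors(series, reducer=np.mean):
    """Hypothetical helper: stack equal-length 1-D arrays and reduce across rows."""
    stacked = np.vstack(list(series))          # shape: (rows in group, vector length)
    return reducer(stacked, axis=0).tolist()   # a list, so each group yields one object


# Toy data (assumed for illustration) with short vectors that are easy to check by hand.
df = pd.DataFrame({
    'user': ['bob', 'bob', 'bob', 'mary', 'mary'],
    'features': [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([8.0, 0.0]),
                 np.array([1.0, 1.0]), np.array([3.0, 5.0])],
})

result = df.groupby('user').agg(
    count=('user', 'count'),
    mean=('features', aggregate_vectors),
    median=('features', lambda x: aggregate_vectors(x, np.median)),
)

# bob:  mean [4.0, 2.0], median [3.0, 2.0]
# mary: mean [2.0, 3.0], median [2.0, 3.0]
print(result)
```

The median is taken per component, which is usually what is meant for embedding vectors, but it is not the same as a geometric median of the points.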

Tags:

python

pandas

Evan Zamir

1 Answers

Dani Mesejo
