
Python pandas create new column with groupby with custom agg function

Tags: python, pandas

My DataFrame:

from random import random, randint
from pandas import DataFrame

t = DataFrame({"metasearch":["A","B","A","B","A","B","A","B"],
                   "market":["A","B","A","B","A","B","A","B"],
                   "bid":[random() for i in range(8)],
                   "clicks": [randint(0,10) for i in range(8)],
                   "country_code":["A","A","A","A","A","B","A","B"]})

I want to fit a LinearRegression for each market, so I:

1) Group the DataFrame: groups = t.groupby(by="market")

2) Write a function that fits the model on a single group:

from sklearn.linear_model import LinearRegression
def group_fitter(group):
    lr = LinearRegression()
    X = group["bid"].fillna(0).values.reshape(-1,1)
    y = group["clicks"].fillna(0)
    lr.fit(X, y)
    return lr.coef_[0] # THIS IS A SCALAR

3) Create a new Series with market as the index and the coefficient as the value:

s = groups.transform(group_fitter)

But the 3rd step fails with: KeyError: ('bid_cpc', 'occurred at index bid')

Ladenkov Vladislav asked Jan 30 '26 21:01


2 Answers

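I think you need to use apply instead of transform, because the function works with several columns together. The "occurred at index bid" part of the error suggests transform was calling the function on one column at a time, so group["bid"] cannot be looked up there. To add the result as a new column, use join: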

from sklearn.linear_model import LinearRegression
def group_fitter(group):
    lr = LinearRegression()
    X = group["bid"].fillna(0).values.reshape(-1,1)
    y = group["clicks"].fillna(0)
    lr.fit(X, y)
    return lr.coef_[0] # THIS IS A SCALAR

groups = t.groupby(by="market")
# apply returns one coefficient per market; join broadcasts it back to the rows
df = t.join(groups.apply(group_fitter).rename('new'), on='market')
print(df)
        bid  clicks country_code market metasearch       new
0  0.462734       9            A      A          A -8.632301
1  0.438869       5            A      B          B  6.690289
2  0.047160       9            A      A          A -8.632301
3  0.644263       0            A      B          B  6.690289
4  0.579040       0            A      A          A -8.632301
5  0.820389       6            B      B          B  6.690289
6  0.112341       5            A      A          A -8.632301
7  0.432502       0            B      B          B  6.690289
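
To make the transform/apply difference concrete, here is a minimal sketch on a small made-up frame, so the numbers are reproducible. It is not the asker's data, and np.polyfit is swapped in for LinearRegression only to keep the example free of scikit-learn; the names df, slope, centred and coefs are hypothetical. It also uses map rather than join to copy the per-market value back onto the rows, which should be equivalent here.

```python
import numpy as np
import pandas as pd

# Toy data (assumed values): clicks = 2 * bid in market A, clicks = 4 - bid in market B.
df = pd.DataFrame({
    "market": ["A", "B", "A", "B", "A", "B"],
    "bid":    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    "clicks": [2, 3, 4, 2, 6, 1],
})
groups = df.groupby("market")[["bid", "clicks"]]

# transform suits functions that need only one column at a time and
# return something of the same length: centre each column per market.
centred = groups.transform(lambda col: col - col.mean())

# apply hands the function the whole sub-frame, so it can combine columns.
def slope(group):
    # np.polyfit(x, y, 1) returns [slope, intercept]; keep the slope.
    return np.polyfit(group["bid"], group["clicks"], 1)[0]

coefs = groups.apply(slope)            # Series indexed by market
df["coef"] = df["market"].map(coefs)   # copy each market's slope to its rows
print(df)
```

With this toy data the fitted slope should come out as roughly 2 for market A and -1 for market B, repeated on every row of the corresponding market.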
jezrael answered Feb 01 '26 11:02


Just return the group from the function instead of the coefficient.

# return the group instead of a scalar value
def group_fitter(group):
    group = group.copy()  # work on a copy; pandas advises against mutating the object passed to apply
    lr = LinearRegression()
    X = group["bid"].fillna(0).values.reshape(-1,1)
    y = group["clicks"].fillna(0)
    lr.fit(X, y)
    group['coefficient'] = lr.coef_[0] # <- This is the changed line
    return group

# the new column gets added to the data 
s = groups.apply(group_fitter)
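
A variation on the same idea, sketched on made-up data: return a new frame with assign rather than writing into the group, and pass group_keys=False so the result keeps the original row index. As an assumption for the sake of a dependency-free example, np.polyfit stands in for LinearRegression; add_coef, out and result are hypothetical names.

```python
import numpy as np
import pandas as pd

# Toy data (assumed values): slope is 2 in market A and -1 in market B.
df = pd.DataFrame({
    "market": ["A", "B", "A", "B", "A", "B"],
    "bid":    [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    "clicks": [2, 3, 4, 2, 6, 1],
})

def add_coef(group):
    # Fit a straight line per group; np.polyfit returns [slope, intercept].
    slope = np.polyfit(group["bid"], group["clicks"], 1)[0]
    # assign returns a new frame, so the group passed in is left untouched.
    return group.assign(coefficient=slope)

out = (
    df.groupby("market", group_keys=False)[["bid", "clicks"]]
      .apply(add_coef)
      .sort_index()   # restore the original row order
)

# Attach the new column to the original frame by row index.
result = df.join(out["coefficient"])
print(result)
```

Selecting the bid and clicks columns before apply keeps the grouping column out of the function, which avoids the behaviour differences between pandas versions on that point.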
YOLO answered Feb 01 '26 09:02


