I want to group my DataFrame by a specific column, apply a sklearn preprocessing MinMaxScaler to each group, and store the fitted scaler object for each group.
My current starting point:
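```python
import pandas as pd
from sklearn import preprocessing

# sample data, same as in the update below
df = pd.DataFrame(dict(ID=list('AAABBB'), VL=(0, 10, 10, 100, 100, 200)))

scaler = {}  # one fitted MinMaxScaler per ID
parts = []   # scaled rows of each group
for name, group in df.groupby('ID'):
    features = group.drop(columns='ID')  # the string ID column cannot be scaled
    scr = preprocessing.MinMaxScaler()
    scr.fit(features)
    scaler.update({name: scr})
    parts.append(pd.DataFrame(scr.transform(features),
                              index=group.index, columns=features.columns))
scaled = pd.concat(parts).sort_index()  # back to the original row order
```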
Is this possible with df.groupby('ID').transform?
UPDATE
Starting from my original DataFrame:

```python
df = pd.DataFrame(dict(ID=list('AAABBB'),
                       VL=(0, 10, 10, 100, 100, 200)))
```
I want to scale all feature columns separately within each ID group. For this example the expected result is:
```
ID   VL
A   0.0
A   1.0
A   1.0
B   0.0
B   0.0
B   1.0
```
I also want to keep the information, i.e. the fitted scaler object for each ID (preprocessing.MinMaxScaler().fit( ... )).
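For reference, the expected values follow from MinMaxScaler's default formula with feature_range=(0, 1), applied separately within each group $g$, together with its inverse:

$$
x' = \frac{x - \min_g(x)}{\max_g(x) - \min_g(x)}, \qquad
x = x'\,\bigl(\max_g(x) - \min_g(x)\bigr) + \min_g(x)
$$

For A, $\min = 0$ and $\max = 10$, so $0 \mapsto 0/10 = 0$ and $10 \mapsto 10/10 = 1$. For B, $\min = 100$ and $\max = 200$, so $100 \mapsto 0/100 = 0$ and $200 \mapsto 100/100 = 1$. The inverse needs $\min_g$ and $\max_g$ for every group, which is exactly what a fitted scaler per ID stores.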
The “group by” process follows the split-apply-combine pattern:
1. Split the data into groups by creating a groupby object from the original DataFrame.
2. Apply a function to each group independently. This can be an aggregation that computes a summary statistic, but you can also transform or filter the data in this step.
3. Combine the results into a new data structure.

Groupby preserves the order of rows within each group.
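The difference between aggregating and transforming in the apply step matters here: an aggregation returns one row per group, while transform returns one value per original row, aligned on the original index. A small sketch with made-up interleaved data (not from the question) to show that alignment:

```python
import pandas as pd

# Interleaved groups, so row order actually matters
df = pd.DataFrame(dict(ID=list('ABAB'), VL=(1, 10, 3, 30)))

# agg: one summary value per group, indexed by the group key
per_group_min = df.groupby('ID')['VL'].agg('min')

# transform: one value per original row, aligned on the original index,
# so the result lines up with df even though A and B rows alternate
offset = df.groupby('ID')['VL'].transform(lambda x: x - x.min())

print(per_group_min)
print(offset)
```

Because transform keeps the original index, its result can be assigned straight back as a new column of df.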
You can do it in one direction (scaling, but not inverting) with minmax_scale and transform:
```
In [62]: from sklearn.preprocessing import minmax_scale

In [63]: df
Out[63]:
  ID   VL
0  A    0
1  A   10
2  A   10
3  B  100
4  B  100
5  B  200

In [64]: df['SC'] = df.groupby('ID').VL.transform(lambda x: minmax_scale(x.astype(float)))

In [65]: df
Out[65]:
  ID   VL   SC
0  A    0  0.0
1  A   10  1.0
2  A   10  1.0
3  B  100  0.0
4  B  100  0.0
5  B  200  1.0
```
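To scale every feature column rather than only VL, the same transform call can be repeated per column. A minimal sketch; the second column VX is hypothetical and only there to show more than one feature. Like the one-liner, it keeps no fitted scaler.

```python
import pandas as pd
from sklearn.preprocessing import minmax_scale

# VX is a hypothetical extra feature, added only for illustration
df = pd.DataFrame(dict(ID=list('AAABBB'),
                       VL=(0, 10, 10, 100, 100, 200),
                       VX=(5, 1, 3, 2, 4, 6)))

feature_cols = [c for c in df.columns if c != 'ID']
scaled = df.copy()
for col in feature_cols:
    # Min-max scale this column independently within each ID group;
    # transform aligns the result with the original rows.
    scaled[col] = df.groupby('ID')[col].transform(
        lambda x: minmax_scale(x.astype(float)))

print(scaled)
```

Looping over SeriesGroupBy objects keeps each call one-dimensional, which is the case the original one-liner already handles.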
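But you will not be able to use inverse_transform: minmax_scale returns only the scaled values, so the per-group (per-ID) min and max are not kept, and refitting a single MinMaxScaler on each group would overwrite the information about your original features each time.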
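One way around this, sketched under the assumption that feature_cols (a name introduced here for illustration) lists the numeric columns to scale: skip transform, fit one MinMaxScaler per ID in a plain loop, keep each fitted scaler in a dict keyed by ID, and write the scaled values back onto the same rows by index. With the scalers stored, inverse_transform can then be applied group by group.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.DataFrame(dict(ID=list('AAABBB'), VL=(0, 10, 10, 100, 100, 200)))
feature_cols = ['VL']  # hypothetical: the columns to scale

scalers = {}  # ID -> fitted MinMaxScaler
scaled = df.copy()
scaled[feature_cols] = scaled[feature_cols].astype(float)
for name, group in df.groupby('ID'):
    scr = MinMaxScaler().fit(group[feature_cols])
    scalers[name] = scr
    # write the scaled values back onto the same rows (aligned by index)
    scaled.loc[group.index, feature_cols] = scr.transform(group[feature_cols])

# Because each group's scaler is kept, the scaling can be undone per group
restored = scaled.copy()
for name, group in scaled.groupby('ID'):
    restored.loc[group.index, feature_cols] = scalers[name].inverse_transform(
        group[feature_cols])

print(scaled)
print(restored)
```

The loop does one fit per group, which is the same work transform would do; the only difference is that the fitted objects survive the loop.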