
How to apply multiple transforms to the same columns using ColumnTransformer in scikit-learn

I have a data frame that looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': range(0, 5),
    'y': [1, 2, 3, np.nan, np.nan]
})


I want to impute the missing values in y and then standardize both variables. I tried the following code:

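from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

columnPreprocess = ColumnTransformer([
    ('imputer', SimpleImputer(strategy='median'), ['x', 'y']),
    ('scaler', StandardScaler(), ['x', 'y'])
])
columnPreprocess.fit_transform(df)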

However, it seems that ColumnTransformer sets up a separate set of output columns for each step, so the different transformations end up in different columns. This is not what I intended.
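To make the behaviour concrete, here is a minimal sketch that assumes the same df as above (the name side_by_side is only illustrative). ColumnTransformer runs each transformer on the original input columns and concatenates the results, so two transformers over two columns give four output columns, and the scaled copy of y still contains the missing values:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'x': range(0, 5),
    'y': [1, 2, 3, np.nan, np.nan]
})

# Each transformer sees the original columns, not the output of the other one.
side_by_side = ColumnTransformer([
    ('imputer', SimpleImputer(strategy='median'), ['x', 'y']),
    ('scaler', StandardScaler(), ['x', 'y'])
])
out = side_by_side.fit_transform(df)

print(out.shape)                  # two columns per transformer
print(np.isnan(out[:, 3]).sum())  # NaNs left in the scaled-only copy of y
```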


Is there a way to apply different transformations to the same columns so that the output array has the same number of columns as the input?

asked Sep 05 '25 03:09 by PingPong



1 Answer

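You should use Pipeline in this case. A Pipeline chains its steps, so each transformer receives the output of the previous one, whereas ColumnTransformer applies its transformers side by side and concatenates their outputs: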

import pandas as pd
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'x': range(0, 5),
    'y': [1, 2, 3, np.nan, np.nan]
})

pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

pipeline.fit_transform(df)
# array([[-1.41421356, -1.58113883],
#        [-0.70710678,  0.        ],
#        [ 0.        ,  1.58113883],
#        [ 0.70710678,  0.        ],
#        [ 1.41421356,  0.        ]])
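
If ColumnTransformer is still needed, for example because other columns require different preprocessing, the two can be combined: a Pipeline is itself a valid transformer, so it can be passed as a single ColumnTransformer entry. The following is only a sketch under that assumption; the names numeric_steps and preprocess are illustrative and not part of the original question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'x': range(0, 5),
    'y': [1, 2, 3, np.nan, np.nan]
})

# Chain the steps first: impute, then scale the imputed values.
numeric_steps = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

# Apply the whole chain to the selected columns as one transformer.
preprocess = ColumnTransformer([
    ('numeric', numeric_steps, ['x', 'y'])
])

result = preprocess.fit_transform(df)
print(result.shape)  # one output column per input column
```

Further entries for other column groups could be added to the ColumnTransformer list in the same way, each with its own Pipeline.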
answered Sep 07 '25 19:09 by Flavia Giammarino
