
Adding a column-dropping step to a Pipeline

In general, we use df.drop('column_name', axis=1) to remove a column from a DataFrame. I want to add this operation as a transformer step in a scikit-learn Pipeline.

Example:

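from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

numerical_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='mean')),
                                        ('scaler', StandardScaler(with_mean=False))
                                        ])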

How can I do it?

Asked by Tan Phan, Mar 02 '26 19:03


1 Answer

You can write a custom transformer like this:

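from sklearn.base import BaseEstimator, TransformerMixin

class columnDropperTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns  # column name(s) to drop

    def fit(self, X, y=None):
        # nothing to learn; fit exists so the class satisfies the Pipeline interface
        return self

    def transform(self, X, y=None):
        # X is expected to be a pandas DataFrame
        return X.drop(self.columns, axis=1)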

And use it in a pipeline:

import pandas as pd
from sklearn.pipeline import Pipeline

# sample dataframe
df = pd.DataFrame({
    "col_1": ["a", "b", "c", "d"],
    "col_2": ["e", "f", "g", "h"],
    "col_3": [1, 2, 3, 4],
    "col_4": [5, 6, 7, 8]
})

# your pipeline
pipeline = Pipeline([
    ("columnDropper", columnDropperTransformer(['col_2', 'col_3']))
])

# apply the pipeline to the dataframe
pipeline.fit_transform(df)

Output:

  col_1  col_4
0     a      5
1     b      6
2     c      7
3     d      8
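
To connect this back to the question, here is a minimal sketch that places the dropper in front of the imputer and scaler from the question's example. The sample data (including the missing value in col_3) and the name numerical_pipeline are assumptions made for illustration; the class is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class columnDropperTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X.drop(self.columns, axis=1)


# illustrative data: two text columns and two numeric columns, one value missing
df = pd.DataFrame({
    "col_1": ["a", "b", "c", "d"],
    "col_2": ["e", "f", "g", "h"],
    "col_3": [1.0, np.nan, 3.0, 4.0],
    "col_4": [5, 6, 7, 8],
})

# drop the text columns first, then impute and scale what is left
numerical_pipeline = Pipeline(steps=[
    ("columnDropper", columnDropperTransformer(["col_1", "col_2"])),
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler(with_mean=False)),
])

result = numerical_pipeline.fit_transform(df)
print(result.shape)  # (4, 2)
```

The dropper has to come before any step that returns a NumPy array, because X.drop only exists on a DataFrame.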
Answered by Meysam Amini, Mar 05 '26 08:03
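
If a custom class is not needed, a built-in alternative may be enough: scikit-learn's ColumnTransformer accepts the string "drop" in place of a transformer, and remainder="passthrough" keeps the other columns. This is a different technique from the answer above, sketched on the same sample data; the step name "dropper" is arbitrary.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer

df = pd.DataFrame({
    "col_1": ["a", "b", "c", "d"],
    "col_2": ["e", "f", "g", "h"],
    "col_3": [1, 2, 3, 4],
    "col_4": [5, 6, 7, 8],
})

# "drop" discards the listed columns; the remaining ones pass through unchanged
dropper = ColumnTransformer(
    transformers=[("dropper", "drop", ["col_2", "col_3"])],
    remainder="passthrough",
)

result = dropper.fit_transform(df)
print(result.shape)  # (4, 2)
```

By default the result is a NumPy array rather than a DataFrame, so the column names are lost. Recent scikit-learn releases (1.2 and later) provide set_output(transform="pandas") to keep them.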


