Can I use scikit-learn pipeline to transform a specific variable only?

Question

In the scikit-learn documentation on Pipeline, all the examples apply the transformers (e.g. StandardScaler, PCA) to the entire dataset.

Is it possible to, say, scale only one specific variable in the dataset? If so, I could put my entire feature engineering process into a Pipeline and apply it to both my train and test sets.

Mark Whitfield · Accepted Answer

You can use a combination of FeatureUnion and custom transformers that take only the variable you're interested in.
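A minimal sketch of that idea, assuming a plain NumPy feature matrix. `ColumnSelector` is a hypothetical helper written for this example (it is not part of scikit-learn), and the data is made up. One branch of the FeatureUnion selects and scales column 0, the other passes column 1 through untouched:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: keep only the given column indices of a 2-D array."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self  # nothing to learn, selection is stateless

    def transform(self, X):
        return np.asarray(X)[:, self.columns]


# Made-up data: column 0 should be scaled, column 1 left alone.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

features = FeatureUnion([
    # Branch 1: pick column 0, then standardize it.
    ("scaled", Pipeline([
        ("select", ColumnSelector([0])),
        ("scale", StandardScaler()),
    ])),
    # Branch 2: pick column 1 and pass it through unchanged.
    ("untouched", ColumnSelector([1])),
])

train_out = features.fit_transform(X_train)  # scaler statistics come from train
test_out = features.transform(X_test)        # same statistics reused on test
```

Because the scaler is fitted only inside `fit_transform`, calling `transform` on the test set reuses the training mean and standard deviation, which is the train/test behaviour the question asks for.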

However, you're right that sklearn does not handle heterogeneous feature sets particularly well. The sklearn-pandas library makes this a lot easier by letting you define separate pipelines for specific columns of a pandas DataFrame.
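As a side note that goes beyond the original answer: scikit-learn 0.20 and later also include `sklearn.compose.ColumnTransformer`, which covers the same per-column use case without an extra dependency. The sketch below uses that class rather than sklearn-pandas; the column names and data are made up for illustration:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

# Hypothetical data; "age" and "flag" are illustrative column names.
train = pd.DataFrame({"age": [20.0, 30.0, 40.0], "flag": [0, 1, 0]})
test = pd.DataFrame({"age": [50.0], "flag": [1]})

preprocess = ColumnTransformer(
    # Scale only the "age" column ...
    [("scale_age", StandardScaler(), ["age"])],
    # ... and keep every other column as it is.
    remainder="passthrough",
)

train_out = preprocess.fit_transform(train)  # learns mean/std of "age" on train
test_out = preprocess.transform(test)        # applies the train statistics
```

The transformed columns come first in the output, followed by the passed-through ones, so here column 0 is the scaled `age` and column 1 is `flag`. The `preprocess` object can itself be used as a step in a Pipeline ahead of an estimator.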

Tags: machine-learning, scikit-learn, pipeline

Heisenberg
