I'm trying to single out a component/transformer from a fitted pipeline to inspect its behavior. However, when I retrieve the component, it is shown as unfitted, even though using the pipeline as a whole works without problems. This suggests that the pipeline is fitted and that its components are fitted as well.
Can someone explain why, and also suggest how to inspect a component in a fitted pipeline?
Here's a reproducible example:
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, GridSearchCV
np.random.seed(0)
# Read data from Titanic dataset.
titanic_url = ('https://raw.githubusercontent.com/amueller/'
               'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')
data = pd.read_csv(titanic_url)
# We create the preprocessing pipelines for both numeric and categorical data.
numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])
categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])
# Append classifier to preprocessing pipeline.
# Now we have a full prediction pipeline.
clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])
X = data.drop('survived', axis=1)
y = data['survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf.fit(X_train, y_train)
print("model score: %.3f" % clf.score(X_test, y_test))
Calling either:
clf.get_params()['preprocessor__cat__imputer'].transform(X)
or
clf.named_steps['preprocessor'].transformers[0][1].named_steps['imputer'].transform(X)
will result in this error:
NotFittedError: This SimpleImputer instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.
The ColumnTransformer attribute transformers holds the unfitted transformers that were passed in as input; during fit, ColumnTransformer fits clones of them and leaves the originals untouched. To access the fitted transformers, use the attribute transformers_ or named_transformers_. I suppose get_params()['preprocessor__cat__imputer'] also returns the unfitted input transformer.
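(With the calls as written you'll still get an error, because X also contains the string columns, and an imputer with strategy='median' will fail on them. Pass only the columns the transformer was fitted on, e.g. X[numeric_features].)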
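As a minimal sketch of the difference between transformers and named_transformers_, the snippet below uses a small made-up DataFrame in place of the Titanic data (the values and the reduced column set are assumptions for illustration only, so that it runs without a download). It reaches the same imputer through both attributes and shows that only the one from named_transformers_ is fitted.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.exceptions import NotFittedError
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for the Titanic data (values are made up).
X = pd.DataFrame({
    'age': [22.0, np.nan, 35.0, 58.0, 41.0, 19.0],
    'fare': [7.25, 71.3, np.nan, 26.6, 8.05, 13.0],
    'sex': ['male', 'female', 'female', 'male', 'male', 'female'],
})
y = np.array([0, 1, 1, 0, 0, 1])

numeric_features = ['age', 'fare']
categorical_features = ['sex']

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, numeric_features),
    ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])
clf.fit(X, y)

fitted_preprocessor = clf.named_steps['preprocessor']

# `transformers` still holds the original, unfitted input objects.
unfitted_imputer = fitted_preprocessor.transformers[0][1].named_steps['imputer']
try:
    unfitted_imputer.transform(X[numeric_features])
    raised_not_fitted = False
except NotFittedError:
    raised_not_fitted = True

# `named_transformers_` holds the clones that were actually fitted.
fitted_imputer = fitted_preprocessor.named_transformers_['num'].named_steps['imputer']

# Pass only the columns this transformer was fitted on.
imputed = fitted_imputer.transform(X[numeric_features])

print("unfitted copy raised NotFittedError:", raised_not_fitted)
print("learned medians:", fitted_imputer.statistics_)
```

The same pattern should apply to the categorical branch: look it up under named_transformers_['cat'] and pass X[categorical_features] rather than the full X.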