How to analyse the intermediate steps of sklearn-pipeline? [duplicate]

Question

I am using sklearn to classify text into categories. I am using CountVectorizer and TfidfTransformer to create the sparse matrix.

I perform a couple of pre-processing steps on each string in the custom tokenize_and_stem function, which is used as the CountVectorizer tokenizer.

from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

# tokenize_and_stem is my custom tokenizer (defined elsewhere)
SVM = Pipeline([
    ('vect', CountVectorizer(max_features=100000,
                             ngram_range=(1, 2),
                             stop_words='english',
                             tokenizer=tokenize_and_stem)),
    ('tfidf', TfidfTransformer(use_idf=True)),
    ('clf-svm', LinearSVC(C=1)),
])

My question is: is there an easy way to view or store the output of step 1 or step 2 of the Pipeline, so I can analyse what kind of array is going into the SVM?

Venkatachalam · Accepted Answer

You can get the output of the intermediate steps with something like this.

The approach is based on the Pipeline source code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', TfidfVectorizer(ngram_range=(1, 2), stop_words='english')),
    ('clf-svm', LinearSVC(C=1)),
])
X = ["I want to test this document", "let us see how it works", "I am okay and you ?"]

pipeline.fit(X,[0,1,1])

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(pipeline.named_steps['vect'].get_feature_names_out().tolist())

['document', 'let', 'let works', 'okay', 'test', 'test document', 'want', 'want test', 'works']

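# Here is where you can get the output of the intermediate steps:
# apply every fitted step except the final estimator, in order.
Xt = X

for name, transform in pipeline.steps[:-1]:
    # Skip placeholder steps (None or 'passthrough')
    if transform is not None and transform != 'passthrough':
        Xt = transform.transform(Xt)

print(Xt)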



  (0, 7)    0.4472135954999579
  (0, 6)    0.4472135954999579
  (0, 5)    0.4472135954999579
  (0, 4)    0.4472135954999579
  (0, 0)    0.4472135954999579
  (1, 8)    0.5773502691896257
  (1, 2)    0.5773502691896257
  (1, 1)    0.5773502691896257
  (2, 3)    1.0

Tags: python, python-3.x, machine-learning, scikit-learn

Shivam Agrawal
