 

How to analyse the intermediate steps of sklearn-pipeline? [duplicate]

I am using sklearn to classify text into categories. I use CountVectorizer and TfidfTransformer to build the sparse feature matrix.

I perform a couple of pre-processing steps on each string in the custom tokenize_and_stem function, which is passed to CountVectorizer as its tokenizer.

from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer

# tokenize_and_stem is my custom tokenizer, defined elsewhere
SVM = Pipeline([('vect', CountVectorizer(max_features=100000,
                                         ngram_range=(1, 2), stop_words='english',
                                         tokenizer=tokenize_and_stem)),
                ('tfidf', TfidfTransformer(use_idf=True)),
                ('clf-svm', LinearSVC(C=1))])

My question: is there an easy way to view or store the output of step 1 or 2 of the Pipeline, so I can analyse what kind of array is going into the SVM?

Shivam Agrawal asked Oct 26 '25 11:10



1 Answer

You can get the output of the intermediate steps by applying each fitted transformer in turn, as in the example below.

This is based on what the Pipeline source code does:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline

pipeline = Pipeline([('vect', TfidfVectorizer(ngram_range= (1, 2),stop_words='english')),\
                     ('clf-svm', LinearSVC(C=1)),])
X= ["I want to test this document", "let us see how it works", "I am okay and you ?"]

pipeline.fit(X,[0,1,1])

# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
print(pipeline.named_steps['vect'].get_feature_names_out().tolist())

['document', 'let', 'let works', 'okay', 'test', 'test document', 'want', 'want test', 'works']    

# Apply every fitted step except the final estimator to get the intermediate output
Xt = X

for name, transform in pipeline.steps[:-1]:
    if transform is not None:
        Xt = transform.transform(Xt)
        
print(Xt)



  (0, 7)    0.4472135954999579
  (0, 6)    0.4472135954999579
  (0, 5)    0.4472135954999579
  (0, 4)    0.4472135954999579
  (0, 0)    0.4472135954999579
  (1, 8)    0.5773502691896257
  (1, 2)    0.5773502691896257
  (1, 1)    0.5773502691896257
  (2, 3)    1.0
Venkatachalam answered Oct 29 '25 01:10



