sklearn: Using CountVectorizer object to get a feature vector of a new string

Question

I create a CountVectorizer object by executing the following lines:

count_vectorizer = CountVectorizer(binary=True)
data = count_vectorizer.fit_transform(data)

Now I have a new string that I want to map into the term-document matrix (TDM) that I get from CountVectorizer. In other words, for a string I pass in, I expect to get back the corresponding document-term vector.

I tried:

count_vectorizer.transform([string])

This raised an error, AttributeError: transform not found. The stack trace is long, so I am adding only the relevant part, which is the last few lines of the trace:

  File "/Users/ankit/Desktop/geny/APIServer/RUNTIME/src/controller/sentiment/Sentiment.py", line 29, in computeSentiment
    vec = self.models[model_name]["vectorizer"].transform([string])
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/sparse/base.py", line 440, in __getattr__
    raise AttributeError(attr + " not found")

Please advise.

Thanks

Ankit S

Aditya · Accepted Answer

The example you showed isn't reproducible: what is the string variable here? However, the following code seems to work fine:

from sklearn.feature_extraction.text import CountVectorizer

data = ["aa bb cc", "cc dd ee"]
count_vectorizer = CountVectorizer(binary=True)
data = count_vectorizer.fit_transform(data)

# Check that the vocabulary was built correctly
print(count_vectorizer.vocabulary_)

# Transform a couple of new strings that contain an unseen word ("mm"); unseen words are ignored
newData = count_vectorizer.transform(["aa dd mm", "aa bb"])
print(newData)

# Convert the sparse result to a dense array
print(newData.toarray())
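
For reference, here is a small sketch of what the example above should produce, assuming scikit-learn's default tokenizer (which keeps tokens of two or more characters) and alphabetical vocabulary ordering. The names docs, vocabulary, new_rows and single_row are only illustrative and not part of the original code.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["aa bb cc", "cc dd ee"]
count_vectorizer = CountVectorizer(binary=True)
count_vectorizer.fit(docs)

# Term -> column index; cast to plain int so the dict prints cleanly
vocabulary = {term: int(idx) for term, idx in count_vectorizer.vocabulary_.items()}
print(vocabulary)  # {'aa': 0, 'bb': 1, 'cc': 2, 'dd': 3, 'ee': 4} (key order may vary)

# "mm" was never seen during fit, so it contributes no column
new_rows = count_vectorizer.transform(["aa dd mm", "aa bb"]).toarray().tolist()
print(new_rows)  # [[1, 0, 0, 1, 0], [1, 1, 0, 0, 0]]

# A single new string still has to be wrapped in a list;
# binary=True caps the repeated "aa" at 1
single_row = count_vectorizer.transform(["aa aa bb"]).toarray().tolist()
print(single_row)  # [[1, 1, 0, 0, 0]]
```

Each row is one input string and each column is one vocabulary term, so the row for a new string is the document-term vector the question asks about.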


Note that count_vectorizer.transform() accepts a list of strings, not a single string. If the vectorizer had not been fitted, it should have raised "ValueError: Vocabulary wasn't fitted or is empty!" instead. For errors of this kind, paste the whole traceback (exception stack); otherwise no one can tell whether the AttributeError is coming from your code or from some internal bug in sklearn.
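
One possible reading of the partial traceback in the question: its last frame is in scipy/sparse/base.py, which suggests that the object stored under "vectorizer" was a scipy sparse matrix (for example the return value of fit_transform) rather than the CountVectorizer itself. This is a guess from the trace, not something the question confirms. The sketch below reproduces that situation; the models dictionary and its "wrong"/"right" keys are hypothetical stand-ins for self.models[model_name] in the trace.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["aa bb cc", "cc dd ee"]
count_vectorizer = CountVectorizer(binary=True)
matrix = count_vectorizer.fit_transform(docs)  # a scipy sparse matrix, NOT a vectorizer

# Hypothetical registry: "wrong" stores the matrix, "right" stores the fitted vectorizer
models = {
    "wrong": {"vectorizer": matrix},
    "right": {"vectorizer": count_vectorizer},
}

# Calling .transform on the sparse matrix fails, because the matrix has no such method
try:
    models["wrong"]["vectorizer"].transform(["aa dd"])
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__
print(error_name)  # AttributeError

# Calling .transform on the fitted vectorizer works
vec = models["right"]["vectorizer"].transform(["aa dd"])
print(vec.toarray().tolist())  # [[1, 0, 0, 1, 0]]
```

If this is what happened, the fix would be to keep (or pickle) the fitted CountVectorizer object and store the fit_transform output under a different name.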

Tags:

machine-learning

scikit-learn

Ankit Solanki