NOTE: I am using Python 2.7 as part of the Anaconda distribution. I hope this is not a problem for NLTK 3.1.
I am trying to use NLTK for NER as follows:
import nltk
from nltk.tag.stanford import StanfordNERTagger
#st = StanfordNERTagger('stanford-ner/all.3class.distsim.crf.ser.gz', 'stanford-ner/stanford-ner.jar')
st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
print st.tag(str)
but I get:
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at edu.stanford.nlp.io.IOUtils.<clinit>(IOUtils.java:41)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1117)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1076)
at edu.stanford.nlp.ie.AbstractSequenceClassifier.classifyAndWriteAnswers(AbstractSequenceClassifier.java:1057)
at edu.stanford.nlp.ie.crf.CRFClassifier.main(CRFClassifier.java:3088)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 5 more
Traceback (most recent call last):
File "X:\jnk.py", line 47, in <module>
print st.tag(str)
File "X:\Anaconda2\lib\site-packages\nltk\tag\stanford.py", line 66, in tag
return sum(self.tag_sents([tokens]), [])
File "X:\Anaconda2\lib\site-packages\nltk\tag\stanford.py", line 89, in tag_sents
stdout=PIPE, stderr=PIPE)
File "X:\Anaconda2\lib\site-packages\nltk\internals.py", line 134, in java
raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed : ['X:\\PROGRA~1\\Java\\JDK18~1.0_6\\bin\\java.exe', '-mx1000m', '-cp', 'X:\\stanford\\stanford-ner.jar', 'edu.stanford.nlp.ie.crf.CRFClassifier', '-loadClassifier', 'X:\\stanford\\classifiers\\english.all.3class.distsim.crf.ser.gz', '-textFile', 'x:\\appdata\\local\\temp\\tmpqjsoma', '-outputFormat', 'slashTags', '-tokenizerFactory', 'edu.stanford.nlp.process.WhitespaceTokenizer', '-tokenizerOptions', '"tokenizeNLs=false"', '-encoding', 'utf8']
But I can see that the slf4j jar is there in my lib folder. Do I need to update an environment variable?
Edit
Thanks, everyone, for your help, but I still get the same error. Here is what I tried recently:
import nltk
from nltk.tag import StanfordNERTagger
print(nltk.__version__)
stanford_ner_dir = 'X:\\stanford\\'
eng_model_filename= stanford_ner_dir + 'classifiers\\english.all.3class.distsim.crf.ser.gz'
my_path_to_jar= stanford_ner_dir + 'stanford-ner.jar'
st = StanfordNERTagger(model_filename=eng_model_filename, path_to_jar=my_path_to_jar)
print st._stanford_model
print st._stanford_jar
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
and also:
import nltk
from nltk.tag import StanfordNERTagger
print(nltk.__version__)
st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
print st._stanford_model
print st._stanford_jar
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
I get:
3.1
X:\stanford\classifiers\english.all.3class.distsim.crf.ser.gz
X:\stanford\stanford-ner.jar
After that it goes on to print the same stack trace as before: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
Any idea why this might be happening? I updated my CLASSPATH as well. I even added all the relevant folders to my PATH environment variable: for example, the folder where I unzipped the Stanford jars, the place where I unzipped slf4j, and even the lib folder inside the stanford folder. I have no idea why this is happening :(
Could it be Windows? I have had problems with Windows paths before.
Update
The Stanford NER version I have is 3.6.0; the zip file is stanford-ner-2015-12-09.zip.
I also tried using stanford-ner-3.6.0.jar instead of stanford-ner.jar, but I still get the same error.
When I right-click on stanford-ner-3.6.0.jar, I notice the following (screenshot not reproduced here). I see this for all the files that I have extracted, even the slf4j files. Could this be causing the problem?
java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
I do not see any folder named org anywhere.
Update: Environment variables
Here are my environment variables:
CLASSPATH
.;
X:\jre1.8.0_60\lib\rt.jar;
X:\stanford\stanford-ner-3.6.0.jar;
X:\stanford\stanford-ner.jar;
X:\stanford\lib\slf4j-simple.jar;
X:\stanford\lib\slf4j-api.jar;
X:\slf4j\slf4j-1.7.13\slf4j-1.7.13\slf4j-log4j12-1.7.13.jar
STANFORD_MODELS
X:\stanford\classifiers
JAVA_HOME
X:\PROGRA~1\Java\JDK18~1.0_6
PATH
X:\PROGRA~1\Java\JDK18~1.0_6\bin;
X:\stanford;
X:\stanford\lib;
X:\slf4j\slf4j-1.7.13\slf4j-1.7.13
Anything wrong here?
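As a sanity check (a minimal, NLTK-independent sketch), here is one way to print the values the Python process itself sees for those variables:
import os

# What the Python process (and therefore NLTK) actually sees.
print os.environ.get('CLASSPATH')
print os.environ.get('STANFORD_MODELS')
print os.environ.get('JAVA_HOME')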
NOTE:
Below is a temporary hack to work with.
This solution is NOT meant to be a permanent solution.
Always refer to https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software for the latest instructions on how to interface with the Stanford NLP tools from NLTK!
If you do not want to use this "hack", please track updates on this issue: https://github.com/nltk/nltk/issues/1237, or use the NER tool compiled on 2015-04-20.
Make sure that you have set the following environment variables:
CLASSPATH and STANFORD_MODELS
To set environment variables in Windows:
set CLASSPATH=%CLASSPATH%;C:\some\path\to\stanford-ner\stanford-ner.jar
set STANFORD_MODELS=%STANFORD_MODELS%;C:\some\path\to\stanford-ner\classifiers
To set environment variables in Linux:
export STANFORDTOOLSDIR=/home/some/path/to/stanfordtools/
export CLASSPATH=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/stanford-ner.jar
export STANFORD_MODELS=$STANFORDTOOLSDIR/stanford-ner-2015-12-09/classifiers
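If you prefer not to rely on the shell, the same two variables can also be set from inside Python before the tagger is constructed; a sketch, re-using the Linux example directory above (adjust the path to your machine):
import os

# Example directory from above; adjust to wherever you unpacked the tool.
stanford_dir = '/home/some/path/to/stanfordtools/stanford-ner-2015-12-09'

# NLTK resolves the jar through CLASSPATH and the model through STANFORD_MODELS.
os.environ['CLASSPATH'] = os.path.join(stanford_dir, 'stanford-ner.jar')
os.environ['STANFORD_MODELS'] = os.path.join(stanford_dir, 'classifiers')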
Then:
>>> from nltk.internals import find_jars_within_path
>>> from nltk.tag import StanfordNERTagger
>>> st = StanfordNERTagger('english.all.3class.distsim.crf.ser.gz')
# Note this is where your stanford_jar is saved.
# We are accessing the environment variables you've
# set through the NLTK API.
>>> print st._stanford_jar
/home/alvas/stanford-ner-2015-12-09/stanford-ner.jar
>>> stanford_dir = st._stanford_jar.rpartition("\\")[0] # windows
# Note in linux you do this instead:
>>> stanford_dir = st._stanford_jar.rpartition('/')[0] # linux
# Use the `find_jars_within_path` function to get all the
# jar files out from stanford NER tool under the libs/ dir.
>>> stanford_jars = find_jars_within_path(stanford_dir)
# Put the jars back into the `stanford_jar` classpath.
>>> st._stanford_jar = ':'.join(stanford_jars) # linux
>>> st._stanford_jar = ';'.join(stanford_jars) # windows
>>> st.tag('Rami Eid is studying at Stony Brook University in NY'.split())
[(u'Rami', u'PERSON'), (u'Eid', u'PERSON'), (u'is', u'O'), (u'studying', u'O'), (u'at', u'O'), (u'Stony', u'ORGANIZATION'), (u'Brook', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'in', u'O'), (u'NY', u'O')]
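The OS-specific rpartition and join lines above can also be written once, platform-neutrally, with the os module; a sketch, assuming the same st object as in the session above:
import os
from nltk.internals import find_jars_within_path

# os.path.dirname uses the right separator for the current platform,
# and os.pathsep is ':' on Linux/macOS and ';' on Windows.
stanford_dir = os.path.dirname(st._stanford_jar)
stanford_jars = find_jars_within_path(stanford_dir)
st._stanford_jar = os.pathsep.join(stanford_jars)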
I encountered exactly the same problem as you described yesterday.
There are 3 things you need to do.
1) Update your NLTK.
pip install -U nltk
Your version should be >3.1 and I see you are using
from nltk.tag.stanford import StanfordNERTagger
However, you have to use the new module path:
from nltk.tag import StanfordNERTagger
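A quick way to confirm that the upgrade took effect and that the new import path works (this is the same version check used in the question above):
import nltk
from nltk.tag import StanfordNERTagger  # new module path

# Should print the version installed by `pip install -U nltk`.
print(nltk.__version__)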
2) Download slf4j and update your CLASSPATH.
Here is how you update your CLASSPATH:
import os

javapath = "/Users/aerin/Downloads/stanford-ner-2014-06-16/stanford-ner.jar:/Users/aerin/java/slf4j-1.7.13/slf4j-log4j12-1.7.13.jar"
os.environ['CLASSPATH'] = javapath
As you can see above, javapath contains two paths: one is where stanford-ner.jar is, and the other is where you downloaded slf4j-log4j12-1.7.13.jar (it can be downloaded here: http://www.slf4j.org/download.html).
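If the same code has to run on both Windows and Linux/macOS, the separator between classpath entries can be taken from os.pathsep instead of hard-coding ':'; a sketch using the same example paths:
import os

# os.pathsep is ':' on Linux/macOS and ';' on Windows.
jars = [
    "/Users/aerin/Downloads/stanford-ner-2014-06-16/stanford-ner.jar",
    "/Users/aerin/java/slf4j-1.7.13/slf4j-log4j12-1.7.13.jar",
]
os.environ['CLASSPATH'] = os.pathsep.join(jars)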
3) Don't forget to specify where you downloaded 'english.all.3class.distsim.crf.ser.gz' and 'stanford-ner.jar':
st = StanfordNERTagger('/Users/aerin/Downloads/stanford-ner-2014-06-16/classifiers/english.all.3class.distsim.crf.ser.gz','/Users/aerin/Downloads/stanford-ner-2014-06-16/stanford-ner.jar')
st.tag("Doneyo lab did such an awesome job!".split())
I fixed it!
You should specify the full path of slf4j-api.jar in CLASSPATH.
Instead of adding the jar path to a system environment variable, you can do it in code like this:
import os

_CLASS_PATH = "."
if os.environ.get('CLASSPATH') is not None:
    _CLASS_PATH = os.environ.get('CLASSPATH')
os.environ['CLASSPATH'] = _CLASS_PATH + r';F:\Python\Lib\slf4j\slf4j-api-1.7.13.jar'
Important: nltk/*/stanford.py will reset the classpath, like this:
stdout, stderr = java(cmd, classpath=self._stanford_jar, stdout=PIPE, stderr=PIPE)
e.g., \Python34\Lib\site-packages\nltk\tokenize\stanford.py, line 90.
You can fix it like this:
_CLASS_PATH = "."
if os.environ.get('CLASSPATH') is not None:
    _CLASS_PATH = os.environ.get('CLASSPATH')
stdout, stderr = java(cmd, classpath=(self._stanford_jar, _CLASS_PATH), stdout=PIPE, stderr=PIPE)
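If you would rather not edit stanford.py inside site-packages, an alternative along the lines of the other answers on this page is to fold the extra jar into st._stanford_jar itself, since that value is what ends up after -cp in the Java command; a sketch (the slf4j path is the example one from above and will differ on your machine):
import os

# Example path; point this at your own slf4j jar.
extra_jar = r'F:\Python\Lib\slf4j\slf4j-api-1.7.13.jar'

# os.pathsep is ';' on Windows and ':' elsewhere.
st._stanford_jar = os.pathsep.join([st._stanford_jar, extra_jar])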
The current Stanford NER tagger version is not compatible with NLTK because it requires additional jars that NLTK cannot add to the CLASSPATH.
Instead, prefer an older version of the Stanford NER tagger that will work perfectly fine, like this one: http://nlp.stanford.edu/software/stanford-ner-2015-04-20.zip
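Once that older release is unpacked, the constructor call looks the same as elsewhere on this page; a minimal sketch, assuming the zip was extracted to /path/to/stanford-ner-2015-04-20 and contains the usual classifiers folder:
from nltk.tag import StanfordNERTagger

# Assumed extraction directory; adjust to where you unpacked the zip.
ner_dir = '/path/to/stanford-ner-2015-04-20'
st = StanfordNERTagger(
    ner_dir + '/classifiers/english.all.3class.distsim.crf.ser.gz',
    ner_dir + '/stanford-ner.jar',
)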
For those who want to use Stanford NER >= 3.6.0 instead of the 2015-01-30 release (3.5.1) or another old version, do this instead:
Put stanford-ner.jar and slf4j-api.jar into the same folder.
For example, I put those files into /path-to-libs/.
Then:
import nltk

classpath = "/path-to-libs/*"

st = nltk.tag.StanfordNERTagger(
    "/path-to-model/ner-model.ser.gz",
    "/path-to-libs/stanford-ner-3.6.0.jar"
)
st._stanford_jar = classpath
result = st.tag(["Hello"])
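The trailing * in /path-to-libs/* is Java's wildcard classpath syntax: the JVM picks up every .jar in that folder, which is how slf4j-api.jar ends up on the classpath next to stanford-ner-3.6.0.jar. For example, re-using the sentence from the question:
print(st.tag('Rami Eid is studying at Stony Brook University in NY'.split()))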