How to use the Stanford word tokenizer in NLTK?

Question

I am looking for a way to use the Stanford word tokenizer in NLTK. I want to use it because the Stanford and NLTK word tokenizers give different results when I compare them. I know there might be a way to use the Stanford tokenizer, just as we can use the Stanford POS Tagger and NER in NLTK.

Is it possible to use the Stanford tokenizer without running a server?

Thanks

alvas · Accepted Answer

Note: This solution only works with:

NLTK v3.2.5 (v3.2.6 would have an even simpler interface)
Stanford CoreNLP (version >= 2016-10-31)

First, get Java 8 properly installed. Once Stanford CoreNLP works on the command line, the Stanford CoreNLP API in NLTK v3.2.5 can be used as follows.

Note: You have to start the CoreNLP server in a terminal BEFORE using the new CoreNLP API in NLTK.

On the terminal:

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000

In Python:

>>> from nltk.parse.corenlp import CoreNLPParser
>>> st = CoreNLPParser()
>>> tokenized_sent = list(st.tokenize('What is the airspeed of an unladen swallow ?'))
>>> tokenized_sent
['What', 'is', 'the', 'airspeed', 'of', 'an', 'unladen', 'swallow', '?']
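
Since the tokenizer call above fails if the server is not yet listening, it can help to check the port first. The following is only a minimal sketch: the helper names `corenlp_server_is_up` and `stanford_tokenize` are made up for illustration, and the default URL simply assumes the server was started on port 9000 as in the terminal command above.

```python
import socket
from urllib.parse import urlparse


def corenlp_server_is_up(url="http://localhost:9000", timeout=2.0):
    """Return True if something accepts TCP connections at the given URL.

    Hypothetical helper: it only checks that the port is open, not that
    the listener is really a CoreNLP server.
    """
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 80
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


def stanford_tokenize(text, url="http://localhost:9000"):
    """Tokenize text through a running CoreNLP server (hypothetical wrapper)."""
    if not corenlp_server_is_up(url):
        raise RuntimeError(
            "No server reachable at %s; start the CoreNLP server first." % url
        )
    # Imported lazily so the reachability check works without NLTK installed.
    from nltk.parse.corenlp import CoreNLPParser

    return list(CoreNLPParser(url=url).tokenize(text))
```

With the server running, `stanford_tokenize('What is the airspeed of an unladen swallow ?')` should give the same token list as the session above. Without it, the call raises a `RuntimeError` up front instead of a connection error from inside NLTK.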

Tags:

python

tokenize

nltk

stanford-nlp

Lucky

1 Answers

alvas
