Using the methods defined in the NLTK book, I want to create a parse tree of a sentence that has already been POS tagged. From what I understand from the chapter linked above, any words you want to be able to recognize need to be in the grammar. This seems ridiculous, seeing as there's a built-in POS tagger that would make hand-writing the parts of speech for each word completely redundant. Am I missing some functionality of the parsing methods that allows for this?
With the Stanford parser, POS tags are not needed to obtain a parse tree, because tagging is built into the model. However, the StanfordParser class and its models are not available out of the box and must be downloaded separately.
Most people see this error when first trying to use the StanfordParser in NLTK:
>>> from nltk.parse import stanford
>>> sp = stanford.StanfordParser()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/anaconda3/lib/python3.5/site-packages/nltk/parse/stanford.py", line 51, in __init__
key=lambda model_name: re.match(self._JAR, model_name)
File "/home/user/anaconda3/lib/python3.5/site-packages/nltk/internals.py", line 714, in find_jar_iter
raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
LookupError:
===========================================================================
NLTK was unable to find stanford-parser\.jar! Set the CLASSPATH
environment variable.
For more information, on stanford-parser\.jar, see:
<http://nlp.stanford.edu/software/lex-parser.shtml>
===========================================================================
To fix this, download the Stanford Parser and extract the contents into a directory; for this example, use /usr/local/lib/stanfordparser on a *nix system. The file stanford-parser.jar must be located there, along with the other files from the download.
Once all the files are in place, set the environment variables for the location of the parser and models:
>>> import os
>>> os.environ['STANFORD_PARSER'] = '/usr/local/lib/stanfordparser'
>>> os.environ['STANFORD_MODELS'] = '/usr/local/lib/stanfordparser'
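Equivalently, the variables can be exported from the shell before starting Python, so every session picks them up (a sketch assuming a bash-like shell and the same install directory as above):

```shell
# Point NLTK at the directory containing stanford-parser.jar and the models
export STANFORD_PARSER=/usr/local/lib/stanfordparser
export STANFORD_MODELS=/usr/local/lib/stanfordparser
```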
Now you can use the parser to obtain the possible parses for the sentence you have, for example:
>>> sp = stanford.StanfordParser()
>>> sp.parse("this is a sentence".split())
<list_iterator object at 0x7f53b93a2dd8>
>>> trees = [tree for tree in sp.parse("this is a sentence".split())]
>>> trees[0] # example parsed sentence
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('DT', ['this'])]), Tree('VP', [Tree('VBZ', ['is']), Tree('NP', [Tree('DT', ['a']), Tree('NN', ['sentence'])])])])])
An iterator object is returned since there can be more than one parse for a given sentence. If your sentence has already been POS tagged, as in the question, the parser also provides a tagged_parse method that accepts a list of (word, tag) pairs instead of raw tokens, so the tags you already have are used directly.
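Each result is an nltk Tree, which can be inspected without the Java parser running. A minimal sketch that rebuilds the parse shown above from its bracketed string form and reads information back out of it:

```python
from nltk.tree import Tree

# Rebuild the parse printed above from its bracketed string form.
t = Tree.fromstring(
    "(ROOT (S (NP (DT this)) (VP (VBZ is) (NP (DT a) (NN sentence)))))"
)

print(t.leaves())  # the tokens of the sentence
print(t.pos())     # (word, tag) pairs read off the tree

# Collect every NP subtree, e.g. to extract noun phrases.
nps = [st for st in t.subtrees(lambda s: s.label() == "NP")]
print(len(nps))
```

Note that pos() recovers the tags the model assigned, which is one way to compare them against the tags you started with.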