I'm using NLTK's RegexpParser to chunk a noun phrase, which I define with a grammar as
grammar = "NP: {<DT>?<JJ>*<NN|NNS>+}"
cp = RegexpParser(grammar)
This works well: it matches a noun phrase made of an optional determiner (DT), any number of adjectives (JJ) and one or more nouns (NN/NNS).
Now, what if I want to match the same pattern but keep only one JJ, however many there are? I want to match DT if it exists, exactly one JJ, and one or more NN/NNS. If there is more than one JJ, I want to keep only the one nearest to the noun (together with DT, if present, and the NN/NNS).
The grammar
grammar = "NP: {<DT>?<JJ><NN|NNS>+}"
matches only when there is exactly one JJ, and the grammar
grammar = "NP: {<DT>?<JJ>{1}<NN|NNS>+}"
which I thought would work given the usual regexp quantifiers, raises a ValueError.
For example, in "This beautiful green skirt", I'd like to chunk "This green skirt".
So, how would I proceed?
The grammar grammar = "NP: {<DT>?<JJ><NN|NNS>+}" is correct for the requirement as you first stated it.
Here is the example you gave in the comments, where DT is missing from the output:
"This beautiful green skirt is for you."
Tree('S', [('This', 'DT'), ('beautiful', 'JJ'), Tree('NP', [('green','JJ'),
('skirt', 'NN')]), ('is', 'VBZ'), ('for', 'IN'), ('you', 'PRP'), ('.', '.')])
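In this example there are two consecutive JJs, which does not meet the requirement as you stated it: "I want to match DT if it exists, one JJ and 1+ NN/NNS." The pattern only accepts a DT that is immediately followed by a single JJ, so "This" is left outside the chunk.
For the updated requirement: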
I want to match DT if it exists, one JJ and 1+ NN/NNS. If there are more than one JJ, I want to match only one of them, the one nearest to the noun (and DT if there is, and NN/NNS).
Here, you will need to use
grammar = "NP: {<DT>?<JJ>*<NN|NNS>+}"
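and post-process the NP chunks to remove the extra JJs. A chunk is always a contiguous run of tokens, so the grammar alone cannot skip the "beautiful" in the middle.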
Code:
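from nltk import Tree

chunk_output = Tree('S', [Tree('NP', [('This', 'DT'), ('beautiful', 'JJ'), ('green','JJ'), ('skirt', 'NN')]), ('is', 'VBZ'), ('for', 'IN'), ('you', 'PRP'), ('.', '.')])

for child in chunk_output:
    # Only NP subtrees are chunks; plain (word, tag) tuples are skipped.
    if isinstance(child, Tree) and child.label() == 'NP':
        for num in range(len(child)):
            # Drop a JJ that is directly followed by another JJ, so only
            # the JJ nearest to the noun is printed.
            followed_by_jj = num + 1 < len(child) and child[num + 1][1] == 'JJ'
            if not (child[num][1] == 'JJ' and followed_by_jj):
                print(child[num][0])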
Output:
This
green
skirt
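If you need the filtered phrase as data rather than printed output, the same rule can be wrapped in a small function. This is only a sketch: the name keep_nearest_adjective and its adj_tag parameter are made up for illustration and are not part of NLTK. It works on a plain list of (word, tag) pairs, which is also how an NP subtree behaves when you index it, so it runs without NLTK installed.

```python
def keep_nearest_adjective(tagged_np, adj_tag="JJ"):
    """Return the (word, tag) pairs of one NP chunk, keeping only the
    last adjective of every run of consecutive adjectives.

    Hypothetical helper for illustration; not an NLTK function.
    """
    kept = []
    for i, (word, tag) in enumerate(tagged_np):
        # Tag of the following token, or None at the end of the chunk.
        next_tag = tagged_np[i + 1][1] if i + 1 < len(tagged_np) else None
        # Skip an adjective that is directly followed by another adjective.
        if tag == adj_tag and next_tag == adj_tag:
            continue
        kept.append((word, tag))
    return kept


# Assumed input: the leaves of the NP chunk from the example above.
np_leaves = [("This", "DT"), ("beautiful", "JJ"), ("green", "JJ"), ("skirt", "NN")]
pruned = keep_nearest_adjective(np_leaves)
print(" ".join(word for word, _ in pruned))  # prints: This green skirt
```

With a real parse you would call it once per NP subtree, for example on each child for which child.label() == 'NP'.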