
How to get all noun phrases in spaCy (Python)

Tags: python, nlp, spacy

I would like to extract "all" the noun phrases from a sentence. I'm wondering how I can do it. I have the following code:

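import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: any English pipeline with a parser works here
doc2 = nlp("what is the capital of Bangladesh?")
for chunk in doc2.noun_chunks:
    print(chunk)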

Output:

1. what

2. the capital

3. Bangladesh

Expected:

the capital of Bangladesh

I have tried answers from the spaCy docs and Stack Overflow, but nothing worked. It seems that only cTAKES and Stanford CoreNLP can produce such complex NPs.

Any help is appreciated.

Sazzad asked Nov 23 '25 12:11


2 Answers

For those who are still looking for this answer:

noun_phrases = set()
for nc in doc.noun_chunks:
    # Keep the base chunk and the span covering the whole subtree of the chunk's root
    for phrase in [nc, doc[nc.root.left_edge.i:nc.root.right_edge.i + 1]]:
        noun_phrases.add(phrase)

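This is how I get all the complex noun phrases. For each noun chunk, I also take the span that runs from the left edge to the right edge of the chunk root's subtree, which pulls in attached modifiers such as the prepositional phrase "of Bangladesh".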
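To make the left_edge/right_edge idea concrete, here is a minimal sketch in plain Python that does not call spaCy at all. The head indices are a hand-written toy parse of the question's sentence (an assumption, not real parser output), and subtree_indices and subtree_span are hypothetical helper names used only for illustration.

```python
# Toy parse of "what is the capital of Bangladesh ?" (hand-written, not spaCy output).
words = ["what", "is", "the", "capital", "of", "Bangladesh", "?"]
# heads[i] is the index of token i's syntactic head; the root points to itself.
heads = [1, 1, 3, 1, 3, 4, 1]


def subtree_indices(root, heads):
    """Return the sorted indices of `root` and all of its descendants."""
    found = {root}
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h in found and i not in found:
                found.add(i)
                changed = True
    return sorted(found)


def subtree_span(root, words, heads):
    """Join the tokens from the leftmost to the rightmost descendant of `root`."""
    idx = subtree_indices(root, heads)
    return " ".join(words[idx[0]:idx[-1] + 1])


# Token 3 ("capital") is the root of the base chunk "the capital".
print(subtree_span(3, words, heads))
```

Taking the span from the leftmost to the rightmost descendant is roughly what slicing the doc between the root's left_edge and right_edge does, so the base chunk grows to include everything attached below its root.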

Sazzad answered Nov 25 '25 04:11


spaCy clearly defines a noun chunk as:

"A base noun phrase, or "NP chunk", is a noun phrase that does not permit other NPs to be nested within it – so no NP-level coordination, no prepositional phrases, and no relative clauses." (https://spacy.io/api/doc#noun_chunks)

If you process the dependency parse differently, allowing prepositional modifiers and nested phrases/chunks, then you can end up with what you're looking for.

I bet you could modify the existing spaCy code fairly easily to do what you want:

https://github.com/explosion/spaCy/blob/06c6dc6fbcb8fbb78a61a2e42c1b782974bd43bd/spacy/lang/en/syntax_iterators.py
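As a rough illustration of processing the dependency parse differently, the sketch below grows a noun phrase from its head noun but only follows a chosen set of dependency labels, so prepositional modifiers are kept while a relative clause is left out. It is plain Python over a hand-written toy parse, not a patch to spaCy's syntax iterator. The sentence, the ALLOWED label set, and the expand helper are all assumptions made for the example.

```python
# Hand-written toy parse (an assumption, not real parser output) of
# "the capital of Bangladesh that I visited".
words = ["the", "capital", "of", "Bangladesh", "that", "I", "visited"]
heads = [1, 1, 1, 2, 6, 6, 1]  # the root ("capital") points to itself
deps = ["det", "ROOT", "prep", "pobj", "dobj", "nsubj", "relcl"]

# Labels we choose to follow when growing the phrase; "relcl" is left out on purpose.
ALLOWED = {"det", "amod", "compound", "prep", "pobj"}


def expand(head, heads, deps, allowed=ALLOWED):
    """Collect `head` plus descendants reachable only through allowed labels."""
    keep = {head}
    frontier = [head]
    while frontier:
        current = frontier.pop()
        for i, h in enumerate(heads):
            if h == current and i not in keep and deps[i] in allowed:
                keep.add(i)
                frontier.append(i)
    return sorted(keep)


idx = expand(1, heads, deps)
phrase = " ".join(words[i] for i in idx)
print(phrase)
```

Adding "relcl" to the allowed set would pull the relative clause in as well, so the label set is the knob that decides how "complex" the extracted noun phrases get.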

aab answered Nov 25 '25 03:11