I'm using the spaCy large model, but it tags entities with categories that are not relevant to my domain. For example, a span tagged as 'WORK_OF_ART' may be what should have been recognised as an ORG.
Is it possible to restrict NER to return only People, Locations and Organisations?
Short answer:
No, you cannot tell the pretrained NER model to skip specific labels, or to predict only certain ones.
What you can do is filter its output in code or modify the model [see long answer].
Filtering in code just means discarding the entities you don't want from the results. It won't fix your misclassifications, because the model still makes the same predictions underneath.
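import spacy

nlp = spacy.load("en_core_web_sm")  # the same works with en_core_web_lg
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
# Keep people, locations (GPE = countries/cities, LOC = other locations) and organisations
wanted = {"PERSON", "GPE", "LOC", "ORG"}
entities = [ent for ent in doc.ents if ent.label_ in wanted]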
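If you want every `doc.ents` in your application to be filtered automatically, one option is to wrap the same filter in a custom pipeline component. The sketch below assumes spaCy v3 (`Language.component`, `nlp.add_pipe`); the component name `keep_person_loc_org` is hypothetical. So that it runs without downloading a model, an `entity_ruler` with hand-written patterns stands in for the statistical NER. With a real model you would `spacy.load("en_core_web_lg")` and add the component with `after="ner"` instead. Like the list comprehension above, this only hides labels and does not correct misclassifications.

```python
import spacy
from spacy.language import Language

# Labels to keep (hypothetical choice matching the question)
WANTED = {"PERSON", "GPE", "LOC", "ORG"}


@Language.component("keep_person_loc_org")
def keep_person_loc_org(doc):
    # Drop every entity whose label is not in WANTED. The kept spans are a
    # subset of the original non-overlapping entities, so reassigning is safe.
    doc.ents = [ent for ent in doc.ents if ent.label_ in WANTED]
    return doc


# Stand-in for the statistical NER: rule-based patterns, no model download needed
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple"},
    {"label": "GPE", "pattern": "U.K."},
    {"label": "MONEY", "pattern": "$1 billion"},
])
# With a real model: nlp.add_pipe("keep_person_loc_org", after="ner")
nlp.add_pipe("keep_person_loc_org", after="entity_ruler")

doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
print([(ent.text, ent.label_) for ent in doc.ents])
```

The MONEY span is matched by the ruler but removed by the component before the `Doc` is returned.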
Long answer:
You can restrict NER in spaCy, but not with a simple parameter (currently).
Why not? NER is a supervised machine learning task: you provide text with annotated entities, the model is trained on it, and it then predicts entities in new text using the parameters it learned.
If you want NER to recognise only certain entity types, such as ORG, you have to train a new model on data annotated only with those types.
In machine learning terms: in a multi-class classification task, you cannot simply remove a class without retraining the model on training data filtered to the classes you keep.
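A minimal sketch of the retraining approach, assuming the spaCy v3 training API (`Example.from_dict`, `nlp.initialize`, `nlp.update`). The two-sentence dataset and the helper `keep_wanted` are hypothetical and for illustration only. Annotations with unwanted labels (here WORK_OF_ART) are stripped from the training data, so the new NER only ever learns PERSON, GPE, LOC and ORG. A real model needs far more annotated data, and in v3 the `spacy train` CLI with a config file is the usual route rather than a hand-written loop.

```python
import random

import spacy
from spacy.training import Example

# Labels the new model should know about (hypothetical choice)
WANTED = {"PERSON", "GPE", "LOC", "ORG"}

# Tiny hypothetical dataset in spaCy's (text, {"entities": [(start, end, label)]}) form
TRAIN_DATA = [
    ("Apple is looking at buying a startup in London",
     {"entities": [(0, 5, "ORG"), (40, 46, "GPE")]}),
    ("Tim Cook read The Old Man and the Sea",
     {"entities": [(0, 8, "PERSON"), (14, 37, "WORK_OF_ART")]}),
]


def keep_wanted(data, wanted):
    """Return a copy of the data keeping only entities whose label is wanted."""
    filtered = []
    for text, annotations in data:
        ents = [(start, end, label)
                for start, end, label in annotations["entities"]
                if label in wanted]
        filtered.append((text, {"entities": ents}))
    return filtered


train_data = keep_wanted(TRAIN_DATA, WANTED)

# Fresh pipeline whose NER is only given the wanted labels
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
for label in sorted(WANTED):
    ner.add_label(label)

examples = [Example.from_dict(nlp.make_doc(text), annotations)
            for text, annotations in train_data]
optimizer = nlp.initialize(get_examples=lambda: examples)

# A few passes over the toy data; real training needs many more examples
for epoch in range(10):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

print(ner.labels)
```

Note that the removed WORK_OF_ART span is not simply ignored: it is now unannotated, so the model learns to treat it as a non-entity. If such spans are really organisations in your domain, relabel them as ORG instead of dropping them.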
Check this page for more info on NER training: https://spacy.io/usage/linguistic-features/#named-entities