is there a way in the NER model in spaCy to extract the metrics (precision, recall, f1 score) per entity type?
Something that will look like this:
         precision    recall  f1-score   support
  B-LOC      0.810     0.784     0.797      1084
  I-LOC      0.690     0.637     0.662       325
 B-MISC      0.731     0.569     0.640       339
 I-MISC      0.699     0.589     0.639       557
  B-ORG      0.807     0.832     0.820      1400
  I-ORG      0.852     0.786     0.818      1104
  B-PER      0.850     0.884     0.867       735
  I-PER      0.893     0.943     0.917       634
avg / total 0.809 0.787 0.796 6178
taken from: http://www.davidsbatista.net/blog/2018/05/09/Named_Entity_Evaluation/
Thank you!
For evaluation, custom NER uses the following metrics: Precision: Measures how precise/accurate your model is. It is the ratio between the correctly identified positives (true positives) and all identified positives. The precision metric reveals how many of the predicted entities are correctly labeled.
“precision is the percentage of named entities found by the learning system that are correct. Recall is the percentage of named entities present in the corpus that are found by the system. A named entity is correct only if it is an exact match of the corresponding entity in the data file.”
As name implies, this command will evaluate a model accuracy and speed. It will be done on JSON'-formatted annotated data. Evaluate command will print the results and optionally export displaCy visualisations of a sample set of parsers to HTML files (.
From spacy v3,
#Test the model
import spacy
from spacy.training.example import Example
nlp = spacy.load("./model_saved")
examples = []
data = [("Taj mahal is in Agra.", {"entities": [(0, 9, 'name'),
(16, 20, 'place')]})]
for text, annots in data:
    doc = nlp.make_doc(text)
    examples.append(Example.from_dict(doc, annots))
print(nlp.evaluate(examples)) # This will provide overall and per entity metrics
Nice question.
First, we should clarify that spaCy uses the BILUO annotation scheme instead of the BIO annotation scheme you are referring to. From the spacy documentation the letters denote the following:
Then, some definitions:
Spacy has a built-in class to evaluate NER. It's called scorer. Scorer uses exact matching to evaluate NER. The precision score is returned as ents_p, the recall as ents_r and the F1 score as ents_f.
The only problem with that is that it returns the score for all the tags together in the document. However, we can call the function only with the TAG we want and get the desired result.
All together, the code should look like this:
import spacy
from spacy.gold import GoldParse
from spacy.scorer import Scorer
def evaluate(nlp, examples, ent='PERSON'):
    scorer = Scorer()
    for input_, annot in examples:
        text_entities = []
        for entity in annot.get('entities'):
            if ent in entity:
                text_entities.append(entity)
        doc_gold_text = nlp.make_doc(input_)
        gold = GoldParse(doc_gold_text, entities=text_entities)
        pred_value = nlp(input_)
        scorer.score(pred_value, gold)
    return scorer.scores
examples = [
    ("Trump says he's answered Mueller's Russia inquiry questions \u2013 live",{"entities":[[0,5,"PERSON"],[25,32,"PERSON"],[35,41,"GPE"]]}),
    ("Alexander Zverev reaches ATP Finals semis then reminds Lendl who is boss",{"entities":[[0,16,"PERSON"],[55,60,"PERSON"]]}),
    ("Britain's worst landlord to take nine years to pay off string of fines",{"entities":[[0,7,"GPE"]]}),
    ("Tom Watson: people's vote more likely given weakness of May's position",{"entities":[[0,10,"PERSON"],[56,59,"PERSON"]]}),
]
nlp = spacy.load('en_core_web_sm')
results = evaluate(nlp, examples)
print(results)
Call the evaluate function with the proper ent parameter to get the results for each tag.
Hope it helps :)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With