I am getting an error when trying to use the spaCy Matcher:
~\Anaconda3\lib\site-packages\spacy\matcher\matcher.pyx in spacy.matcher.matcher.Matcher.add()
TypeError: add() takes exactly 2 positional arguments (3 given)
Is there an alternative to spacy.matcher.matcher.Matcher.add()?
The Matcher lets you find words and phrases using rules describing their token attributes. Rules can refer to token annotations (like the text or part-of-speech tags), as well as lexical attributes like Token.is_punct. Applying the matcher to a Doc gives you access to the matched tokens in context.
spaCy features a rule-matching engine, the Matcher, that operates over tokens, similar to regular expressions. The rules can refer to token annotations (e.g. the token text or tag_, and flags like IS_PUNCT).
Unlike a regular expression's fixed pattern matching on raw text, the Matcher lets us match tokens, phrases, and entities against preset patterns that draw on features such as part-of-speech tags, entity types, dependency labels, lemmas, and more.
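To make the idea of token-attribute rules concrete, here is a minimal sketch. It assumes spaCy v3 is installed and uses spacy.blank("en") (a tokenizer-only pipeline) so no model download is needed; the key name "HelloPunctWorld" and the sample sentence are made up for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: spaCy v3+. A blank English pipeline only tokenizes,
# which is enough for lexical attributes such as LOWER and IS_PUNCT.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# One dict per token: "hello" (any case), then any punctuation, then "world".
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
matcher.add("HelloPunctWorld", [pattern])  # hypothetical key name

doc = nlp("Hello, world! Hello world!")
# Each match is a (match_id, start, end) tuple of token offsets.
spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(spans)
```

Only the first greeting should match, because the second has no punctuation token between the two words. Attributes such as POS or LEMMA would need a trained pipeline instead of a blank one.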
See the spaCy Matcher.add() documentation:
Changed in v3.0
As of spaCy v3.0, Matcher.add takes a list of patterns as the second argument (instead of a variable number of arguments). The on_match callback becomes an optional keyword argument.
patterns = [[{"TEXT": "Google"}, {"TEXT": "Now"}], [{"TEXT": "GoogleNow"}]]
- matcher.add("GoogleNow", on_match, *patterns)
+ matcher.add("GoogleNow", patterns, on_match=on_match)
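As a rough illustration of the v3 call with a callback, the sketch below registers the same two patterns and passes on_match as a keyword argument. It assumes spaCy v3 and a blank English pipeline; the `seen` list and the sample sentence are illustrative only.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # assumption: tokenizer-only pipeline is sufficient here
matcher = Matcher(nlp.vocab)

seen = []  # illustrative: collects the text of every matched span

def on_match(matcher, doc, i, matches):
    # spaCy calls this once per match; i is the index into `matches`.
    match_id, start, end = matches[i]
    seen.append(doc[start:end].text)

patterns = [[{"TEXT": "Google"}, {"TEXT": "Now"}], [{"TEXT": "GoogleNow"}]]
# v3 signature: key, list of patterns, then the optional on_match keyword.
matcher.add("GoogleNow", patterns, on_match=on_match)

doc = nlp("I use Google Now and GoogleNow")
matches = matcher(doc)
print(sorted(seen))
```

Passing the callback positionally as the second argument is what triggers the TypeError on v3, since that slot now expects the list of patterns.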
Example usage:
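import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")  # any loaded pipeline works here
matcher = Matcher(nlp.vocab)
# Each pattern is a list of per-token attribute dicts
pattern = [{"LOWER": "hello"}, {"LOWER": "world"}]
matcher.add("HelloWorld", [pattern])  # v3: patterns are passed as a list
doc = nlp("hello world!")
matches = matcher(doc)  # list of (match_id, start, end) tuples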
Instead of the v2-style call matcher.add('Relation_name', None, pattern)
use the v3 form: matcher.add('Relation_name', [pattern], on_match=None)
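If the same code has to run on both spaCy v2 and v3, one option is a small wrapper that picks the call style from the installed version. This is only a sketch: add_patterns is a hypothetical helper, not part of spaCy, and it assumes the major version can be read from spacy.__version__.

```python
import spacy
from spacy.matcher import Matcher

def add_patterns(matcher, key, patterns, on_match=None):
    """Hypothetical helper: register a list of patterns on spaCy v2 or v3."""
    major = int(spacy.__version__.split(".")[0])
    if major >= 3:
        # v3: patterns as one list, callback as a keyword argument
        matcher.add(key, patterns, on_match=on_match)
    else:
        # v2: callback second, patterns unpacked as positional arguments
        matcher.add(key, on_match, *patterns)

nlp = spacy.blank("en")  # tokenizer-only pipeline, no model download
matcher = Matcher(nlp.vocab)
add_patterns(matcher, "HelloWorld", [[{"LOWER": "hello"}, {"LOWER": "world"}]])
matches = matcher(nlp("hello world!"))
print(len(matches))
```

If only v3 needs to be supported, calling matcher.add(key, patterns) directly is simpler.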
If you have multiple patterns to match under one name, pass them all in a single list, for example:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)

# Three spellings of the same term: one token, hyphenated, and two words
pattern1 = [{'LOWER': 'solarpower'}]
pattern2 = [{'LOWER': 'solar'}, {'IS_PUNCT': True}, {'LOWER': 'power'}]
pattern3 = [{'LOWER': 'solar'}, {'LOWER': 'power'}]
matcher.add('SolarPower', [pattern1, pattern2, pattern3])

doc = nlp("The Solar Power industry continues to grow a solarpower increases. Solar-power is good")
found_matches = matcher(doc)
for _, start, end in found_matches:
    span = doc[start:end]  # slice the Doc by token offsets
    print(span)

Output:
Solar Power
solarpower
Solar-power