Detect Tense in German Sentence (with SpaCy)

Tags: python, nlp, spacy

I would like to programmatically detect the tense (and mood) of German sentences, preferably with SpaCy. I am able to find the root of the sentence and to determine whether it is a finite verb or not. However, searching SpaCy's documentation, I didn't find a way to determine the tense. Is this possible with SpaCy, or do I need to create my own solution for this?

If it is possible with SpaCy, how?

If not, what would be a good approach to do this? My first approach would be to detect Perfekt and Plusquamperfekt based on the presence of a participle, and to identify Futur by checking whether the root is a form of werden with a dependent non-finite (infinitive) verb form, with some extra logic to check for Futur II, analogous to checking for Plusquamperfekt. To distinguish Präteritum from Präsens, I would think of doing a look-up in a verb table. Is that a good idea, or is there a better approach, maybe a prebuilt tool?
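A rough sketch of this heuristic, for illustration only. It works on hand-built token records rather than on a real SpaCy Doc, and the record layout (id, head, lemma, morph), the helper names tok, children and classify_tense, and the assumption that the parser attaches the non-finite verb directly to the finite auxiliary are all hypothetical. It reads a Tense feature from the token instead of using a verb table, and it does not separate passive constructions (werden or sein plus participle) from the compound tenses.

```python
def tok(i, head, lemma, **morph):
    """Build a hypothetical token record; morph holds features such as Tense."""
    return {"id": i, "head": head, "lemma": lemma, "morph": morph}


def children(tokens, head_id, verb_form):
    """Tokens attached to head_id whose VerbForm feature equals verb_form."""
    return [
        t for t in tokens
        if t["head"] == head_id
        and t["id"] != head_id
        and t["morph"].get("VerbForm") == verb_form
    ]


def classify_tense(tokens):
    """Return a German tense label for the finite root verb, or None."""
    # Assumption: the root is its own head, as in to_json()-style output.
    root = next(
        (t for t in tokens
         if t["head"] == t["id"] and t["morph"].get("VerbForm") == "Fin"),
        None,
    )
    if root is None:
        return None

    tense = root["morph"].get("Tense")
    participles = children(tokens, root["id"], "Part")
    infinitives = children(tokens, root["id"], "Inf")

    # Futur: werden + infinitive; Futur II if that infinitive is an
    # auxiliary (haben/sein) that itself governs a participle.
    if root["lemma"] == "werden" and infinitives:
        for inf in infinitives:
            if inf["lemma"] in ("haben", "sein") and children(tokens, inf["id"], "Part"):
                return "Futur II"
        return "Futur I"

    # Perfekt / Plusquamperfekt: auxiliary + participle, split by the
    # tense of the auxiliary.
    if root["lemma"] in ("haben", "sein") and participles:
        return {"Pres": "Perfekt", "Past": "Plusquamperfekt"}.get(tense)

    # Simple tenses: read the Tense feature directly.
    return {"Pres": "Präsens", "Past": "Präteritum"}.get(tense)


# "Ich bin geflogen."
perfekt = [
    tok(0, 1, "ich"),
    tok(1, 1, "sein", VerbForm="Fin", Tense="Pres"),
    tok(2, 1, "fliegen", VerbForm="Part"),
]
print(classify_tense(perfekt))  # Perfekt
```

With a real parse, the records would have to be filled from the Doc (lemma, head index and morphological features per token), and the attachment assumption should be checked against what the German model actually produces.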

I have found this paper: Annotating tense, mood and voice for English, French and German, but the authors are not very explicit about how they do it; at least I am unable to reproduce their work.

jonathan.scholbach asked Sep 20 '25 02:09


1 Answer

SpaCy's MorphAnalysis/Morphologizer gives you the result you want, I guess. I just figured it out myself.

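import spacy

# Load the large German pipeline (it has to be downloaded beforehand).
nlp = spacy.load("de_core_news_lg")
sent = "Ich flog nach Rom."
doc = nlp(sent)
for token in doc:
    # token.morph holds the morphological features (Tense, Mood, ...).
    print(token.text, list(token.morph), token.lemma_)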

This might not be perfect, because the features come back as a list of strings like this:

Ich ['Case=Nom', 'Number=Sing', 'Person=1', 'PronType=Prs'] Ich
flog ['Mood=Ind', 'Number=Sing', 'Person=1', 'Tense=Past', 'VerbForm=Fin'] fliegen
nach [] nach
Rom ['Case=Dat', 'Gender=Neut', 'Number=Sing'] Rom
. [] .

But I think from here it is not too difficult to get a better representation, such as a dict.
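As a minimal sketch of that conversion, the following plain-Python helper turns such a feature list into a dict. The name morph_to_dict is made up for this example, and it only assumes that every entry has the form Feature=Value, as in the output above.

```python
def morph_to_dict(features):
    """Turn ['Tense=Past', 'Mood=Ind'] into {'Tense': 'Past', 'Mood': 'Ind'}."""
    # Split each entry once at the first '=' into a (feature, value) pair.
    return dict(feature.split("=", 1) for feature in features)


# Feature list printed above for the token "flog".
flog = ['Mood=Ind', 'Number=Sing', 'Person=1', 'Tense=Past', 'VerbForm=Fin']
info = morph_to_dict(flog)
print(info["Tense"], info["Mood"])  # Past Ind
```

As far as I know, in spaCy v3 the MorphAnalysis object also offers this directly: token.morph.to_dict() returns the dict, and token.morph.get("Tense") returns a list of values for a single feature.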

Otherwise, I would suggest using spaCy's Doc.to_json() method.

See here:

doc = nlp(sent)

doc.to_json()

Which returns:

{'text': 'Ich flog nach Rom.',
 'ents': [{'start': 14, 'end': 17, 'label': 'LOC'}],
 'sents': [{'start': 0, 'end': 18}],
 'tokens': [{'id': 0,
   'start': 0,
   'end': 3,
   'tag': 'PPER',
   'pos': 'PRON',
   'morph': 'Case=Nom|Number=Sing|Person=1|PronType=Prs',
   'lemma': 'Ich',
   'dep': 'sb',
   'head': 1},
  {'id': 1,
   'start': 4,
   'end': 8,
   'tag': 'VVFIN',
   'pos': 'VERB',
   'morph': 'Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin',
   'lemma': 'fliegen',
   'dep': 'ROOT',
   'head': 1},
  {'id': 2,
   'start': 9,
   'end': 13,
   'tag': 'APPR',
   'pos': 'ADP',
   'morph': '',
   'lemma': 'nach',
   'dep': 'mo',
   'head': 1},
  {'id': 3,
   'start': 14,
   'end': 17,
   'tag': 'NE',
   'pos': 'PROPN',
   'morph': 'Case=Dat|Gender=Neut|Number=Sing',
   'lemma': 'Rom',
   'dep': 'nk',
   'head': 2},
  {'id': 4,
   'start': 17,
   'end': 18,
   'tag': '$.',
   'pos': 'PUNCT',
   'morph': '',
   'lemma': '.',
   'dep': 'punct',
   'head': 1}]}
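To get from this structure to the tense and mood of the sentence, something like the following might work. It is only a sketch: parse_morph and root_features are hypothetical helper names, and the example dict is a trimmed copy of the output above rather than a fresh pipeline run.

```python
def parse_morph(morph):
    """Split 'Mood=Ind|Tense=Past' into {'Mood': 'Ind', 'Tense': 'Past'}."""
    # An empty morph string (e.g. for punctuation) yields an empty dict.
    return dict(pair.split("=", 1) for pair in morph.split("|") if pair)


def root_features(doc_json):
    """Return lemma, tense and mood of the token labelled ROOT, or None."""
    for token in doc_json["tokens"]:
        if token["dep"] == "ROOT":
            feats = parse_morph(token["morph"])
            return {
                "lemma": token["lemma"],
                "tense": feats.get("Tense"),
                "mood": feats.get("Mood"),
            }
    return None


# Trimmed copy of the to_json() output shown above (first two tokens only).
example = {
    "text": "Ich flog nach Rom.",
    "tokens": [
        {"id": 0, "morph": "Case=Nom|Number=Sing|Person=1|PronType=Prs",
         "lemma": "Ich", "dep": "sb", "head": 1},
        {"id": 1, "morph": "Mood=Ind|Number=Sing|Person=1|Tense=Past|VerbForm=Fin",
         "lemma": "fliegen", "dep": "ROOT", "head": 1},
    ],
}
print(root_features(example))
# {'lemma': 'fliegen', 'tense': 'Past', 'mood': 'Ind'}
```

Note that this only reads the feature of the finite root verb, so for compound tenses such as Perfekt or Futur it would report the tense of the auxiliary, not of the whole construction.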

Let me know if this is what you were searching for. :)

yannickhau answered Sep 21 '25 16:09