Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019)
DOI: 10.18653/v1/n19-1155

A Simple Joint Model for Improved Contextual Neural Lemmatization

Abstract: English verbs have multiple forms. For instance, talk may also appear as talks, talked or talking, depending on the context. The NLP task of lemmatization seeks to map these diverse forms back to a canonical one, known as the lemma. We present a simple joint neural model for lemmatization and morphological tagging that achieves state-of-the-art results on 20 languages from the Universal Dependencies corpora. Our paper describes the model in addition to training and decoding procedures. Error analysis indicates…

Cited by 17 publications (22 citation statements); references 24 publications (34 reference statements).

“…A summary of the average results of each model configuration with a comparison to the baseline (Malaviya et al., 2019).…”
mentioning
confidence: 99%
“…Neural (Malaviya et al., 2019): This is a state-of-the-art neural model that also performs joint morphological tagging and lemmatization, but additionally accounts for the exposure bias that comes with maximum-likelihood (MLE) training. The model stitches the tagger and lemmatizer together with the use of jackknifing (Agić and Schluter, 2017) to expose the lemmatizer to the errors made by the tagger model during training.…”
Section: Task 2 Baselines
mentioning
confidence: 99%
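The jackknifing scheme mentioned here is easy to mis-read, so a minimal sketch may help: the tagger is trained k times, each time leaving one fold out, and every training sentence is re-tagged by a tagger that never saw it, so the lemmatizer trains on realistically noisy tags. This is a toy illustration, not Malaviya et al.'s code: the MajorityTagger stand-in and all names are hypothetical (the real system uses an LSTM tagger).

```python
from collections import Counter, defaultdict

class MajorityTagger:
    """Toy stand-in for the LSTM tagger: predicts each word's most
    frequent training tag; unseen words get the corpus majority tag."""
    def fit(self, sentences):
        counts = defaultdict(Counter)
        for words, tags in sentences:
            for w, t in zip(words, tags):
                counts[w][t] += 1
        self.table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        self.default = Counter(
            t for _, tags in sentences for t in tags).most_common(1)[0][0]
        return self

    def tag(self, words):
        return [self.table.get(w, self.default) for w in words]

def jackknife_tags(sentences, k=10):
    """Re-tag the training set with k-fold predicted tags, so the
    lemmatizer sees the same kind of tagging errors it will face at test time."""
    folds = [sentences[i::k] for i in range(k)]
    retagged = []
    for i, held_out in enumerate(folds):
        # Train on all folds except the held-out one, then tag the held-out fold.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        tagger = MajorityTagger().fit(train)
        retagged += [(words, tagger.tag(words)) for words, _ in held_out]
    return retagged

# Tiny example: two (words, tags) sentences, split into k=2 folds.
data = [(["he", "talks"], ["PRON", "VERB"]),
        (["she", "talked"], ["PRON", "VERB"])]
print(jackknife_tags(data, k=2))
```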
“…We use the neural model from Malaviya et al. (2019) for contextual lemmatization. This is a neural sequence-to-sequence model with hard attention, which takes both the inflected form and morphological tag set for a token as input and produces a lemma, both at the character level.…”
Section: Contextual Lemmatization
mentioning
confidence: 99%
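A minimal sketch of such a character-level sequence-to-sequence lemmatizer may make the input/output arrangement concrete. Everything here is an assumption for illustration: the class name, layer sizes, and the mean-pooled tag summary are mine, and the hard attention of the actual model is deliberately omitted (replaced by the final bidirectional encoder state) to keep the sketch short.

```python
import torch
import torch.nn as nn

class CharLemmatizer(nn.Module):
    """Hypothetical character-level seq2seq lemmatizer sketch: encodes the
    inflected form's characters, summarizes the morphological tag set, and
    decodes the lemma one character at a time."""
    def __init__(self, n_chars, n_tags, emb=64, hid=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, emb)
        self.tag_emb = nn.Embedding(n_tags, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True, bidirectional=True)
        # Decoder input: previous-character embedding ++ tag-set summary.
        self.decoder = nn.LSTMCell(emb + emb, 2 * hid)
        self.out = nn.Linear(2 * hid, n_chars)

    def forward(self, form_chars, tags, lemma_chars):
        # Encode the inflected form character by character.
        _, (h, _) = self.encoder(self.char_emb(form_chars))
        state = torch.cat([h[0], h[1]], dim=-1)  # final fwd ++ bwd states
        cell = torch.zeros_like(state)
        # Summarize the tag set by mean-pooling tag embeddings.
        tag_vec = self.tag_emb(tags).mean(dim=1)
        logits = []
        for t in range(lemma_chars.size(1)):     # teacher forcing over lemma
            prev = self.char_emb(lemma_chars[:, t])
            state, cell = self.decoder(
                torch.cat([prev, tag_vec], dim=-1), (state, cell))
            logits.append(self.out(state))
        return torch.stack(logits, dim=1)        # (batch, lemma_len, n_chars)
```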
“…The decoder uses the concatenation of the previous character and the tag set to produce the next character in the lemma. The lemmatization model is jointly trained with an LSTM-based tagger, using jackknifing to reduce exposure bias in training: Malaviya et al. (2019) report significantly lower lemmatization results when training with gold tags and using predicted tags only at test time. We use their tagger for training and our contextual morphological analysis models' predicted tags at evaluation time.…”
Section: Contextual Lemmatization
mentioning
confidence: 99%
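To see the decoder-side concatenation described above in action, here is a hypothetical teacher-forced training step against the sketch from the previous block. The BOS index, vocabulary sizes, and random tensors are all placeholders, not values from the paper.

```python
import torch
import torch.nn.functional as F

model = CharLemmatizer(n_chars=100, n_tags=50)
form  = torch.randint(1, 100, (2, 7))   # batch of 2 inflected forms, 7 chars each
tags  = torch.randint(1, 50, (2, 3))    # 3 morphological tags per token
lemma = torch.randint(1, 100, (2, 6))   # gold lemmas, 6 chars each

bos = torch.zeros(2, 1, dtype=torch.long)        # index 0 reserved as BOS here
dec_in = torch.cat([bos, lemma[:, :-1]], dim=1)  # shift right for teacher forcing
logits = model(form, tags, dec_in)               # (2, 6, 100) next-char scores
loss = F.cross_entropy(logits.transpose(1, 2), lemma)
loss.backward()
```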