Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018)
DOI: 10.18653/v1/k18-1036

Modeling Composite Labels for Neural Morphological Tagging

Abstract: Neural morphological tagging has been regarded as an extension to the POS tagging task, treating each morphological tag as a monolithic label and ignoring its internal structure. We propose to view morphological tags as composite labels and explicitly model their internal structure in a neural sequence tagger. For this, we explore three different neural architectures and compare their performance with both CRF and simple neural multiclass baselines. We evaluate our models on 49 languages and show that the neural a…
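
To make the composite-label view concrete, here is a minimal sketch, assuming a Universal Dependencies style tag string; the `decompose` helper and the example tag are illustrative, not taken from the paper.

```python
# A morphological tag as a monolithic label vs. a composite label.
# The tag string follows the Universal Dependencies convention;
# the helper name `decompose` is illustrative.

def decompose(tag: str) -> dict:
    """Split a composite morphological tag into category-value pairs."""
    return dict(pair.split("=", 1) for pair in tag.split("|"))

monolithic = "POS=NOUN|Case=Nom|Number=Sing"  # one label out of thousands
composite = decompose(monolithic)             # {'POS': 'NOUN', 'Case': 'Nom', 'Number': 'Sing'}

# A monolithic tagger must score every full tag seen in training, while a
# composite model predicts each category value separately (or as a
# sequence), so it can produce tag combinations unseen in training.
print(composite)
```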

Cited by 8 publications (22 citation statements)
References 26 publications

“…Currently, there exists no morphological parser that performs both analysis and disambiguation of morphemes in Sanskrit, leaving aside the Cliq-EBM-P configuration reported in Krishna et al. (2018). Instead, as baselines, we utilize two widely used neural sequence taggers that reported state-of-the-art results on multiple morphologically rich languages: the FCRF (Malaviya, Gormley, and Neubig 2018) and SeqGen (Tkachenko and Sirts 2018). But they predict only the morphological tag of each word.…”
Section: Results (mentioning)
confidence: 99%
“…Sequence Generation Model (SeqGen). The model, proposed in Tkachenko and Sirts (2018), also treats the label as a composite label. Here, a char-BiLSTM is used to obtain word embeddings, which are then passed on to a word-level BiLSTM as the input features (Lample et al. 2016; Heigold, Neumann, and van Genabith 2017).…”
Section: Morphological Parsing (mentioning)
confidence: 99%
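
The SeqGen pipeline quoted above (char-BiLSTM word embeddings, a word-level BiLSTM, and a decoder that emits category values) can be sketched in a few lines of PyTorch. This is a minimal sketch under assumed dimensions; the class names and hyperparameters are ours, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class CharBiLSTMWordEncoder(nn.Module):
    """Builds a word embedding from its characters with a BiLSTM,
    then contextualizes words with a word-level BiLSTM."""
    def __init__(self, n_chars, char_dim=32, word_dim=64, hidden=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, word_dim // 2,
                                 bidirectional=True, batch_first=True)
        self.word_lstm = nn.LSTM(word_dim, hidden // 2,
                                 bidirectional=True, batch_first=True)

    def forward(self, char_ids):           # (n_words, max_word_len)
        chars = self.char_emb(char_ids)    # (n_words, L, char_dim)
        _, (h, _) = self.char_lstm(chars)  # h: (2, n_words, word_dim//2)
        words = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
        ctx, _ = self.word_lstm(words)     # (1, n_words, hidden)
        return ctx.squeeze(0)

class ValueSequenceDecoder(nn.Module):
    """LSTM decoder that emits a sequence of category values per word."""
    def __init__(self, n_values, hidden=128):
        super().__init__()
        self.val_emb = nn.Embedding(n_values, hidden)
        self.lstm = nn.LSTMCell(hidden, hidden)
        self.out = nn.Linear(hidden, n_values)

    def forward(self, word_repr, steps=3, bos=0):
        h = word_repr                      # init hidden from the encoder
        c = torch.zeros_like(h)
        prev = torch.full((word_repr.size(0),), bos, dtype=torch.long)
        logits = []
        for _ in range(steps):             # one category value per step
            h, c = self.lstm(self.val_emb(prev), (h, c))
            step_logits = self.out(h)
            logits.append(step_logits)
            prev = step_logits.argmax(-1)  # greedy decoding for the sketch
        return torch.stack(logits, dim=1)  # (n_words, steps, n_values)

encoder = CharBiLSTMWordEncoder(n_chars=100)
decoder = ValueSequenceDecoder(n_values=50)
char_ids = torch.randint(0, 100, (5, 12))  # 5 words, 12 chars each
print(decoder(encoder(char_ids)).shape)    # torch.Size([5, 3, 50])
```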
“…More details can be found in Appendix Section §C. Tkachenko and Sirts (2018) also model dependence on POS with a POS-dependent context vector in the decoder. However, they observe no significant improvement; we hypothesize that incorporating POS information into the shared encoder instead provides the model with a stronger signal.…”
Section: Discussion (mentioning)
confidence: 99%
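
A minimal sketch of the two injection points being contrasted, with assumed module names and sizes: putting POS features into the shared encoder conditions every downstream prediction on POS from the start, whereas a POS-dependent decoder context touches only the output side.

```python
import torch
import torch.nn as nn

WORD_DIM, POS_DIM, HIDDEN = 64, 16, 128  # illustrative sizes

pos_emb = nn.Embedding(20, POS_DIM)

# (a) POS in the shared encoder: the BiLSTM sees word + POS features,
#     so all predictions are conditioned on POS throughout.
encoder = nn.LSTM(WORD_DIM + POS_DIM, HIDDEN // 2,
                  bidirectional=True, batch_first=True)

words = torch.randn(1, 7, WORD_DIM)            # 7 words in a sentence
pos = pos_emb(torch.randint(0, 20, (1, 7)))
enc_out, _ = encoder(torch.cat([words, pos], dim=-1))

# (b) POS-dependent context in the decoder: the encoder is POS-agnostic
#     and the POS embedding is only mixed into the decoder's context.
plain_encoder = nn.LSTM(WORD_DIM, HIDDEN // 2,
                        bidirectional=True, batch_first=True)
enc_plain, _ = plain_encoder(words)
dec_context = torch.cat([enc_plain, pos], dim=-1)  # decoder-side only
print(enc_out.shape, dec_context.shape)
```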
“…To account for the potential dependence between predicted tag dimensions, we feed the encoded representation of each word as the initial hidden state of a GRU (Gated Recurrent Unit; Cho et al., 2014) decoder, which is then trained to predict one tag dimension at each decoding timestep. The use of such a seq2seq model is also partly motivated by its state-of-the-art performance in various NLP tasks such as machine translation (Bahdanau et al., 2015; Luong et al., 2015), document classification (Nam et al., 2017; Yang et al., 2018), morphological reinflection (Kann and Schütze, 2016; Kann et al., 2017), and morphological analysis like the current shared task (Tkachenko and Sirts, 2018). Our seq2seq model resembles Tkachenko and Sirts's (2018) SEQ model, with the primary differences being the use of a GRU decoder (instead of their unidirectional LSTM) and the sorting of tag dimensions in decreasing order of frequency. Our seq2seq model strongly outperforms the official baseline, scoring 14.25 and 4.6 points higher on average across 107 datasets on exact-match accuracy and micro-averaged F1, respectively.…”
Section: Model Description (mentioning)
confidence: 99%
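
The decoding scheme described above can be sketched as follows, assuming illustrative sizes and names: the encoder's word representation seeds the GRU's hidden state, and each timestep emits one tag dimension, with dimensions ordered by frequency.

```python
import torch
import torch.nn as nn

class TagDimensionDecoder(nn.Module):
    """GRU decoder whose initial hidden state is the encoded word;
    each timestep predicts one tag dimension (e.g. POS, then Case, ...).
    Sizes and names are illustrative assumptions."""
    def __init__(self, n_values, enc_dim=128):
        super().__init__()
        self.val_emb = nn.Embedding(n_values, enc_dim)
        self.gru = nn.GRUCell(enc_dim, enc_dim)
        self.out = nn.Linear(enc_dim, n_values)

    def forward(self, word_repr, n_dims, bos=0):
        h = word_repr                      # encoder output as initial hidden
        prev = torch.full((word_repr.size(0),), bos, dtype=torch.long)
        logits = []
        for _ in range(n_dims):            # dims in decreasing frequency
            h = self.gru(self.val_emb(prev), h)
            step = self.out(h)
            logits.append(step)
            prev = step.argmax(-1)         # greedy decoding for the sketch
        return torch.stack(logits, dim=1)

decoder = TagDimensionDecoder(n_values=40)
word_repr = torch.randn(5, 128)            # 5 encoded words
print(decoder(word_repr, n_dims=4).shape)  # torch.Size([5, 4, 40])
```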
“…Furthermore, to explicitly incorporate the underlying structure between MSD tag dimensions, the binary relevance model could be extended to a multiclass multilabel classifier, which selects one tag among those that are in complementary distribution for each morphological category (e.g. part-of-speech, case, number) as in Tkachenko and Sirts (2018). Finally, a more rigorous search for the optimal hyperparameters (e.g.…”
Section: Lengths of Tag Sequences (mentioning)
confidence: 99%
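
A minimal sketch of the proposed extension, with an assumed toy label inventory: binary relevance makes an independent sigmoid decision per tag value, while the per-category variant applies a softmax over each category's mutually exclusive values, so exactly one value is chosen per category.

```python
import torch
import torch.nn as nn

HIDDEN = 128
# Assumed toy inventories: each category's values are in complementary
# distribution (a word takes exactly one value per category).
CATEGORIES = {"pos": 17, "case": 7, "number": 3}
N_ALL_VALUES = sum(CATEGORIES.values())

feats = torch.randn(5, HIDDEN)  # 5 encoded words

# Binary relevance: one independent yes/no decision per tag value;
# nothing prevents it from picking zero or several cases for a word.
br_head = nn.Linear(HIDDEN, N_ALL_VALUES)
br_probs = torch.sigmoid(br_head(feats))

# Multiclass per category: a softmax over each category's values,
# so exactly one value per category is selected.
heads = nn.ModuleDict({c: nn.Linear(HIDDEN, n) for c, n in CATEGORIES.items()})
per_cat = {c: torch.softmax(head(feats), dim=-1).argmax(-1)
           for c, head in heads.items()}
print({c: v.shape for c, v in per_cat.items()})  # one choice per category
```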