2019
DOI: 10.1145/3342354

NeuMorph

Abstract: This article deals with morphological tagging for low-resource languages. For this purpose, five Indic languages are taken as reference. In addition, two severely resource-poor languages, Coptic and Kurmanji, are also considered. The task entails prediction of the morphological tag (case, degree, gender, etc.) of an in-context word. We hypothesize that to predict the tag of a word, considering its longer context such as the entire sentence is not always necessary. In this light, the usefulness of convolution o…

Cited by 1 publication (2 citation statements)
References 15 publications
“…On the other, the scarcity of training data is not only problematic per se, but also because of its impact on the rest of trials. So, generating high-quality vector representations remains a challenge [26] and the imbalance in the training samples that start the long tail and bias phenomena is more likely, together with the proneness to overfitting of DL models [27], which can result in poor predictive power, thereby compromising both inference and decision making.…”
Section: Introduction (mentioning)
confidence: 99%
“…These encompass a class of NLP problems that involve the assignment of a categorical label to each member of a sequence of observed values, and whose output facilitates downstream applications, such as parsing or semantic analysis, so errors at this stage can lower their performance [31]. Among the most important, we can highlight named entity recognition [7,32], multi-word expression identification [29], and morphological [26] and POS tagging [2,33–36]. It is precisely in this framework, the generation of POS taggers for low-resource scenarios by means of non-deep ML, that we propose the study of model selection based on the early estimation of learning curves.…”
Section: Introduction (mentioning)
confidence: 99%