Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages 2022
DOI: 10.18653/v1/2022.computel-1.5
One Wug, Two Wug+s: Transformer Inflection Models Hallucinate Affixes

Cited by 3 publications
(3 citation statements)
References 0 publications
“…Our experiment with restricting the hallucination process to generate forms that are phonotactically attested (bigram) in the training data revealed that its benefit was found only in very restricted conditions, depending on the number of hallucinated samples and the specific language (and presumably the inflectional pattern). Our findings are in agreement with the detailed error analyses of data hallucination techniques by Samir and Silfverberg (2022), which concluded that hallucination is not a one-size-fits-all technique: it must be used with caution and requires closer inspection depending on the type of morphological inflection.…”
Section: Discussion (supporting)
confidence: 91%
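The phonotactic restriction described in the statement above can be sketched as a bigram filter: a hallucinated form is kept only if every character bigram it contains is attested somewhere in the training data. The function names below are hypothetical illustrations; the cited work's actual implementation may differ.

```python
def bigrams(word: str) -> set[str]:
    """All adjacent character pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def attested_bigram_filter(candidates: list[str], training_forms: list[str]) -> list[str]:
    """Keep only hallucinated candidates whose character bigrams
    all occur in the training data (hypothetical sketch of the
    phonotactic restriction described above)."""
    attested = set().union(*(bigrams(w) for w in training_forms))
    return [w for w in candidates if bigrams(w) <= attested]
```

For example, with training forms `["liked", "baked", "bins"]`, a candidate like `"biked"` survives (all its bigrams are attested), while `"xyzed"` is discarded.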
“…The data hallucination method introduced by Anastasopoulos and Neubig (2019) can sometimes create invalid examples due to phonological alternations, as noted by Samir and Silfverberg (2022). For example, given an English inflection example like+VERB+PAST → liked, their approach will first identify the longest common subsequence of the lemma and word form, that is, like, and will then replace it with a random character sequence, for example xyz.…”
Section: Lemma Copying (mentioning)
confidence: 99%
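The replacement step described above can be sketched as follows. For simplicity this sketch finds the longest *contiguous* shared region (longest common substring) and substitutes a random string of the same length; the cited method's alignment and replacement details differ, so treat this as an illustration only.

```python
import random
import string

def hallucinate(lemma: str, form: str, alphabet: str = string.ascii_lowercase) -> tuple[str, str]:
    """Replace the longest common substring of lemma and form with a
    random character sequence (illustrative sketch of the hallucination
    step attributed to Anastasopoulos and Neubig, 2019)."""
    # Longest common substring via dynamic programming.
    best_len, best_i, best_j = 0, 0, 0
    dp = [[0] * (len(form) + 1) for _ in range(len(lemma) + 1)]
    for i in range(1, len(lemma) + 1):
        for j in range(1, len(form) + 1):
            if lemma[i - 1] == form[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_i, best_j = dp[i][j], i, j
    if best_len == 0:
        return lemma, form  # nothing shared; leave the pair unchanged
    # Sample a random replacement of the same length as the shared region.
    repl = "".join(random.choice(alphabet) for _ in range(best_len))
    new_lemma = lemma[:best_i - best_len] + repl + lemma[best_i:]
    new_form = form[:best_j - best_len] + repl + form[best_j:]
    return new_lemma, new_form
```

Applied to the example in the quote, `hallucinate("like", "liked")` replaces the shared stem `like` in both strings with the same random sequence, yielding a synthetic pair that preserves the affix `-d` while swapping out the lexeme.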
“…Nevertheless, there is reason for optimism. Several works have shown that automatic inflection models come much closer to a compositional solution when the human-annotated dataset is complemented by a synthetic data-augmentation procedure (Liu and Hulden, 2022; Silfverberg et al., 2017; Anastasopoulos and Neubig, 2019; Lane and Bird, 2020; Samir and Silfverberg, 2022), where morphological affixes are identified and attached to synthetic lexemes distinct from those in the training dataset (Fig. 2).…”
Section: Introduction (mentioning)
confidence: 99%