2020
DOI: 10.1109/access.2020.3015778
Data Augmentation Methods for Low-Resource Orthographic Syllabification

Abstract: An n-gram syllabification model generally produces a high error rate for a low-resource language, such as Indonesian, because of the high rate of out-of-vocabulary (OOV) n-grams. In this paper, a combination of three methods of data augmentation is proposed to solve the problem, namely swapping consonant-graphemes, flipping onsets, and transposing nuclei. An investigation on 50k Indonesian words shows that the combination of three data augmentation methods drastically increases the amount of both unigrams and…
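The three augmentation methods named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the syllable representation (onset = leading consonants, nucleus = vowels, coda = trailing consonants), the function names, and the swap table are all assumptions made for illustration.

```python
# Hypothetical sketch of the three data augmentation methods named in the
# abstract. Syllables are plain lowercase strings; the onset/nucleus/coda
# split below is a simplification assumed for illustration.

VOWELS = set("aeiou")

def split_syllable(syl):
    """Split a syllable into onset (leading consonants), nucleus (vowels),
    and coda (trailing consonants)."""
    i = 0
    while i < len(syl) and syl[i] not in VOWELS:
        i += 1
    j = i
    while j < len(syl) and syl[j] in VOWELS:
        j += 1
    return syl[:i], syl[i:j], syl[j:]

def flip_onsets(syllables):
    """Exchange the onsets of the first two syllables,
    e.g. ['bu', 'ku'] -> ['ku', 'bu']."""
    if len(syllables) < 2:
        return list(syllables)
    o1, n1, c1 = split_syllable(syllables[0])
    o2, n2, c2 = split_syllable(syllables[1])
    return [o2 + n1 + c1, o1 + n2 + c2] + list(syllables[2:])

def transpose_nuclei(syllables):
    """Exchange the nuclei of the first two syllables,
    e.g. ['bu', 'ka'] -> ['ba', 'ku']."""
    if len(syllables) < 2:
        return list(syllables)
    o1, n1, c1 = split_syllable(syllables[0])
    o2, n2, c2 = split_syllable(syllables[1])
    return [o1 + n2 + c1, o2 + n1 + c2] + list(syllables[2:])

def swap_consonant_graphemes(syllables, table):
    """Substitute consonant graphemes per a swap table (an assumed example
    would be {'b': 'p'}), keeping the syllable pattern intact."""
    return ["".join(table.get(ch, ch) for ch in syl) for syl in syllables]
```

Each transformation produces a new (possibly pseudo-) word whose syllable boundaries are known by construction, which is what lets the augmented n-grams enter the training counts and reduce OOV rates.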

Cited by 4 publications (1 citation statement)
References 36 publications
“…In some simple languages, such as Indonesian, several data augmentation methods can be applied to solve this problem. For instance, a model named combination of flipping-onsets with standard-trigram and augmented-bigram syllabification (CFTABS), which incorporates three augmentation techniques of flipping onsets, transposing nuclei, and swapping consonant-graphemes, was developed in [23]. CFTABS produces a much lower syllable error rate (SER) than the original n-gram model with no augmentation.…”
Section: Introduction
confidence: 99%