2017 International Conference on Asian Language Processing (IALP)
DOI: 10.1109/ialp.2017.8300596
On the use of machine translation-based approaches for Vietnamese diacritic restoration

Abstract: This paper presents an empirical study of two machine translation-based approaches to the Vietnamese diacritic restoration problem: phrase-based and neural machine translation models. This is the first work that applies a neural machine translation method to this problem and gives a thorough comparison with the phrase-based machine translation method, the current state-of-the-art method for this task. On a large dataset, the phrase-based approach has an accuracy of 97.32% while that o…

Cited by 8 publications (11 citation statements). References 11 publications.
“…While these deep models achieve state-of-the-art performance, they mainly rely on the use of recurrent architectures such as BiLSTM, which are relatively inefficient. Pham et al (2017) view the task of diacritization for Vietnamese as a machine transduction problem from undiacritized to diacritized text at the word level. Orife (2018) addresses the problem on Yoruba in a similar way and compares soft- and self-attention sequence-to-sequence performance at the word level, empirically showing that self-attention significantly outperforms BiLSTM.…”
Section: Related Work
confidence: 99%
“…Feature engineering and classical machine learning algorithms such as Hidden Markov Models, Maximum Entropy Models, and Finite State Transducers were the dominant approaches (Nelken and Shieber, 2005; Zitouni et al, 2006; Elshafei et al, 2006). However, recent studies show significant improvement using deep neural networks (Belinkov and Glass, 2015; Pham et al, 2017; Orife, 2018). While these deep models achieve state-of-the-art performance, they mainly rely on the use of recurrent architectures such as BiLSTM, which are relatively inefficient.…”
Section: Related Work
confidence: 99%
“…In SE research, the problem of Type Inference using MT shows that the SMT model provided by [37] has significantly higher accuracy than the original NMT approach in [18]. Similarly, for natural language diacritic restoration, [35] shows that SMT outperforms NMT. The parallel corpora in [18, 30, 37] share the same characteristics: the lengths of the source and target pairs are equal, and the order of the source and target words is consistent.…”
Section: Introduction
confidence: 99%
“…2 PREFIX RESOLUTION [18, 37] provide a translation approach that treats the source-side language as partial class names (PCN) and the target language as Fully Qualified Names (FQN) of APIs. [35] treats the source language as words without diacritic information and the target language as words with diacritic information. In other words, both of these research works build a parallel corpus in which the source and target sequences of each pair have the same length.…”
Section: Introduction
confidence: 99%
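The equal-length, order-preserving corpus property described above follows from how such training pairs are typically built: each diacritized sentence is paired with its mechanically stripped form, so the word sequences align one-to-one. A minimal sketch of that stripping step for Vietnamese (assuming Unicode NFD decomposition; note that đ/Đ carry no combining mark and must be mapped separately):

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    # Decompose each character, drop combining marks, then map đ/Đ,
    # which do not decompose under NFD, to d/D.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.replace("đ", "d").replace("Đ", "D")

# A diacritized target sentence and its undiacritized source form:
# the word sequences align one-to-one, the property the corpora rely on.
target = "tôi yêu tiếng Việt"
source = strip_diacritics(target)
print(source)  # toi yeu tieng Viet
assert len(source.split()) == len(target.split())
```

Restoration is then the inverse, harder direction: translating each undiacritized word back to its diacritized form, which is what the SMT and NMT models are trained to do.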